How to filter some words in a queryset

I have a variable that contains stock symbols. I need to split out each symbol so that I can process it independently.

print(Symbols_Splitted)

 #returns this 

["'['AAPL", 'TSLA', "MSFT']'"]

I need a way to extract the relevant words; the pattern is always the same.

I tried the code below, which works, but I found an issue: some symbols contain special characters, such as "EURUSD=X", and this code removes the "=", which makes the symbol invalid.

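    import re

    def convertor(s):
        # Strip every character that is not an ASCII letter
        return re.sub('[^a-zA-Z]+', '', s)

    # Named "cleaned" rather than "all" to avoid shadowing the built-in all()
    cleaned = list(map(convertor, Symbols_Splitted))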
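To make the problem concrete, here is a minimal reproduction. It reuses the same substitution as above and feeds it two inputs taken from this question; nothing else is assumed.

```python
import re


def convertor(s):
    # Same substitution as in the question: keep ASCII letters only.
    return re.sub('[^a-zA-Z]+', '', s)


# Works for plain symbols wrapped in stray quotes and brackets
print(convertor("'['AAPL"))   # AAPL

# Fails for symbols that legitimately contain "=": the "=" is stripped too
print(convertor("EURUSD=X"))  # EURUSDX
```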

So, taking the first example, I need something like this:

Some_function(Symbols_Splitted)

Symbols_Splitted[0]
> AAPL
Symbols_Splitted[1]
> TSLA
Symbols_Splitted[2]
> MSFT

I don't think substitution is the optimal route to go here. I would instead try to define the pattern you are interested in -- the ticker symbol.

I am not entirely sure what all the valid characters in a ticker symbol are and what rules apply to those symbols. But judging from what I have read so far, it seems that the following holds:

  • At least 2 characters long
  • Must start and end with a Latin letter or digit
  • Can contain letters, digits, dots and equals signs

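With those rules, we can construct the following simple pattern. Note that \w also matches the underscore and, for str patterns in Python 3, letters and digits outside the Latin alphabet, so it is slightly more permissive than the rules above: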

\w[\w=.]*\w

The Python code could look like this:

import re


PATTERN_TICKER_SYMBOL = re.compile(r"\w[\w=.]*\w")


def extract_symbol(string: str) -> str:
    m = PATTERN_TICKER_SYMBOL.search(string)
    if m is None:
        raise ValueError(f"Cannot find ticker symbol in {string}")
    return m.group()


test_data = [
    "'['AAPL",
    "TSLA",
    "MSFT']'",
    "''''...BRK.A",
    "[][]EURUSD=X-...",
]
cleaned_data = [extract_symbol(s) for s in test_data]
print(cleaned_data)

Output:

['AAPL', 'TSLA', 'MSFT', 'BRK.A', 'EURUSD=X']
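If the original, unsplit string is still available, the same pattern could probably be applied to it directly with findall, which would skip the splitting step entirely. The following is only a sketch: the raw string is a hypothetical stand-in, because the question does not show what the variable looked like before splitting.

```python
import re

PATTERN_TICKER_SYMBOL = re.compile(r"\w[\w=.]*\w")

# Hypothetical raw input; the real unsplit string is not shown in the question.
raw = "'['AAPL, TSLA, EURUSD=X']'"

# findall returns every non-overlapping match, in order of appearance
symbols = PATTERN_TICKER_SYMBOL.findall(raw)
print(symbols)  # ['AAPL', 'TSLA', 'EURUSD=X']
```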

The pattern can of course be extended to cover additional requirements.
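As an illustration of such an extension, here is a sketch under two assumed extra requirements that are not part of the rules listed earlier: single-character symbols (such as "F" or "T") should be accepted, and only ASCII letters and digits should count, so that underscores are treated as noise. The names PATTERN_STRICT and extract_symbol_strict are made up for this example.

```python
import re

# Assumed requirements (not from the original rules):
#   - start and end with an ASCII letter or digit ("_" is not allowed)
#   - a single character is a valid symbol, so the tail is optional
PATTERN_STRICT = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9=.]*[A-Za-z0-9])?")


def extract_symbol_strict(string: str) -> str:
    m = PATTERN_STRICT.search(string)
    if m is None:
        raise ValueError(f"Cannot find ticker symbol in {string!r}")
    return m.group()


test_data = ["'['F", "__EURUSD=X__", "BRK.A']'"]
print([extract_symbol_strict(s) for s in test_data])
# ['F', 'EURUSD=X', 'BRK.A']
```

With the original \w-based pattern, "'['F" would raise a ValueError and "__EURUSD=X__" would be returned with its underscores intact, since \w matches "_".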
