How to filter some words in a queryset
I have a variable that contains stock symbols. I need to split out each symbol so I can process it independently.
print(Symbols_Splitted)
# prints:
["'['AAPL", 'TSLA', "MSFT']'"]
I need something that extracts the relevant words; the pattern is always the same.
I tried the following, which works, but I found an issue: some symbols contain special characters, such as "EURUSD=X", and this code removes the "=", which makes the symbol invalid.
import re

def convertor(s):
    # Keep only ASCII letters; this also strips "=" and "."
    perfect = re.sub('[^a-zA-Z]+', '', s)
    return perfect

cleaned = list(map(convertor, Symbols_Splitted))
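To make the problem concrete, here is a minimal sketch that reuses the `convertor` function from above on a few illustrative strings (the samples are made up for demonstration). Because the character class only keeps letters, any `=` or `.` inside a symbol is silently dropped:

```python
import re

def convertor(s):
    # Remove every run of characters that are not ASCII letters
    return re.sub('[^a-zA-Z]+', '', s)

samples = ["'['AAPL", "EURUSD=X", "BRK.A"]
print([convertor(s) for s in samples])  # ['AAPL', 'EURUSDX', 'BRKA']
```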
So, taking the first example, I need something like this:
Some_function(Symbols_Splitted)
Symbols_Splitted[0]
> AAPL
Symbols_Splitted[1]
> TSLA
Symbols_Splitted[2]
> MSFT
I don't think substitution is the best approach here. Instead, I would try to define the pattern you are interested in: the ticker symbol.
I am not entirely sure which characters are valid in a ticker symbol or what rules apply to them, but judging from what I have read so far, the following seems to hold:
- At least 2 characters long
- Must start and end with a Latin letter or digit
- Can contain letters, digits, dots and equals signs
With those rules, we can construct the following simple pattern:
\w[\w=.]*\w
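Before using it, the pattern can be sanity-checked against the rules above with `re.fullmatch`, which only succeeds when the entire string fits the pattern. The candidate strings below are made-up examples chosen to hit each rule, not an authoritative list of valid or invalid tickers:

```python
import re

PATTERN_TICKER_SYMBOL = re.compile(r"\w[\w=.]*\w")

# Three strings that follow the rules, three that each break one rule
candidates = ["AAPL", "BRK.A", "EURUSD=X", "A", ".AB", "AB="]

# fullmatch succeeds only if the whole string fits the pattern
results = {c: PATTERN_TICKER_SYMBOL.fullmatch(c) is not None for c in candidates}

for c, ok in results.items():
    print(f"{c!r:12} {ok}")
```

The first three print `True`; `"A"` fails the length rule, and `".AB"` and `"AB="` fail the start/end rule.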
The Python code could look like this:
import re

# A word character, then any mix of word characters, "=" and ".", ending in a word character
PATTERN_TICKER_SYMBOL = re.compile(r"\w[\w=.]*\w")

def extract_symbol(string: str) -> str:
    # search() returns the first match anywhere in the string, or None
    m = PATTERN_TICKER_SYMBOL.search(string)
    if m is None:
        raise ValueError(f"Cannot find ticker symbol in {string}")
    return m.group()

test_data = [
    "'['AAPL",
    "TSLA",
    "MSFT']'",
    "''''...BRK.A",
    "[][]EURUSD=X-...",
]

cleaned_data = [extract_symbol(s) for s in test_data]
print(cleaned_data)
Output:
['AAPL', 'TSLA', 'MSFT', 'BRK.A', 'EURUSD=X']
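If you still have the original, unsplit string, a possible shortcut is to skip the splitting step entirely and let `re.findall` return every match of the same pattern in one pass. The `raw` value below is a hypothetical reconstruction of what such an input might look like; your actual string may differ:

```python
import re

PATTERN_TICKER_SYMBOL = re.compile(r"\w[\w=.]*\w")

# Hypothetical unsplit input; adjust to the real data
raw = "'['AAPL', 'TSLA', 'MSFT', 'BRK.A', 'EURUSD=X']'"

# The pattern has no groups, so findall returns the full matches
symbols = PATTERN_TICKER_SYMBOL.findall(raw)
print(symbols)  # ['AAPL', 'TSLA', 'MSFT', 'BRK.A', 'EURUSD=X']
```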
If there are additional requirements, the pattern can of course be extended.
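As one example of such an extension: in Python 3, `\w` in a `str` pattern also matches the underscore and non-Latin letters, so `\w[\w=.]*\w` is slightly looser than the rules listed. A stricter sketch, assuming the rules are exactly as stated (ASCII letters or digits at both ends; letters, digits, `.` and `=` in between), spells out the character classes. The `extract_symbols` helper and its choice to skip strings with no match instead of raising are illustrative, not part of the answer above:

```python
import re

# ASCII letters/digits at both ends; letters, digits, "." and "=" in between
PATTERN_STRICT = re.compile(r"[A-Za-z0-9][A-Za-z0-9=.]*[A-Za-z0-9]")

def extract_symbols(strings):
    """Return the first strict match from each string, skipping strings with none."""
    result = []
    for s in strings:
        m = PATTERN_STRICT.search(s)
        if m is not None:
            result.append(m.group())
    return result

print(extract_symbols(["'['AAPL", "_TSLA_", "MSFT']'", "???"]))
# ['AAPL', 'TSLA', 'MSFT']

# For comparison, the \w-based pattern keeps the underscores
loose = re.compile(r"\w[\w=.]*\w")
print(loose.search("_TSLA_").group())  # _TSLA_
```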