I am using spaCy's phrase matching for a large number of rules. Some of these are company stock symbols, which appear in a form like:
AAPL.NY BB.TX
They can also appear without the exchange suffix, as AAPL or BB.
To catch both forms, I have been using two patterns per symbol:
{"label": "TICKER", "pattern": [{"ORTH": {"REGEX": "AAPL\\.[A-Z]{2,3}"}}]}
{"label": "TICKER", "pattern": [{"ORTH": "AAPL"}]}
Is ORTH the right attribute to use with REGEX? It sometimes gives unexpected results: it will match a token like AAPL.HSHSHSJSKKSKKS, even though the part after the period is far longer than the {2,3} quantifier allows.
Could anyone help me with a) whether using ORTH makes sense here, and b) how to restrict the REGEX so that it only matches when there are 2 or 3 characters after the period?
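To narrow down (b), here is a minimal sketch of what I suspect is happening, using only Python's re module rather than spaCy itself. It assumes (I have not confirmed this) that the REGEX operator tests each token's text with a search-style match, so a hit anywhere inside the token counts. ANCHORED and search_hits are names made up for this illustration.

```python
import re

# Regex body copied from the first pattern above (the JSON "\\." is "\." here).
UNANCHORED = re.compile(r"AAPL\.[A-Z]{2,3}")
# Hypothetical variant: same body wrapped in ^...$ so the whole token must match.
ANCHORED = re.compile(r"^AAPL\.[A-Z]{2,3}$")

tokens = ["AAPL.NY", "AAPL.HSHSHSJSKKSKKS", "AAPL"]


def search_hits(regex, texts):
    """Return the texts in which regex.search finds a match anywhere."""
    return [t for t in texts if regex.search(t)]


# Assumption: search-style matching, i.e. a substring match is enough.
unanchored_hits = search_hits(UNANCHORED, tokens)
anchored_hits = search_hits(ANCHORED, tokens)

print(unanchored_hits)  # ['AAPL.NY', 'AAPL.HSHSHSJSKKSKKS']
print(anchored_hits)    # ['AAPL.NY']
```

If that assumption holds, the long token is accepted because "AAPL.HS" already satisfies the regex as a substring, and anchoring the regex with ^ and $ inside the REGEX value might be enough to enforce the 2 to 3 character limit.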