Using REGEX and ORTH as part of phrasematching in SpaCy

Question

I am using spaCy's phrase matching for a large number of rules. Some of these are company stock symbols, which appear in a form such as:

AAPL.NY BB.TX

They can also appear as plain AAPL or BB.

When phrasematching I have been using two patterns to get these matches:

{"label": "TICKER", "pattern": [{"ORTH": {"REGEX": "AAPL\\.[A-Z]{2,3}"}}]}
{"label": "TICKER", "pattern": [{"ORTH": "AAPL"}]}

Is ORTH the right attribute to use with REGEX? It sometimes gives surprising results, capturing something like AAPL.HSHSHSJSKKSKKS even though that goes beyond the {2,3} limit.

Could anyone help me with: a) whether using ORTH makes sense here, and b) how to limit the REGEX to a maximum of 2 or 3 characters after the period?

Does my answer help or do you need more assistance?
– Wiktor Stribiżew, commented Jul 1, 2021 at 9:50

Wiktor Stribiżew · Accepted Answer · 2021-06-30 11:33:41Z


ORTH (meaning orthography) was used before TEXT was introduced in spaCy 2.1. Both refer to the verbatim token text, but for regex matching it is now better to use TEXT.

As for the regex itself, mind that it is searched for anywhere inside the token text, so a token like AAPL.HSHSHSJSKKSKKS matches because AAPL.HSH is found within it. To match the entire token text, you need to use anchors: ^ and $ (or \A and \Z in Python).
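The over-matching can be reproduced outside spaCy. The following is a minimal sketch using only Python's `re` module, assuming the REGEX operator checks each token with search-style semantics as described above; the helper `token_matches` and the token list are hypothetical and only serve the illustration.

```python
import re

# Unanchored pattern from the question, and an anchored variant.
UNANCHORED = re.compile(r"AAPL\.[A-Z]{2,3}")
ANCHORED = re.compile(r"^AAPL\.[A-Z]{2,3}$")


def token_matches(pattern, token_text):
    """Hypothetical helper: search-style check against one token's text."""
    return pattern.search(token_text) is not None


# Example token texts, taken from the question.
tokens = ["AAPL.NY", "AAPL.HSHSHSJSKKSKKS", "AAPL"]

# Without anchors, the long token is accepted because "AAPL.HSH" occurs inside it.
unanchored_hits = [t for t in tokens if token_matches(UNANCHORED, t)]

# With anchors, the whole token must fit the pattern.
anchored_hits = [t for t in tokens if token_matches(ANCHORED, t)]

print(unanchored_hits)  # ['AAPL.NY', 'AAPL.HSHSHSJSKKSKKS']
print(anchored_hits)    # ['AAPL.NY']
```

Plain AAPL is rejected by both patterns, which is why the question's second, exact-text pattern is still needed alongside the regex one.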

So, you can use

{"TEXT": {"REGEX": r"^AAPL\.[A-Z]{2,3}$"}}

Also, note the use of a raw string literal so as to avoid double escaping backslashes.
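As a small illustration of the escaping point, the sketch below compares the raw literal with the double-escaped one and shows what happens when the backslash is lost. It also assumes, for illustration only, that the patterns might be stored as JSON text, where raw strings are not available and the backslash has to stay doubled; the variable names are hypothetical.

```python
import json
import re

# Raw string: the backslash is written once.
raw = r"^AAPL\.[A-Z]{2,3}$"

# Regular string: the backslash must be doubled to survive Python's own escaping.
escaped = "^AAPL\\.[A-Z]{2,3}$"

# Both literals produce the same pattern text.
same_pattern = raw == escaped

# In JSON text there are no raw strings, so the doubled form is required there.
from_json = json.loads(r'{"REGEX": "^AAPL\\.[A-Z]{2,3}$"}')["REGEX"]

# If the backslash is dropped, "." matches any character instead of a literal period.
loose = re.compile("^AAPL.[A-Z]{2,3}$")
strict = re.compile(raw)
loose_accepts = loose.search("AAPLXNY") is not None
strict_accepts = strict.search("AAPLXNY") is not None

print(same_pattern, from_json == raw, loose_accepts, strict_accepts)
```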

