I am trying to get EntityRuler patterns to combine LEMMA and ENT_TYPE so that a phrase like "landed (or land) in Baltimore (location)" gets its own tag. The same pattern works with the Matcher, but not with the entity ruler I created. I set overwrite_ents to True, so I am not sure why it isn't working. It is most likely user error; I just can't see what it is. The code is below, and the output shows that the ruler was added to the pipeline after ner. Any input or suggestions would be appreciated!
The Matcher tags the entire phrase ("landed in Baltimore"), but the entity ruler does not.
Code Sample
import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_lg')
matcher = Matcher(nlp.vocab)
pattern = [{"LEMMA": "land"}, {}, {"ENT_TYPE": "GPE"}]
patterns = [{"label": "FLYING", "pattern": [{"LEMMA": "land"}, {}, {"ENT_TYPE": "GPE"}]}]
matcher.add("Flying", [pattern])
rulerActions = EntityRuler(nlp, overwrite_ents=True)
rulerActions = nlp.add_pipe("entity_ruler", "ruleActions").add_patterns(patterns)
# rulerActions.add_patterns(patterns)
print(f'spaCy Pipelines: {nlp.pipe_names}')

doc = nlp("The student landed in Baltimore for the holidays.")

# Matcher results
matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(f'{string_id} -> {span.text}')

# Entities after the full pipeline (ner + ruleActions)
for ent in doc.ents:
    print(ent.text, ent.label_)
Print Statements
spaCy Pipelines: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'ruleActions']
Flying -> landed in Baltimore
Baltimore GPE
the holidays DATE
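A likely explanation, offered as a hedged sketch rather than a confirmed diagnosis: in spaCy v3, `EntityRuler(nlp, overwrite_ents=True)` builds a standalone ruler that is never added to `nlp.pipeline`, while `nlp.add_pipe("entity_ruler", "ruleActions")` creates a second, separate ruler with the default `overwrite_ents=False`. With that default, the ruler skips any match that overlaps a token that already has an entity type, and "Baltimore" is already tagged GPE by ner. Separately, `add_patterns` returns `None`, so `rulerActions` ends up as `None`. The sketch below passes the setting through `config` instead. To run without `en_core_web_lg`, it assumes a blank English pipeline, a hypothetical first ruler named `fake_ner` that stands in for the trained ner by tagging "Baltimore" as GPE, and `LOWER` in place of `LEMMA` because a blank pipeline has no lemmatizer.

```python
import spacy

TEXT = "The student landed in Baltimore for the holidays."


def build_pipeline(overwrite: bool):
    """Blank English pipeline, so no trained model or lemmatizer is needed."""
    nlp = spacy.blank("en")
    # Hypothetical stand-in for the trained "ner" component: it only tags
    # "Baltimore" as GPE so that the ENT_TYPE pattern has something to match.
    fake_ner = nlp.add_pipe("entity_ruler", name="fake_ner")
    fake_ner.add_patterns([{"label": "GPE", "pattern": [{"ORTH": "Baltimore"}]}])
    # In spaCy v3, component settings go through `config`. add_pipe returns
    # the component itself; add_patterns returns None, so don't chain them.
    ruler = nlp.add_pipe(
        "entity_ruler", name="ruleActions", config={"overwrite_ents": overwrite}
    )
    # LOWER stands in for LEMMA because a blank pipeline has no lemmatizer.
    ruler.add_patterns(
        [{"label": "FLYING", "pattern": [{"LOWER": "landed"}, {}, {"ENT_TYPE": "GPE"}]}]
    )
    return nlp


for overwrite in (False, True):
    doc = build_pipeline(overwrite)(TEXT)
    print(f"overwrite_ents={overwrite}:", [(ent.text, ent.label_) for ent in doc.ents])
```

With `overwrite_ents=False` the doc should keep only "Baltimore" as GPE, which mirrors the output above; with `True` the ruler should replace it with "landed in Baltimore" as FLYING. Applied to the original pipeline, the equivalent change would be `rulerActions = nlp.add_pipe("entity_ruler", name="ruleActions", config={"overwrite_ents": True})` followed by `rulerActions.add_patterns(patterns)` on its own line.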