I am trying to get entity ruler patterns to use a combination of LEMMA and ENT_TYPE to tag the phrase "landed (or land) in Baltimore", where Baltimore is a location. The pattern works with the Matcher, but not with the entity ruler I created. I set overwrite_ents to True, so I am not sure why it isn't working. It is most likely user error, but I can't tell what it is. The code example is below. From the output, you can see that the ruler was added after ner. Any input or suggestions would be appreciated!

The Matcher matches the entire phrase (landed in Baltimore), but the entity ruler does not tag it.

Code Sample

import spacy
from spacy.matcher import Matcher
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_lg')

matcher = Matcher(nlp.vocab)

pattern = [{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]
patterns = [{"label":"FLYING","pattern":[{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]}]

matcher.add("Flying", [pattern])

rulerActions= EntityRuler(nlp, overwrite_ents=True)
rulerActions = nlp.add_pipe("entity_ruler","ruleActions").add_patterns(patterns)
# rulerActions.add_patterns(patterns)

print(f'spaCy Pipelines: {nlp.pipe_names}')

doc = nlp("The student landed in Baltimore for the holidays.")

matches = matcher(doc)
for match_id, start, end in matches:
    string_id = nlp.vocab.strings[match_id]  # Get string representation
    span = doc[start:end]  # The matched span
    print(f'{string_id}  ->  {span.text}')
    
for ent in doc.ents:
    print(ent.text, ent.label_)

Print Statements

spaCy Pipelines: ['tok2vec', 'tagger', 'parser', 'attribute_ruler', 'lemmatizer', 'ner', 'ruleActions']
Flying  ->  landed in Baltimore
Baltimore GPE
the holidays DATE

1 Answer


Here is a working version of your code:

import spacy

nlp = spacy.load('en_core_web_lg')

patterns = [{"label":"FLYING","pattern":[{"LEMMA":"land"},{}, {"ENT_TYPE":"GPE"}]}]

ruler = nlp.add_pipe("entity_ruler","ruleActions", config={"overwrite_ents": True})
ruler.add_patterns(patterns)

print(f'spaCy Pipelines: {nlp.pipe_names}')

doc = nlp("The student landed in Baltimore for the holidays.")

# The Matcher loop is dropped: no Matcher is defined here, and it never affected doc.ents
for ent in doc.ents:
    print(ent.text, ent.label_)

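The Matcher you are creating isn't part of the pipeline at all, so it has no effect on doc.ents. Calling EntityRuler(nlp, overwrite_ents=True) directly does create an EntityRuler, but that object is never added to the pipeline. Calling nlp.add_pipe("entity_ruler", ...) creates a completely different object, and since no config is passed to it, it doesn't have overwrite_ents set. Because "Baltimore" is already tagged GPE by ner, the ruler skips the overlapping match. In addition, add_patterns returns None, so chaining it onto add_pipe leaves rulerActions set to None rather than to the ruler.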
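
To see the effect of overwrite_ents in isolation, here is a minimal sketch that does not need a trained model. It makes a few assumptions that are not in the original code: it uses spacy.blank("en") instead of en_core_web_lg, a first entity ruler (named gpe_ruler here, a made-up name) stands in for ner by tagging "Baltimore" as GPE, and the pattern uses LOWER instead of LEMMA because a blank pipeline has no lemmatizer. The helper flying_entities is hypothetical and only exists for this comparison.

```python
import spacy


def flying_entities(overwrite_ents):
    """Return doc.ents as (text, label) pairs for a two-ruler pipeline."""
    # Assumption: a blank English pipeline, so no model download is needed.
    nlp = spacy.blank("en")

    # Stand-in for the statistical ner component: tag "Baltimore" as GPE.
    gpe_ruler = nlp.add_pipe("entity_ruler", name="gpe_ruler")
    gpe_ruler.add_patterns([{"label": "GPE", "pattern": "Baltimore"}])

    # The ruler from the answer; its settings are passed via add_pipe's config.
    ruler = nlp.add_pipe(
        "entity_ruler",
        name="ruleActions",
        config={"overwrite_ents": overwrite_ents},
    )
    # LOWER replaces LEMMA here because a blank pipeline has no lemmatizer.
    ruler.add_patterns([
        {"label": "FLYING",
         "pattern": [{"LOWER": "landed"}, {}, {"ENT_TYPE": "GPE"}]}
    ])

    doc = nlp("The student landed in Baltimore for the holidays.")
    return [(ent.text, ent.label_) for ent in doc.ents]


default_result = flying_entities(overwrite_ents=False)
overwrite_result = flying_entities(overwrite_ents=True)
print(default_result)    # the existing GPE entity is kept, no FLYING span
print(overwrite_result)  # the FLYING span replaces the overlapping GPE entity
```

With overwrite_ents left at False, the ruler skips any match that overlaps an existing entity, which is the behavior described in the question. With it set to True, the longer FLYING span replaces the GPE entity.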

  • Oh duh. Thanks for the help @polm23! I did not see that. Thanks!
    – scarpacci
    Commented Dec 23, 2021 at 15:31
  • I came here following the demo on spacy.io; although their in-browser code worked, I had to run your code instead. spacy.io/usage/rule-based-matching#entityruler
    – Chris
    Commented Jul 8, 2022 at 18:19