The following link shows how to add a custom entity rule for entities that span more than one token. The code is below:
import spacy
from spacy.pipeline import EntityRuler

nlp = spacy.load('en_core_web_sm', parse=True, tag=True, entity=True)

animal = ["cat", "dog", "arctic fox"]
ruler = EntityRuler(nlp)
for a in animal:
    ruler.add_patterns([{"label": "animal", "pattern": a}])
nlp.add_pipe(ruler)

doc = nlp("There is no cat in the house and no arctic fox in the basement")

# Merge each multi-token entity into a single token
with doc.retokenize() as retokenizer:
    for ent in doc.ents:
        retokenizer.merge(doc[ent.start:ent.end])
I tried to add another custom entity ruler as follows:
flower = ["rose", "tulip", "african daisy"]
ruler = EntityRuler(nlp)
for f in flower:
    ruler.add_patterns([{"label": "flower", "pattern": f}])
nlp.add_pipe(ruler)
but I got this error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-47-702f460a866f> in <module>()
4 for f in flower:
5 ruler.add_patterns([{"label": "flower", "pattern": f}])
----> 6 nlp.add_pipe(ruler)
7
~\AppData\Local\Continuum\anaconda3\lib\site-packages\spacy\language.py in add_pipe(self, component, name, before, after, first, last)
296 name = repr(component)
297 if name in self.pipe_names:
--> 298 raise ValueError(Errors.E007.format(name=name, opts=self.pipe_names))
299 if sum([bool(before), bool(after), bool(first), bool(last)]) >= 2:
300 raise ValueError(Errors.E006)
ValueError: [E007] 'entity_ruler' already exists in pipeline. Existing names: ['tagger', 'parser', 'ner', 'entity_ruler']
My questions are:
How can I add another custom entity ruler?
Is it best practice to use capital letters for the label? For example, instead of
ruler.add_patterns([{"label": "animal", "pattern": a}])
should one write
ruler.add_patterns([{"label": "ANIMAL", "pattern": a}])
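For reference, the E007 message indicates that the second ruler was registered under the same default name, 'entity_ruler', as the first. One possible way around this is to give each ruler its own pipe name. The following is only a rough sketch under some assumptions: it uses a blank English pipeline rather than en_core_web_sm so that it runs without a model download, `add_named_ruler` is a hypothetical helper (not part of spaCy), and the names "animal_ruler" and "flower_ruler" are arbitrary. It branches on the spaCy version because `add_pipe` takes a component object in v2.x but a factory name in v3.x. The uppercase labels follow the convention of spaCy's built-in labels such as PERSON and ORG; lowercase labels appear to work as well.

```python
import spacy

# Assumption: a blank English pipeline keeps the sketch self-contained;
# the same calls should apply to a loaded model such as en_core_web_sm.
nlp = spacy.blank("en")


def add_named_ruler(nlp, name, label, phrases):
    """Hypothetical helper: add an entity ruler under an explicit pipe name."""
    patterns = [{"label": label, "pattern": p} for p in phrases]
    if spacy.__version__.startswith("2."):
        # spaCy v2.x: build the component, then register it with name=...
        from spacy.pipeline import EntityRuler
        ruler = EntityRuler(nlp)
        ruler.add_patterns(patterns)
        nlp.add_pipe(ruler, name=name)
    else:
        # spaCy v3.x: add_pipe takes the factory name and returns the component
        ruler = nlp.add_pipe("entity_ruler", name=name)
        ruler.add_patterns(patterns)
    return ruler


# Distinct names avoid the E007 name collision
add_named_ruler(nlp, "animal_ruler", "ANIMAL", ["cat", "dog", "arctic fox"])
add_named_ruler(nlp, "flower_ruler", "FLOWER", ["rose", "tulip", "african daisy"])

doc = nlp("There is no cat in the house and no african daisy in the basement")
found = [(ent.text, ent.label_) for ent in doc.ents]
print(found)
```

A simpler alternative, if separate components are not needed, may be to call `add_patterns` on the existing ruler with the flower patterns, since a single ruler can hold patterns for several labels.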