The following link shows how to add multiple EntityRuler patterns with spaCy. The code to do that is below:

import spacy
import pandas as pd

# Load the small English model without its statistical NER,
# so that only the rule-based EntityRuler assigns entities.
nlp = spacy.load('en_core_web_sm', disable=['ner'])
ruler = nlp.add_pipe("entity_ruler")

# Register one phrase pattern per flower and per animal.
flowers = ["rose", "tulip", "african daisy"]
for f in flowers:
    ruler.add_patterns([{"label": "flower", "pattern": f}])
animals = ["cat", "dog", "artic fox"]
for a in animals:
    ruler.add_patterns([{"label": "animal", "pattern": a}])

# Map each label to its matched text; a later match with the same
# label overwrites an earlier one.
result = {}
doc = nlp("cat and artic fox, plant african daisy")
for ent in doc.ents:
    result[ent.label_] = ent.text
df = pd.DataFrame([result])
print(df)

The output:

      animal         flower
0  artic fox  african daisy

The problem is: how can I pass a DataFrame or table instead of the text "cat and artic fox, plant african daisy"?

  • You can't. You need to write a function to get the text out of the dataframe.
    – polm23
    Commented Jun 15, 2021 at 9:35

1 Answer

Imagine that your dataframe is

df = pd.DataFrame({'Text':["cat and artic fox, plant african daisy"]})

You can define a custom function that extracts the entities from one text and then apply it to the column with Series.apply:

def get_entities(x):
    result = {}
    doc = nlp(x)
    for ent in doc.ents:
        result[ent.label_]=ent.text
    return result

and then

>>> df['Matches'] = df['Text'].apply(get_entities)
>>> df['Matches']
0    {'animal': 'artic fox', 'flower': 'african daisy'}
Name: Matches, dtype: object
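
Note that get_entities keeps only the last match for each label, which is why "cat" does not appear in the result above. If you want every match, a minimal sketch could look like the following. It rests on a few assumptions that are not in the original post: a blank English pipeline (spacy.blank("en")) is used because only the rule-based EntityRuler is needed, the helper name get_all_entities and the second sample row are made up for illustration, and nlp.pipe is used in place of Series.apply because it processes the texts as a stream, which is usually more efficient on larger columns.

```python
from collections import defaultdict

import pandas as pd
import spacy

# Assumption: a blank English pipeline is enough here, because only the
# rule-based EntityRuler is used (no statistical model is loaded).
nlp = spacy.blank("en")
ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns(
    [{"label": "flower", "pattern": f} for f in ["rose", "tulip", "african daisy"]]
    + [{"label": "animal", "pattern": a} for a in ["cat", "dog", "artic fox"]]
)


def get_all_entities(doc):
    """Group every matched entity text under its label (hypothetical helper)."""
    grouped = defaultdict(list)
    for ent in doc.ents:
        grouped[ent.label_].append(ent.text)
    return dict(grouped)


# The second row is illustrative sample data, not from the question.
df = pd.DataFrame(
    {"Text": ["cat and artic fox, plant african daisy", "a dog and a rose"]}
)

# nlp.pipe takes an iterable of strings and yields Doc objects in the same order.
df["Matches"] = [get_all_entities(doc) for doc in nlp.pipe(df["Text"])]
print(df["Matches"].tolist())
```

Each cell of df['Matches'] then holds a dict that maps a label to a list of matched texts, so the first row keeps both "cat" and "artic fox" under "animal".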