I'm trying to identify entities by passing a regular expression (regex) to a spaCy model through the EntityRuler, but spaCy does not identify anything with the regex below.
However, I tested the regex here and it works.
import model_training
import spacy
nlp = spacy.load('en_core_web_trf')
nlp.add_pipe("spacytextblob")
nlp = model_training.train_model_with_regex(nlp)
model_training.py
def train_model_with_regex(nlp):
    ruler = nlp.add_pipe("entity_ruler", before="ner")
    patterns = [
        {
            "label": "VOLUME",
            "pattern": [{"LOWER": {'REGEX': "(?:\d+\s(?:million|hundred|thousand|billion)*\s*)+"}}]
        }
    ]
    ruler.add_patterns(patterns)
    return nlp
This is what I want to achieve for the example below:
text = "I have spent 5 million to buy house and 70 thousand for the furniture"
expected output:
{'result': [
{'label': 'VOLUME', 'text': '5 million'},
{'label': 'VOLUME', 'text': '70 thousand'}
]}
REGEX is applied to each token separately.
I tried {"label": "VOLUME", "pattern": [{"LOWER": {'REGEX': r"(?:\d+\s(?:million)*\s*)+"}}]} but it still didn't work.
Use r"..." for the regex.
You need separate REGEXs for several tokens.
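To make the per-token point concrete, here is a minimal sketch rather than a definitive fix. It assumes a blank English pipeline (spacy.blank("en")) instead of en_core_web_trf so that it runs without downloading a model; because a blank pipeline has no "ner" component, the before="ner" argument from the question is left out. The names question_regex, token_hits, and result are made up for this illustration. The pattern is split into one dict per token: a digits-only token, followed by one or more unit words.

```python
import re

import spacy

text = "I have spent 5 million to buy house and 70 thousand for the furniture"

# The regex from the question: it expects digits, whitespace, and a unit word
# inside a single string.
question_regex = r"(?:\d+\s(?:million|hundred|thousand|billion)*\s*)+"

# Assumption for this sketch: a blank pipeline that only tokenizes.
nlp = spacy.blank("en")

# REGEX is checked against one token at a time. No single token ("5",
# "million", ...) contains whitespace, so the question's regex never matches.
token_hits = [t.text for t in nlp(text) if re.search(question_regex, t.lower_)]

ruler = nlp.add_pipe("entity_ruler")
ruler.add_patterns([
    {
        "label": "VOLUME",
        "pattern": [
            # First token: digits only, written as a raw string.
            {"TEXT": {"REGEX": r"^\d+$"}},
            # Following token(s): one or more unit words.
            {"LOWER": {"IN": ["hundred", "thousand", "million", "billion"]}, "OP": "+"},
        ],
    }
])

doc = nlp(text)
result = {"result": [{"label": ent.label_, "text": ent.text} for ent in doc.ents]}

print(token_hits)
print(result)
```

With the full en_core_web_trf pipeline from the question, the same pattern list could be passed to the ruler added with before="ner", so that the statistical NER component sees the VOLUME spans already set.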