I have tried a lot of things but am still unable to figure this out, because of the greedy nature of regular expressions.

abc = 'dfbafbd<a href="#Free_Calling_Best_Apps">Free Calling Best Apps</a>sbrwsggsfzbs<a></a>abc'

My regular expression:

abc1 = re.sub(r'<a.+\/a>',' ',abc)

output = 'dfbafbd abc'

required output = 'dfbafbd sbrwsggsfzbs abc'

1 Answer

Make your regex non-greedy by adding ? after the +:

abc1 = re.sub(r'<a.+?/a>',' ',abc)
#            here __^
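
To see the difference side by side, here is a small sketch using the string from the question. The variable names greedy and lazy are only illustrative.

```python
import re

abc = 'dfbafbd<a href="#Free_Calling_Best_Apps">Free Calling Best Apps</a>sbrwsggsfzbs<a></a>abc'

# Greedy: .+ extends to the LAST "/a>" in the string,
# so the text between the two links is swallowed as well.
greedy = re.sub(r'<a.+/a>', ' ', abc)

# Non-greedy: .+? stops at the FIRST "/a>" after each "<a",
# so each link is matched and replaced on its own.
lazy = re.sub(r'<a.+?/a>', ' ', abc)

print(greedy)  # dfbafbd abc
print(lazy)    # dfbafbd sbrwsggsfzbs abc
```

Note that . does not match newlines by default, so a link spanning several lines would need re.DOTALL.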

But parsing HTML with regex is a hard job.

HTML and regex are not good friends. Use a parser: it is simpler, more robust, and much more maintainable.
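
As a rough sketch of the parser route, the standard library's html.parser can do this without any regex. The class name AnchorStripper and the helper strip_links are made up for this example, and it assumes you want each link replaced by a single space, as in the question. Tags other than <a> are dropped while their text is kept, which may or may not be what you need.

```python
from html.parser import HTMLParser


class AnchorStripper(HTMLParser):
    """Hypothetical helper: keeps text outside <a>...</a>, one space per removed link."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # how many <a> elements we are currently inside
        self.parts = []  # pieces of output text

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            if self.depth == 0:
                self.parts.append(' ')  # the link is replaced by a single space
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'a' and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        # Only keep text that is not inside a link.
        if self.depth == 0:
            self.parts.append(data)


def strip_links(html):
    parser = AnchorStripper()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return ''.join(parser.parts)


abc = 'dfbafbd<a href="#Free_Calling_Best_Apps">Free Calling Best Apps</a>sbrwsggsfzbs<a></a>abc'
print(strip_links(abc))  # dfbafbd sbrwsggsfzbs abc
```

For real-world pages a dedicated library such as BeautifulSoup or lxml is the more common choice; the stdlib version is used here only so the example runs without extra installs.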
