
I have this regex, which is meant to find only the HTML meta tags whose content contains at least 3 of these words.

<meta name="description" content=.*(( the | that | of ).*){3,}.*>

The problem:

I have these two similar lines. Both contain the same words, except that in the second line one "the" is in a different place. Why does my regex find only the second line and not the first? How can I change the regex so that it finds both lines?

<meta name="description" content="the mystery of the art that seeks its meaning.">

<meta name="description" content="the mystery of art that seeks the its meaning.">

3 Answers

For this kind of search, you have to use positive lookaheads, one per required word:

  • Ctrl+F
  • Find what: <meta name="description" content="(?=[^">]*?\bthe\b)(?=[^">]*?\bthat\b)(?=[^">]*?\bof\b)[^">]*">
  • CHECK Wrap around
  • CHECK Regular expression
  • Find All in Current Document

Explanation:

<meta name="description" content="      # literally
(?=                                     # positive lookahead, make sure that what follows is:
    [^">]*?                                 # 0 or more characters that are not " or >, as few as possible
    \b                                      # word boundary
    the                                     # the word the
    \b                                      # word boundary
)                                       # end lookahead
(?=[^">]*?\bthat\b)                     # same for the word that
(?=[^">]*?\bof\b)                       # same for the word of
[^">]*                                  # 0 or more characters that are not " or >
">                                      # literally

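The same idea can be checked outside Notepad++. The following is a minimal sketch in Python, assuming its re module treats lookaheads and \b the same way for this pattern; the third sample line is made up for illustration and contains "the" three times but neither "that" nor "of".

```python
import re

# One lookahead per required word. Each lookahead starts at the same
# position (right after content="), so the order of the words is irrelevant.
pattern = re.compile(
    r'<meta name="description" content="'
    r'(?=[^">]*?\bthe\b)'
    r'(?=[^">]*?\bthat\b)'
    r'(?=[^">]*?\bof\b)'
    r'[^">]*">'
)

lines = [
    '<meta name="description" content="the mystery of the art that seeks its meaning.">',
    '<meta name="description" content="the mystery of art that seeks the its meaning.">',
    # Illustrative line: only one of the three required words is present.
    '<meta name="description" content="the mystery, the art and the mystery again.">',
]

matched = [bool(pattern.search(line)) for line in lines]
print(matched)  # [True, True, False]
```

Because every lookahead is anchored at the same starting point, this requires all three words to be present rather than any three occurrences.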

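In your first line the words "of" and "the" are next to each other, and you are searching for each of the three words with a space before and after. The single space between them is consumed by the match for " of ", so " the " cannot match right after it, and only two matches are found instead of three. Try inserting another word between "of" and "the" to see the difference.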

Instead of putting literal spaces in your regex, like ... WORD ..., use word boundaries, like ...\bWORD\b...
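
The difference can be seen by listing the matches each variant finds. This is a small sketch in Python, assuming its re module behaves like the Notepad++ engine for these simple patterns; the variable names are only illustrative.

```python
import re

line1 = '<meta name="description" content="the mystery of the art that seeks its meaning.">'
line2 = '<meta name="description" content="the mystery of art that seeks the its meaning.">'

# The three words written with literal spaces, as in the question.
spaced = re.compile(r" the | that | of ")
# The same words delimited by word boundaries, which consume no characters.
bounded = re.compile(r"\b(?:the|that|of)\b")

# findall returns non-overlapping matches, scanning left to right.
spaced_hits_1 = spaced.findall(line1)
spaced_hits_2 = spaced.findall(line2)
bounded_hits_1 = bounded.findall(line1)

print(spaced_hits_1)   # [' of ', ' that ']
print(spaced_hits_2)   # [' of ', ' that ', ' the ']
print(bounded_hits_1)  # ['the', 'of', 'the', 'that']
```

In the first line the leading "the" follows a quote rather than a space, and the second "the" loses its leading space to " of ", so the spaced variant never reaches three matches.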


Yes, I found another solution:

<meta name="description" content=.*(\b(the|that|of)\b.*){3,}.*>
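
A quick way to see what this pattern accepts is to run it over a few lines. The sketch below uses Python's re module as a stand-in for the Notepad++ engine (an assumption); the helper distinct_required_words is hypothetical and only reports which of the three words actually occur in a line.

```python
import re

# The pattern from this answer: any three occurrences of the listed words.
counting = re.compile(
    r'<meta name="description" content=.*(\b(the|that|of)\b.*){3,}.*>'
)

def distinct_required_words(line):
    """Return the set of required words that occur in the line (hypothetical helper)."""
    return set(re.findall(r"\b(the|that|of)\b", line))

lines = [
    '<meta name="description" content="the mystery of the art that seeks its meaning.">',
    '<meta name="description" content="the mystery of art that seeks the its meaning.">',
    '<meta name="description" content="the mystery, the art and the mystery again.">',
]

counted = [bool(counting.search(line)) for line in lines]
distinct = [distinct_required_words(line) for line in lines]

print(counted)             # [True, True, True]
print(sorted(distinct[2])) # ['the']
```

The pattern counts occurrences, not distinct words, so a line that repeats one word three times is also accepted.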

  • Be aware that this also matches <meta name="description" content="the mystery, the art and the mystery again.">. It contains "the" three times, but neither "that" nor "of".
    – Toto
    Commented Jul 21, 2021 at 15:58
  • That's exactly the solution I proposed in my answer
    – golimar
    Commented Jul 22, 2021 at 9:16
  • @golimar I believe you didn't give any concrete solution, just an indication, a general idea of how it can be done. Of course, it was a start. Thanks. But next time, please write a concrete solution with an example for the case. Right now, if I copy what you wrote in the answer, I cannot solve my problem. You should have written that in a comment, not in an answer.
    – Just Me
    Commented Jul 22, 2021 at 14:23
  • @JustMe It is an answer; it just isn't code, but an explanation in words, and you did exactly what I explained and it worked. That's why I wrote it as an answer and not as a comment
    – golimar
    Commented Jul 22, 2021 at 14:30
  • No, I did not. :) Your \bWORD\b can be formulated in a million ways, search Google. That is a usual match case. I can also write (\bthe\b|\bthat\b|\bof\b) and a million other examples with your \bWORD\b, and none would work. Anyway, you will see for yourself whether you get votes for your answer...
    – Just Me
    Commented Jul 22, 2021 at 14:55
