I have this tag:

<span class="text_obisnuit2">* Not&#259;:</span>John Wells - <em>My Dreams</em>, Albatros Books, 1986.</p>

and this one:

<span class="text_obisnuit1">* Not&#259;:</span>Mariah Carey - <em>Lovers on the road</em>, BackStreet Books, 1965.</p>

So I want to find only those <span class="text_obisnuit2"> tags whose line also contains the strings Albatros, <em> and </em> (that is, the first line above).

  • 7
    Obligatory stackoverflow post
    – mrbolichi
    Commented Aug 19, 2020 at 17:56
  • 1
    At first I thought you wanted to do this in production, but then I noticed the notepad++ tag. Commented Aug 19, 2020 at 20:07

2 Answers


This is a straightforward one, but it requires 'Albatros' to come after the closing </em> tag:

(<span class="text_obisnuit2">).*<em>.*<\/em>.*Albatros.*

The following one doesn't care which order they come in:

(<span class="text_obisnuit2">).*(<em>.*<\/em>.*Albatros.*|Albatros.*<em>.*<\/em>.*)
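To see the difference between the two patterns above, here is a minimal sketch that runs them in Python. Note the assumptions: Python's re module is only a stand-in for the Notepad++ regex engine, and the albatros_first line is a made-up reordering of the sample from the question.

```python
import re

# Both patterns are copied from this answer; Python's re is only a stand-in
# for the Notepad++ regex engine.
ordered = re.compile(r'(<span class="text_obisnuit2">).*<em>.*<\/em>.*Albatros.*')
any_order = re.compile(
    r'(<span class="text_obisnuit2">).*'
    r'(<em>.*<\/em>.*Albatros.*|Albatros.*<em>.*<\/em>.*)'
)

# The first line is from the question; the second is a hypothetical variant
# with Albatros placed before the <em> pair.
em_first = ('<span class="text_obisnuit2">* Not&#259;:</span>John Wells - '
            '<em>My Dreams</em>, Albatros Books, 1986.</p>')
albatros_first = ('<span class="text_obisnuit2">* Not&#259;:</span>Albatros Books - '
                  '<em>My Dreams</em>, John Wells, 1986.</p>')

for name, line in [("em_first", em_first), ("albatros_first", albatros_first)]:
    # Print whether each pattern finds a match on this line.
    print(name, bool(ordered.search(line)), bool(any_order.search(line)))
```

The first pattern should only match em_first, while the alternation matches both lines.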

Here is another variation, where the digit(s) after text_obisnuit don't matter and the entire span element is captured as the first group:

(<span class="text_obisnuit\d+">.*<\/span>).*(<em>.*<\/em>.*Albatros.*|Albatros.*<em>.*<\/em>.*)
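A quick way to check what the first group captures is to run the pattern outside Notepad++. This is only a sketch: Python's re stands in for the Notepad++ engine, and the other_digits line is a hypothetical entry invented to exercise \d+.

```python
import re

# Pattern copied from this answer, split over two string literals for readability.
pattern = re.compile(
    r'(<span class="text_obisnuit\d+">.*<\/span>).*'
    r'(<em>.*<\/em>.*Albatros.*|Albatros.*<em>.*<\/em>.*)'
)

# The two sample lines from the question.
first = ('<span class="text_obisnuit2">* Not&#259;:</span>John Wells - '
         '<em>My Dreams</em>, Albatros Books, 1986.</p>')
second = ('<span class="text_obisnuit1">* Not&#259;:</span>Mariah Carey - '
          '<em>Lovers on the road</em>, BackStreet Books, 1965.</p>')
# Hypothetical line with a different class suffix (not from the question).
other_digits = ('<span class="text_obisnuit15">* Not&#259;:</span>Jane Doe - '
                '<em>Sea Birds</em>, Albatros Books, 1990.</p>')

for line in (first, second, other_digits):
    m = pattern.search(line)
    # group(1) is the whole span element when the line matches, else None.
    print(m.group(1) if m else None)
```

The second line has the right kind of class name but no Albatros, so it should not match at all.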

All of these regexes assume that each entry is on its own line in the file. It might make more sense to use <p> and </p> as the boundaries, but for that we would need a larger excerpt from your input file.

  • Ctrl+F
  • Find what: <span class="text_obisnuit2">(?=.*?<em>.*?</em>)(?=.*?\bAlbatros\b).*$
  • CHECK Wrap around
  • CHECK Regular expression
  • UNCHECK . matches newline
  • Find All in Current Document


<span class="text_obisnuit2">   # literally
(?=                             # positive lookahead, make sure we have after:
  .*?                           # 0 or more any character but newline, not greedy
  <em>                          # literally open em tag
  .*?                           # 0 or more any character but newline, not greedy
  </em>                         # literally close em tag
)                               # end lookahead
(?=                             # positive lookahead, make sure we have after:
  .*?                           # 0 or more any character but newline, not greedy
  \bAlbatros\b                  # Albatros with word boundaries
)                               # end lookahead
.*                              # 0 or more any character but newline
$                               # end of line
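If you want to try the lookahead version outside Notepad++, a rough equivalent in Python could look like the sketch below. It assumes that re.MULTILINE approximates Notepad++'s per-line handling of $ with ". matches newline" unchecked. The third sample line is made up to show the effect of the word boundaries.

```python
import re

# Same expression as in "Find what"; MULTILINE makes $ match at each line end.
pattern = re.compile(
    r'<span class="text_obisnuit2">(?=.*?<em>.*?</em>)(?=.*?\bAlbatros\b).*$',
    re.MULTILINE,
)

# The two sample lines from the question.
line1 = ('<span class="text_obisnuit2">* Not&#259;:</span>John Wells - '
         '<em>My Dreams</em>, Albatros Books, 1986.</p>')
line2 = ('<span class="text_obisnuit1">* Not&#259;:</span>Mariah Carey - '
         '<em>Lovers on the road</em>, BackStreet Books, 1965.</p>')
# Hypothetical line: "Albatrosses" must not count because of \b.
line3 = ('<span class="text_obisnuit2">* Not&#259;:</span>Jane Doe - '
         '<em>Sea Birds</em>, Albatrosses Press, 1990.</p>')

text = "\n".join([line1, line2, line3])

# No capture groups, so findall returns the full text of each match.
hits = pattern.findall(text)
for hit in hits:
    print(hit)
```

Only the first line should be reported: the second has a different class, and the third fails the \bAlbatros\b lookahead.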



  • Thanks, Toto. Could you also do the same for </em>, so that it finds the tags that contain </em> as well, not just <em>?
    – Just Me
    Commented Aug 19, 2020 at 12:08
  • 1
    @JustMe: Change (?=.*?<em>) to (?=.*?<em>.*?</em>), see my edit.
    – Toto
    Commented Aug 19, 2020 at 12:58
