2

In the example below, I want to use a regex to find the HTML tag <sony>, which sits between two other lines.

         <table width="697" border="0">
      <tr>
        <td><h1 class="den_articol" itemprop="name">Your Mirror</h1></td>
      </tr>
        </table>
<sony>
<p class="text_obisnuit"><span class="text_obisnuit2">* Not&#259;:</span> <a href="https://www.youtube.com/watch?v=IB4P5t3JGlg" target="_new">Simply Red - Your Mirror</a></p>

The expected output:

<sony>

My regex formula doesn't work:

</table>\s+\n(.*?)<p class="text_obisnuit">

4
  • Just remove \s+ from your regex, as it seems there are no spaces after </table>.
    – Toto
    Commented Jul 2, 2023 at 15:35
  • Yes, but in that case it will select all of </table> <sony> <p class="text_obisnuit">, and I want only <sony>. Commented Jul 2, 2023 at 15:41
  • Yes, it will. But what are you trying to do? Do you want to remove everything except <sony>? Find: \A[\s\S]+</table>\s+(\S+)\s+<p class="text_obisnuit">[\s\S]+\z and replace with: $1
    – Toto
    Commented Jul 2, 2023 at 16:12
  • Please post your solution as an answer! Commented Jul 2, 2023 at 18:43

4 Answers

2

I can give a solution for the case where the amount of whitespace following </table> is fixed (here, a single newline). I use a lookbehind to test the preceding text without capturing it, and a lookbehind does not support variable-length patterns.

My regex:

(?<=<\/table>\n)(.*?)\n(?=<p class="text_obisnuit">)

You may see it in action on regex101.com.

In Notepad++, don't forget to check the ". matches newline" option.
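
For readers who want to try the same pattern outside Notepad++, here is a minimal Python sketch. It assumes the file uses "\n" line endings, it shortens the sample paragraph from the question, and it uses re.DOTALL as a stand-in for the ". matches newline" option. The \/ escape is dropped because Python does not need it.

```python
import re

# Shortened sample from the question, with "\n" line endings (an assumption).
html = (
    '        </table>\n'
    '<sony>\n'
    '<p class="text_obisnuit"><span class="text_obisnuit2">* Not&#259;:</span></p>\n'
)

# Same pattern as in the answer; re.DOTALL plays the role of ". matches newline".
pattern = re.compile(
    r'(?<=</table>\n)(.*?)\n(?=<p class="text_obisnuit">)',
    re.DOTALL,
)

match = pattern.search(html)
found = match.group(1) if match else None
print(found)
```

If the file has Windows line endings ("\r\n"), the lookbehind would need to be written as </table>\r\n instead, since it must keep a fixed length.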

Some explanation:

Lookaround   Name         What it does
(?=foo)      Lookahead    Asserts that what immediately follows the current position in the string is foo
(?<=foo)     Lookbehind   Asserts that what immediately precedes the current position in the string is foo
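
The two assertions can be tried in isolation with a small Python sketch. The strings "barfoo barbaz" and "foobar bazbar" are made-up test inputs. The last part illustrates the fixed-length restriction mentioned at the start of this answer, as it applies to Python's re module: a lookbehind containing \s+ is rejected when the pattern is compiled.

```python
import re

# Lookahead: match "bar" only when "foo" immediately follows it.
ahead = re.findall(r'bar(?=foo)', 'barfoo barbaz')

# Lookbehind: match "bar" only when "foo" immediately precedes it.
behind = re.findall(r'(?<=foo)bar', 'foobar bazbar')

# A variable-length lookbehind is rejected by Python's re module.
try:
    re.compile(r'(?<=</table>\s+)<sony>')
    variable_ok = True
except re.error:
    variable_ok = False

print(ahead, behind, variable_ok)
```

In both findall calls only the first "bar" qualifies, because the second one has "baz" next to it instead of "foo".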

2
  • Ctrl+H
  • Find what: \A[\s\S]+</table>\s+(\S+)\s+<p class="text_obisnuit">[\s\S]+\z
  • Replace with: $1
  • Tick Wrap around
  • Select Regular expression
  • Click Replace all

Explanation:

\A                          # beginning of file
[\s\S]+                     # 1 or more of any character, including line breaks
</table>                    # literally
\s+                         # 1 or more whitespace characters (spaces, tabs, line breaks)
(\S+)                       # group 1: 1 or more non-whitespace characters
\s+                         # 1 or more whitespace characters
<p class="text_obisnuit">   # literally
[\s\S]+                     # 1 or more of any character, including line breaks
\z                          # end of file
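
The same find-and-replace can be sketched in Python. This is an illustration under a few assumptions: the sample text is a shortened version of the question's HTML, \Z is used because it is the end-of-string anchor in Python's re module, and the replacement is written \1 where Notepad++ uses $1.

```python
import re

# Shortened version of the HTML from the question (an assumption).
text = (
    '<table width="697" border="0">\n'
    '  <tr><td><h1>Your Mirror</h1></td></tr>\n'
    '</table>\n'
    '<sony>\n'
    '<p class="text_obisnuit">Simply Red - Your Mirror</p>\n'
)

# The pattern spans the whole text; only group 1 survives the replacement.
pattern = re.compile(
    r'\A[\s\S]+</table>\s+(\S+)\s+<p class="text_obisnuit">[\s\S]+\Z'
)
result = pattern.sub(r'\1', text)
print(result)
```

Because (\S+) stops at the first whitespace character, this approach only works when the wanted tag contains no spaces, as is the case for <sony>.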


0

Another solution:

FIND: (?s)(?-i:<!-- ARTICOL START -->|(?!\A)\G).*?\K(?<=</table>\s)(.*?)(?=\s<p class="text_obisnuit">)(?!<!-- ARTICOL FINAL -->)

For similar cases, you can use this generic formula:

(?s)(?-i:BSR|(?!\A)\G).*?\K(?<=FR1)(.*?)(?=FR2)(?!ESR)

BSR = <!-- ARTICOL START -->

FR1 = </table>\s

FR2 = \s<p class="text_obisnuit">

ESR = <!-- ARTICOL FINAL -->

Let BSR (Begin Search-region Regex) be the regex which defines the beginning of the area where the search for FR must start.

Let ESR (End Search-region Regex) be the regex which defines, implicitly, the area where the search for FR must end.

Let FR (Find Regex) be the regex which defines the char, string or expression to be searched for.

Let RR (Replacement Regex) be the regex which defines the char, string or expression which must replace the FR expression. RR can contain backreferences such as \1, \2, ... or $1, $2, ...
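
Python's re module supports neither \G nor \K, so the formula cannot be used there as written. The sketch below swaps in a different, two-step technique that follows one reading of the intent: first isolate the text between BSR and ESR, then search inside it for whatever lies between FR1 and FR2. The function name find_in_region, the sample string and the <outside> tag are hypothetical, introduced only for illustration.

```python
import re

BSR = r'<!-- ARTICOL START -->'
ESR = r'<!-- ARTICOL FINAL -->'
FR1 = r'</table>\s'
FR2 = r'\s<p class="text_obisnuit">'

def find_in_region(text):
    """Return every capture between FR1 and FR2 found between BSR and ESR."""
    hits = []
    region_re = re.compile(f'{BSR}(.*?){ESR}', re.DOTALL)
    inner_re = re.compile(f'{FR1}(.*?){FR2}', re.DOTALL)
    for region in region_re.finditer(text):
        # Search only inside the text delimited by BSR and ESR.
        hits.extend(m.group(1) for m in inner_re.finditer(region.group(1)))
    return hits

# Hypothetical sample: one candidate outside the region, one inside it.
sample = (
    '</table>\n<outside>\n<p class="text_obisnuit">ignored</p>\n'
    '<!-- ARTICOL START -->\n'
    '</table>\n<sony>\n<p class="text_obisnuit">kept</p>\n'
    '<!-- ARTICOL FINAL -->\n'
)
print(find_in_region(sample))
```

Splitting the work into two passes keeps each regex short, at the cost of not being a single expression that can be pasted into the Notepad++ Find dialog.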

0

I found a solution myself:

FIND: </table>\r\n\K<.*?>|(?s)\R\R(?=<p class="text_obisnuit">)
