I have lines like these two on more than 3000 HTML pages:

<link rel="canonical" href="https://mywebsite.com/hi/about.html" />

and

<link rel="canonical" href="https://mywebsite.com/about.html" />

I want to use a regex to find all the pages whose canonical link does NOT contain the word hi, i.e. does not contain the /hi/ path segment.

  • Is hi always between mywebsite.com/ and /about.html, or can it appear anywhere in the URL?
    – Toto
    Commented May 25, 2020 at 17:08

1 Answer

If /hi/ always comes directly after https://mywebsite.com, you can use a negative lookahead to exclude those matches. In that case,

<link rel="canonical" href="https:\/\/mywebsite\.com\/(?!hi\/)

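might work for you (demo). The first part is a literal match of the start of the tag: \. must be escaped because an unescaped . matches any character, and the \/ escapes are harmless in Notepad++ but required on Regex101, which uses / as its pattern delimiter by default. The (?!hi\/) part is the negative lookahead: it makes sure hi/ does not immediately follow mywebsite.com/. Regex101 does a better job of explaining the regex than I can.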
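To check the pattern outside Notepad++, here is a minimal Python sketch using the standard re module. The two sample lines come from the question; the helper name has_non_hi_canonical is hypothetical. Python's re accepts the \/ escapes (they match a literal /), so the pattern is copied unchanged.

```python
import re

# The answer's pattern as a Python raw string. The negative lookahead
# (?!hi\/) fails whenever "hi/" immediately follows "mywebsite.com/".
CANONICAL_WITHOUT_HI = re.compile(
    r'<link rel="canonical" href="https:\/\/mywebsite\.com\/(?!hi\/)'
)


def has_non_hi_canonical(text: str) -> bool:
    """Return True if text contains a canonical link without the /hi/ segment."""
    return CANONICAL_WITHOUT_HI.search(text) is not None


samples = [
    '<link rel="canonical" href="https://mywebsite.com/hi/about.html" />',
    '<link rel="canonical" href="https://mywebsite.com/about.html" />',
]

for line in samples:
    print(has_non_hi_canonical(line), line)
```

The first sample is rejected and the second is matched. Note that the lookahead only rejects the exact segment hi/, so a path such as /hindi/ would still match.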

(I assume you're familiar with Notepad++'s mass-search feature, Search > Find in Files, but if not, this link may help.)
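If Notepad++ is not at hand, the same search over a folder of pages can be sketched in Python. This is an alternative to Find in Files, not part of the original answer; the function name pages_without_hi is hypothetical, and it assumes the pages are .html files encoded as UTF-8.

```python
import re
from pathlib import Path
from typing import List

# Same pattern as in the answer; the \/ escapes are valid in Python's re.
PATTERN = re.compile(
    r'<link rel="canonical" href="https:\/\/mywebsite\.com\/(?!hi\/)'
)


def pages_without_hi(root: Path) -> List[Path]:
    """Return every .html file under root whose canonical link lacks /hi/."""
    hits = []
    # rglob walks root and all of its subdirectories.
    for path in sorted(root.rglob("*.html")):
        # Assumption: pages are UTF-8; undecodable bytes are replaced
        # instead of aborting the whole scan.
        text = path.read_text(encoding="utf-8", errors="replace")
        if PATTERN.search(text):
            hits.append(path)
    return hits
```

For example, printing each path returned by pages_without_hi(Path("site")) lists the pages that lack the /hi/ link, where "site" stands for whatever folder holds the 3000 pages.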
