0

I have this tags, and I want to replace the content of a tag (link-one) with the content of another tag (link two). So, to extract the content of a tag and replace it in another tag. For example

<link rel="canonical" href="https://mywebsite.com/link-one.html" />

<a href="/en/link-two.html">

The output should be:

<link rel="canonical" href="https:/mywebsite.com/link-two.html" />

I made a regex, that almost works, but it must be a little bit improved..

Search:

(<link rel="canonical" href=").*?(" \/>)(.*?)(<a href="/en/(.*?)">)

Replace by:

\1\5\3\4

(check .Matches newline)

2 Answers 2

1
  • Ctrl+H
  • Find what: <link rel="canonical" href="https://mywebsite.com/\K.*?(?=" />.+?<a href="/en/(.*?)">)
  • Replace with: $1
  • CHECK Wrap around
  • CHECK Regular expression
  • CHECK . matches newline
  • Replace all

Explanation:

<link rel="canonical" href="https://mywebsite.com/
\K                  # forget all we have seen  until this position
.*?                 # 0 or more any character, not greedy
(?=                 # positive lookahead
  " />              # literally
  .+?               # 0 or more any character, not greedy
  <a href="/en/     # literally
  (.*?)             # group 1, 0 or more any character, not greedy
  ">                # literally
)                   # end lookahead

Replacement:

$1          # content of group 1

Screenshot (before):

enter image description here

Screenshot (after):

enter image description here

1
  • nice work, thanks Toto. Please add a two // after https: . I also edit the https: on my link
    – Just Me
    Commented Mar 14, 2020 at 19:06
0

For a simple scenario, such as:

   <link rel="canonical" href="link-one.html" />

   <a href="/en/link-two.html">

(check .Matches newline)

FIND: (<link rel="canonical" href=").*?(" \/>)(.*?)(<a href="/en/(.*?)">)

Replace by: \1\5\2\3\4


For a second scenario, such as:

   <link rel="canonical" href="https:/mywebsite.com//link-one.html" />

   <a href="/en/link-two.html">

(check .Matches newline)

FIND: (<link rel="canonical" href=").*?(" \/>)(.*?)(<a href="/en/(.*?)">)

Replace by: \1https://mywebsite.com/\5\2\3\4

You must log in to answer this question.

Not the answer you're looking for? Browse other questions tagged .