Notepad++ and xml - replacing </ in closing element tag

Question

I have an XML file (thousands of records, simplified here) with a structure like this:

<LIST>
<ITEM_0>
<NAME>Item Name</NAME>
</ITEM_0>
...
<ITEM_9999>
<NAME>Item Name</NAME>
</ITEM_9999>
</LIST>

I need this result:

<LIST>
<ITEM>
<ID>0</ID>
<NAME>Item Name</NAME>
</ITEM>
...
<ITEM>
<ID>9999</ID>
<NAME>Item Name</NAME>
</ITEM>
</LIST>

Using Regex:

Find: \<ITEM_(.*)(>)
Replace: ITEM>\n<ID>\1\</ID>

I get:

<LIST>
<ITEM>
<ID>0</ID>
<NAME>Item Name</NAME>
</ITEM>
<ID>0</ID> <-- This line not wanted
...
<ITEM>
<ID>9999</ID>
<NAME>Item Name</NAME>
</ITEM>
<ID>9999</ID> <-- This line not wanted
</LIST>

It's replacing </ITEM> as well, even though (I think) I'm asking it to replace only <ITEM>. What am I doing wrong, and how do I fix it? I may be missing something about grouping (or greediness?), but I'm not sure what, and I have looked all over for similar questions. There are a million ways to slice and dice this with other tools, but it bugs me to get so close and not quite get there with NPP.

Help appreciated- thanks.

Late Edit: Even if I get the first replace to work on just the <ITEM_#> opening tag, I still need a second search/replace operation for the </ITEM_#> closing tag. The problem is that the current operation changes both the <ITEM and </ITEM tags...

Why not do a regular replace and replace the </ITEM_ with something else and then run your regex replacement? — Blerg, Commented Aug 1, 2016 at 20:02
Yes, thanks, would work but take 2 replaces, whereas x2 search/replace in 1 regex solution below works OK (but with the Q there still outstanding). — Catch21, Commented Aug 2, 2016 at 21:17

grawity_u1686 · Accepted Answer · 2016-07-31 11:35:00Z

Yes, it's likely that the .* is too "greedy" and captures as many characters as it can; you need the opposite – the shortest possible match instead.

One method would be to use [^>]* instead – this would still match as many as possible, but only until the first >, so <ITEM_([^>]*)> would only match the opening tag and nothing more.

Depending on regex syntax, .*? might also work – this explicitly switches the * to "non-greedy".
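The difference between the three quantifiers can be sketched outside Notepad++. The snippet below is an illustration in Python's `re` module, not the Notepad++ engine, and the single-line sample string is an assumption made so that the greedy match has something extra to swallow (by default `.` does not cross line breaks).

```python
import re

# Hypothetical one-line sample: opening tag followed by more markup on the same line.
line = "<ITEM_0><NAME>Item Name</NAME>"

# Greedy: .* runs to the end of the line, then backtracks to the LAST '>'.
greedy = re.search(r"<ITEM_(.*)>", line).group(1)

# Negated class: [^>]* cannot step over a '>', so it stops at the FIRST one.
negated = re.search(r"<ITEM_([^>]*)>", line).group(1)

# Lazy: .*? takes as few characters as possible before the next '>'.
lazy = re.search(r"<ITEM_(.*?)>", line).group(1)

print(repr(greedy))   # '0><NAME>Item Name</NAME'
print(repr(negated))  # '0'
print(repr(lazy))     # '0'
```

When every tag sits on its own line, as in the sample file, all three variants capture the same text, so the choice only matters once several tags share a line.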

Community · Accepted Answer · 2017-05-23 12:41:48Z

Thanks grawity, that helped me broaden my search and find a way to do multiple search-and-replace operations in one regex.

Trying the following works:

Find: </ITEM_.*(>)|<ITEM_(.*)(>)
Replace: (?1</ITEM>)(?2<ITEM>\n<ID>\2</ID>)
Search mode: Regular expression

The | separates the two alternatives being searched for, and (?1...) and (?2...) insert their text only when capture group 1 or capture group 2, respectively, took part in the match.

But I have to look for the closing </ITEM tag first, not the <ITEM tag as you would logically expect. So I have a solution, but can anyone explain why the above works while the following, which looks for the <ITEM tag first, fails when all we do is reverse the order of the alternatives?

Find: <ITEM_(.*)(>)|</ITEM_.*(>)
Replace: (?1<ITEM>\n<ID>\1</ID>)(?2</ITEM>
Search mode: Regular expression

Not essential, but enquiring minds might like to know. Thanks.
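One likely explanation is group numbering: capture groups are numbered by their opening parenthesis, left to right across the whole pattern, regardless of which alternative they sit in. In the reversed pattern the opening-tag branch owns groups 1 and 2, and the closing-tag branch owns group 3, so a test on group 2 fires for opening tags and never for closing tags. The sketch below shows this in Python's `re` module, which numbers groups the same way; Python has no `(?1...)` conditional replacement, so a replacement function stands in for it, and the names `show_groups` and `convert` are made up for the illustration.

```python
import re

# Reversed pattern from the post: opening tag first, closing tag second.
REVERSED = re.compile(r"<ITEM_(.*)(>)|</ITEM_.*(>)")

def show_groups(tag):
    """Return the capture groups produced when the pattern matches one tag."""
    return REVERSED.fullmatch(tag).groups()

def convert(text):
    """Rename both tags in one pass, keyed on which group took part."""
    def repl(match):
        if match.group(3) is not None:  # only the closing-tag branch sets group 3
            return "</ITEM>"
        return f"<ITEM>\n<ID>{match.group(1)}</ID>"
    return REVERSED.sub(repl, text)

print(show_groups("<ITEM_0>"))   # ('0', '>', None)
print(show_groups("</ITEM_0>"))  # (None, None, '>')
print(convert("<ITEM_0>\n<NAME>Item Name</NAME>\n</ITEM_0>"))
```

If Notepad++ numbers groups the same way, the reversed search would need its second conditional to test group 3, that is `(?3</ITEM>)`, instead of group 2. The working version happens to line up because there the closing-tag `(>)` is group 1 and the captured ID is group 2.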

Toto · Accepted Answer · 2020-07-27 13:23:56Z

Ctrl+H
Find what: <ITEM_(\d+)>([\s\S]*)</ITEM_\1>
Replace with: <ITEM>\n<ID>$1</ID>$2</ITEM>
CHECK Match case
CHECK Wrap around
CHECK Regular expression
UNCHECK . matches newline
Replace all

Explanation:

<ITEM_          # literally
(\d+)           # group 1, 1 or more digits, you can use [^>]* if other characters than digits are allowed
>               # literally
([\s\S]*)       # group 2, 0 or more any character, including linebreaks
</ITEM_         # literally
\1              # backreference to group 1
>               # literally

Replacement:

<ITEM>          # literally
\n              # linefeed, use \r\n for Windows EOL
<ID>$1</ID>     # ID tag, with the content of group 1
$2              # content of group 2
</ITEM>         # literally
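For readers who want to try the same idea outside Notepad++, here is a minimal sketch in Python's `re` module. It reuses the pattern above unchanged; the replacement uses Python's `\1` and `\2` in place of `$1` and `$2`, and the function name `renumber_items` is invented for the example.

```python
import re

# Same pattern as above: the \1 backreference forces the closing tag to carry
# the same number as the opening tag.
PATTERN = re.compile(r"<ITEM_(\d+)>([\s\S]*)</ITEM_\1>")

# Python writes group references in the replacement as \1 and \2.
REPLACEMENT = r"<ITEM>\n<ID>\1</ID>\2</ITEM>"

def renumber_items(xml_text):
    """Rename every <ITEM_n>...</ITEM_n> pair and move n into an <ID> element."""
    return PATTERN.sub(REPLACEMENT, xml_text)

sample = (
    "<LIST>\n"
    "<ITEM_0>\n<NAME>Item Name</NAME>\n</ITEM_0>\n"
    "<ITEM_9999>\n<NAME>Item Name</NAME>\n</ITEM_9999>\n"
    "</LIST>"
)
print(renumber_items(sample))
```

The greedy `[\s\S]*` is safe here only because each number appears in exactly one pair of tags; if numbers could repeat, the match would stretch to the last closing tag with that number.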
