I have an XML file with the following structure:

<id>1</id>
<name>alligator and stingray</name>
...
<id>99999</id>
<name>dolphin with carp</name>

I need this result:

<id>1</id>
<name>Alligator And Stingray</name>
...
<id>99999</id>
<name>Dolphin With Carp</name>

I used this regex:

 Search: (<name>)(.*)(</name>)
 Replace: \1\u\2\3

The results I received:

<id>1</id>
<name>Alligator and stingray</name>
...
<id>99999</id>
<name>dolphin with carp</name>

It capitalizes only the first word of the first ID's name; the remaining words, and every word in the names of the other IDs, stay lowercase!

Am I doing something wrong?

Help appreciated - Thank you.

  • Depending on the structure of the file, this might not be possible. Freddy's answer is not context-aware, so it would match every word regardless of what tag it is in. You should consider using a parser.
    – chaosflaws
    Commented May 15, 2019 at 11:37

3 Answers

3
  • Ctrl+H
  • Find what: (?:<name>|\G)\K\b(\w)(\w+\s*)
  • Replace with: \u$1$2
  • check Match case
  • check Wrap around
  • check Regular expression
  • Replace all

Explanation:

(?:<name>|\G)   # non capture group, "<name>" or restart from previous match position
\K              # forget all we have seen until this position
\b              # word boundary
(\w)            # group 1, 1 word character
(\w+\s*)        # group 2, 1 or more word characters followed by optional space

Replacement:

\u$1        # uppercase content of group 1 (i.e. the first letter)
$2          # content of group 2 (i.e. the rest of the word)

Result for given example:

<id>1</id>
<name>Alligator And Stingray</name>
...
<id>99999</id>
<name>Dolphin With Carp</name>
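
For readers who would rather script this than run it in the Replace dialog, here is a minimal Python sketch of the same idea: restrict the change to the text between <name> and </name>, then uppercase the first character of each word. It is not the Notepad++ regex above. The function name capitalize_names and the sample string are assumptions made for illustration, and it treats a "word" as anything that starts after whitespace or at the start of the name. Python's \w and str.upper() are Unicode-aware, so letters such as ă or đ should be handled as well.

```python
import re

# Text between <name> and </name>; non-greedy so each tag pair is matched separately.
NAME_RE = re.compile(r"(<name>)(.*?)(</name>)", re.DOTALL)
# A word character at the start of the name or right after whitespace.
WORD_START_RE = re.compile(r"(?<!\S)\w")


def capitalize_names(text: str) -> str:
    """Uppercase the first letter of every word inside <name>...</name> only."""

    def fix_name(match: re.Match) -> str:
        inner = WORD_START_RE.sub(lambda w: w.group(0).upper(), match.group(2))
        return match.group(1) + inner + match.group(3)

    return NAME_RE.sub(fix_name, text)


# Hypothetical sample mirroring the structure in the question.
sample = (
    "<id>1</id>\n"
    "<name>alligator and stingray</name>\n"
    "<id>99999</id>\n"
    "<name>dolphin with carp</name>"
)
print(capitalize_names(sample))
```

Text in other tags is left untouched because the inner substitution only ever sees the captured content of a <name> element.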


  • Your answer, like @Freddy's above, also fails to capitalize Unicode letters! But I appreciate it because it fixes the overreach of @Freddy's answer: it capitalizes text inside <name> only. Commented May 18, 2019 at 3:18
1

Try this:

   Find what: ([>\s])([a-z])
Replace with: \1\u\2

This changes a lower case character to upper case if the previous character is a space character or >.

  • It didn't capitalize Unicode letters (e.g. ă, â, đ, ê, ...) and it also capitalized words in other tags, not only in <name> (I want only the <name> tag capitalized; the other tags should stay unchanged). Commented May 16, 2019 at 15:44
0

Edit: Use @Toto's answer. This one does not use non-regular language features and can, therefore, never answer the question completely, although it does solve the finite length cases (now).


What you are trying to do is only possible if there is a maximum number of words in a <name>...</name> block.

The problem with your current regex is that group \2 captures the whole text inside the tags (alligator and stingray in your example), while \u uppercases only the single character immediately following it. So only the first letter of the captured text is changed.

If there is a maximum number of words in a node, you may use a regex similar to this one:

Find what: <name>(\w)(\w* ?)(\w?)(\w*? ?)(\w?)(\w*? ?)</name>
Replace with: <name>\U\1\E\2\U\3\E\4\U\5\E\6</name>

If you don't know how many words are in one node, you should use an XML Parser instead.
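
As a rough illustration of the parser route, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. It assumes the file is well-formed XML with a single root element; the <items> and <item> wrappers in the sample are hypothetical, since the question does not show the surrounding structure. The helper names capitalize_words and capitalize_names are also made up for this example.

```python
import xml.etree.ElementTree as ET


def capitalize_words(text: str) -> str:
    """Uppercase the first character of each space-separated word."""
    # str.upper() is Unicode-aware, so accented letters are handled too.
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def capitalize_names(xml_text: str) -> str:
    """Rewrite the text of every <name> element, leaving other tags alone."""
    root = ET.fromstring(xml_text)
    for name in root.iter("name"):
        if name.text:
            name.text = capitalize_words(name.text)
    return ET.tostring(root, encoding="unicode")


# Hypothetical document: the <items>/<item> wrappers are assumed, not from the question.
sample = (
    "<items>"
    "<item><id>1</id><name>alligator and stingray</name></item>"
    "<item><id>99999</id><name>dolphin with carp</name></item>"
    "</items>"
)
print(capitalize_names(sample))
```

Because the parser works on elements rather than raw text, the number of words per name does not matter, and only <name> elements are modified.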

  • Yes, I want to capitalize the first letter of every word in the <name>...</name> block, and the names are not the same length across ids: some ids have long names and others short ones. Commented May 16, 2019 at 15:49
