bash - print regex captured groups

Question

I have a file.xml structured like this:

...some xml text here...
    <Version>1.0.13-alpha</Version>
...some xml text here...

I need to extract the following information:

major_and_minor_release_number --> 1.0
patch_number --> 13
suffix --> -alpha

I thought the cleanest way to achieve that would be a regex with the grep command:

<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>

I've checked this regex on regex101 and it seems to correctly capture the 3 fields I'm looking for. The problem is that I have no idea how to print those fields.

cat file.xml | grep "<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>" -oP

This command prints the whole match (the entire <Version>...</Version> element) instead of the individual groups, so it's not useful.

Several posts on this site cover this topic, so I also tried bash's native regex support, with poor results:

regex="<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>"
txt=$(cat file.xml)
[[ "$txt" =~ $regex ]]     # --> it fails!
echo "${BASH_REMATCH[*]}"

Sorry, but I cannot figure out how to overcome this issue. The desired output is:

1.0
13
-alpha

@Shawn Yes, you're right. I forgot to mention that I cannot use any tool other than pure bash – dcfg, Commented Nov 12, 2020 at 17:57
Once again a user is asking for regex help and the regex tag was removed. I am restoring it, but this is not good for SO. – anubhava, Commented Nov 12, 2020 at 18:21
@LéaGris Generally speaking, you're right. But in my opinion, parsing a single tag in a whole XML file is a task that bash can handle. – dcfg, Commented Nov 12, 2020 at 18:39

anubhava · Accepted Answer · 2020-11-12 18:00:32Z

You may use this read + sed solution with a regex similar to yours:

read -r major minor suffix < <(
sed -nE 's~.*<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)</Version>.*~\1 \2 \3~p' file.xml
)

Check variable contents:

declare -p major minor suffix

declare -- major="1.0"
declare -- minor="13"
declare -- suffix="-alpha"
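Note that the sed expression above requires a suffix: for a tag such as <Version>1.0.13</Version> sed prints nothing, so read gets no input. A minimal sketch of one way to make the suffix optional, by adding `?` after the third group; the input line is illustrative and stands in for file.xml:

```shell
line='    <Version>1.0.13</Version>'   # illustrative input with no suffix

# (-[^<]*)? makes the suffix optional; sed replaces an unmatched \3 with an
# empty string, and read's word splitting then leaves suffix empty.
read -r major minor suffix < <(
    sed -nE 's~.*<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)?</Version>.*~\1 \2 \3~p' <<< "$line"
)
declare -p major minor suffix
```

With a real file, replace `<<< "$line"` with `file.xml`, as in the answer.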

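A few points:

You cannot use \d in grep without -P (Perl) mode, and bash's =~ (POSIX extended regular expressions) doesn't understand \d either
grep does not print individual capture groups; with -o it prints only the whole match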
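As a side note to the second point: in GNU grep's -P mode you can still print one field at a time, because PCRE's `\K` drops everything matched so far from the reported match, and a lookahead `(?=...)` checks trailing context without printing it. A minimal sketch, assuming GNU grep built with PCRE support; it needs one grep call per field, so it does not satisfy the pure-bash constraint. The sample text is illustrative:

```shell
# Illustrative sample standing in for file.xml.
sample=$'<root>\n    <Version>1.0.13-alpha</Version>\n</root>'

# \K discards the text matched before it (here "<Version>...") from the output;
# (?=...) requires what follows without including it in the output.
major_minor=$(grep -oP '<Version>\K\d+\.\d+(?=\.\d+)' <<< "$sample")
patch=$(grep -oP '<Version>\d+\.\d+\.\K\d+' <<< "$sample")
suffix=$(grep -oP '<Version>\d+\.\d+\.\d+\K-[^<]*(?=</Version>)' <<< "$sample")

printf '%s\n' "$major_minor" "$patch" "$suffix"
```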

It works like a charm. Thank you. I know very little about sed, though. I'd like to know whether it is possible to achieve the same result using just BASH_REMATCH.
– dcfg
Commented Nov 12, 2020 at 18:09

Without \d, we can use <Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)</Version> as a regex in bash and get the captured groups in "${BASH_REMATCH[@]}". However, bash regex is better suited to matching against a single string: to find a match in a file, we need either to match line by line in a loop, or to first read the whole file into a string, e.g. data="$(<file.xml)".
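A minimal pure-bash sketch of both approaches described in the comment above. Storing the regex in a variable, as the question already does, sidesteps quoting issues inside [[ ]]. The function name parse_version and the sample text are hypothetical, for illustration only:

```shell
# Regex from the comment, in POSIX ERE syntax (no \d or \w).
re='<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)</Version>'

# Approach 1: match once against the whole input held in a string.
# With a real file this would be: data=$(<file.xml)
data=$'<root>\n    <Version>1.0.13-alpha</Version>\n</root>'
if [[ $data =~ $re ]]; then
    # BASH_REMATCH[0] is the whole match; the groups start at index 1.
    printf '%s\n' "${BASH_REMATCH[1]}" "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
fi

# Approach 2 (hypothetical helper): scan stdin line by line, stop at the first match.
# Usage with a real file: parse_version < file.xml
parse_version() {
    local line
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            printf '%s\n' "${BASH_REMATCH[@]:1}"   # groups 1..3, skipping the whole match
            return 0
        fi
    done
    return 1   # no <Version> tag found
}
parse_version <<< "$data"
```

Both approaches print 1.0, 13 and -alpha on separate lines; the loop avoids holding the whole file in memory and stops at the first <Version> tag.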
– anubhava
Commented Nov 12, 2020 at 18:14

Timur Shtatland · Accepted Answer · 2020-11-12 19:09:22Z

Use this Perl one-liner:

perl -lne 'print for m{<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>};' file.xml

Example:

echo '<Version>1.0.13-alpha</Version>' | perl -lne 'print for m{<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>};'

Output:

1.0
13
-alpha

The Perl one-liner uses these command-line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches

Tags: regex, linux, bash, grep, pattern-matching