I have a file.xml with the following structure:

...some xml text here...
    <Version>1.0.13-alpha</Version>
...some xml text here...

I need to extract the following information:

  • major_and_minor_release_number --> 1.0
  • patch_number --> 13
  • suffix --> -alpha

I thought the cleanest way to achieve that would be a regex with the grep command:

<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>

I've checked this regex on regex101 and it seems to capture the three fields I'm looking for correctly. The problem is that I have no idea how to print those fields.

cat file.xml | grep "<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>" -oP

This command prints the entire match rather than the individual fields, so it doesn't help.

Several posts on this site have been written about this topic, so I've also tried to use the bash native regex support, with poor results:

regex="<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>"
txt=$(cat file.xml)
[[ "$txt" =~ $regex ]]     --> it fails!
echo "${BASH_REMATCH[*]}"

I'm sorry, but I cannot figure out how to overcome this issue. The desired output should be:

1.0
13
-alpha
4
  • @Shawn Yes, you're right. I forgot to mention that I cannot use any tool other than pure bash
    – dcfg
    Commented Nov 12, 2020 at 17:57
  • 2
    stackoverflow.com/questions/8268138/… might be helpful.
    – Shawn
    Commented Nov 12, 2020 at 18:19
  • 3
    Once again a user is asking for regex help and regex tag was removed. I am restoring it again but this is not good for SO.
    – anubhava
    Commented Nov 12, 2020 at 18:21
  • 4
    @LéaGris generally speaking, you're right. But, in my opinion, parsing a single tag in a whole XML file is not a task that bash cannot handle
    – dcfg
    Commented Nov 12, 2020 at 18:39

2 Answers

1

You may use this read + sed solution with a regex similar to yours:

read -r major minor suffix < <(
sed -nE 's~.*<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)</Version>.*~\1 \2 \3~p' file.xml
)

Check variable contents:

declare -p major minor suffix

declare -- major="1.0"
declare -- minor="13"
declare -- suffix="-alpha"
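
Note that the sed pattern above requires a suffix starting with a hyphen. If versions without a suffix can also occur (an assumption, since the question only shows 1.0.13-alpha), a minimal sketch is to make the last group optional with ?. The sample lines and the names with_suffix, without_suffix, extract, major2, minor2 and suffix2 are hypothetical and only used for illustration:

```shell
#!/usr/bin/env bash
# Two sample lines standing in for file.xml content; the second has no suffix.
with_suffix='    <Version>1.0.13-alpha</Version>'
without_suffix='    <Version>2.4.7</Version>'

# Same substitution as above, except the suffix group is followed by "?".
extract='s~.*<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)?</Version>.*~\1 \2 \3~p'

# Line with a suffix: all three variables are filled.
read -r major minor suffix < <(sed -nE "$extract" <<< "$with_suffix")
declare -p major minor suffix

# Line without a suffix: the third variable ends up empty.
read -r major2 minor2 suffix2 < <(sed -nE "$extract" <<< "$without_suffix")
declare -p major2 minor2 suffix2
```

When the suffix is missing, sed still prints the first two fields and read leaves the last variable empty, so the caller can test it with [[ -z $suffix2 ]].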

A few points:

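  • \d is only available in grep's -P (Perl-compatible) mode; in the other modes, and in bash's =~ operator (which uses POSIX extended regular expressions, where \d is not defined), use [0-9] instead
  • grep does not print capture groups; with -o it prints only the whole match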
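
If grep has to be used anyway, one possible workaround is to run it once per field and use \K plus a lookahead so that the whole match is exactly the field you want. This is only a sketch: it assumes GNU grep built with PCRE support (-P), and the sample line and variable names are hypothetical:

```shell
#!/usr/bin/env bash
# Assumes GNU grep with -P support; the sample line stands in for file.xml.
line='    <Version>1.0.13-alpha</Version>'

# \K drops everything matched so far from the reported match;
# (?=...) checks what follows without including it in the match.
major_minor=$(grep -oP '<Version>\K[0-9]+\.[0-9]+(?=\.[0-9]+)' <<< "$line")
patch=$(grep -oP '<Version>[0-9]+\.[0-9]+\.\K[0-9]+' <<< "$line")
suffix=$(grep -oP '<Version>[0-9]+\.[0-9]+\.[0-9]+\K-[^<]*(?=</Version>)' <<< "$line")

printf '%s\n' "$major_minor" "$patch" "$suffix"
```

This scans the input three times, so the sed approach above is usually the better fit when all three fields are needed at once.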
2
  • 1
    It works like a charm. Thank you. I know very little about sed. However, I'd like to know whether it is possible to achieve the same result using just BASH_REMATCH.
    – dcfg
    Commented Nov 12, 2020 at 18:09
  • 1
    Without \d, we can use <Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)</Version> as the regex in bash and get the captured groups in "${BASH_REMATCH[@]}", but bash regex is better suited to matching against a string. To find a match in a file, we need to match line by line in a loop or read the whole file into a string, e.g. data="$(<file.xml)"
    – anubhava
    Commented Nov 12, 2020 at 18:14
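
The pure-bash approach described in the comment above could look roughly like the sketch below. It is not part of the original answer: the temporary file and its contents are hypothetical stand-ins for file.xml, and the suffix group is made optional here, which is an assumption about the data:

```shell
#!/usr/bin/env bash
# Sample input standing in for file.xml (hypothetical content around the tag).
xml_file=$(mktemp)
printf '%s\n' '<Project>' '    <Version>1.0.13-alpha</Version>' '</Project>' > "$xml_file"

# POSIX ERE: [0-9] instead of \d, and the slash needs no escaping.
# Keep the regex in a variable and leave it unquoted on the right of =~.
regex='<Version>([0-9]+\.[0-9]+)\.([0-9]+)(-[^<]*)?</Version>'

data=$(<"$xml_file")    # read the whole file into one string
if [[ $data =~ $regex ]]; then
    major_minor=${BASH_REMATCH[1]}
    patch=${BASH_REMATCH[2]}
    suffix=${BASH_REMATCH[3]}
    printf '%s\n' "$major_minor" "$patch" "$suffix"
else
    echo "no <Version> tag found" >&2
fi
rm -f "$xml_file"
```

BASH_REMATCH[0] holds the whole match and indices 1 to 3 hold the groups, so printing "${BASH_REMATCH[*]}" as in the question would also include the full tag.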
1

Use this Perl one-liner:

perl -lne 'print for m{<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>};' file.xml

Example:

echo '<Version>1.0.13-alpha</Version>' | perl -lne 'print for m{<Version>(\d+\.\d+)\.(\d+)([\w-]+)?<\/Version>};'

Output:

1.0
13
-alpha

The Perl one-liner uses these command line flags:
-e : Tells Perl to look for code in-line, instead of in a file.
-n : Loop over the input one line at a time, assigning it to $_ by default.
-l : Strip the input line separator ("\n" on *NIX by default) before executing the code in-line, and append it when printing.

SEE ALSO:
perldoc perlrun: how to execute the Perl interpreter: command line switches
