
I have a shell variable containing a string like the following: "PA-232 message1 GX-1234 message2 PER-10 message3"

I need to apply a regular expression that finds PA-232, GX-1234 and PER-10 and returns each of these occurrences.

How can I do this in bash?

I've tried this:

echo "PA-232 message1 GX-1234 message2 PER-10 message3" | sed -r 's/^.*([A-Z]+-[0-9]+).*$/\1/'

But it returns

R-10

instead of

PA-232
GX-1234
PER-10
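The sed attempt fails because ^.* is greedy. It swallows as much of the line as it can while still letting the rest of the pattern match, so [A-Z]+ is left with only the R of PER-10. And because s/^.*...*$/\1/ replaces the whole line with a single capture, it can never return more than one match anyway. A small check that prints what the leading .* consumed (an extra capture group added purely for illustration; -E is used here, which GNU sed accepts as a synonym for -r):

```shell
#!/usr/bin/env bash
line="PA-232 message1 GX-1234 message2 PER-10 message3"

# Group 1 holds what the greedy prefix swallowed, group 2 the "ID" that is left over.
greedy=$(echo "$line" | sed -E 's/^(.*)([A-Z]+-[0-9]+).*$/prefix=[\1] id=[\2]/')
echo "$greedy"
# prefix=[PA-232 message1 GX-1234 message2 PE] id=[R-10]
```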


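With grep, -E enables extended regular expressions and -o prints only the matching parts, each on its own line: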
$ echo "PA-232 message1 GX-1234 message2 PER-10 message3" | grep -Eo '[A-Z]+-[0-9]+'
PA-232
GX-1234
PER-10
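Since the string lives in a shell variable, the same grep can read it through a here-string and load the matches into a Bash array. A sketch assuming Bash 4 or later (for mapfile); str and ids are illustrative names:

```shell
#!/usr/bin/env bash
str="PA-232 message1 GX-1234 message2 PER-10 message3"

# grep -o prints one match per line; mapfile -t reads those lines into an array,
# stripping the trailing newline from each element.
mapfile -t ids < <(grep -Eo '[A-Z]+-[0-9]+' <<< "$str")

printf 'found %d IDs\n' "${#ids[@]}"
for id in "${ids[@]}"; do
    echo "$id"
done
```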

With Perl:

$ echo "PA-232 message1 GX-1234 message2 PER-10 message3" | perl -lnE 'while(/([A-Z]+-\d+)/g){ say $1 }'

Or awk:

$ echo "PA-232 message1 GX-1234 message2 PER-10 message3" | awk '{for(i=1;i<=NF;i++) if($i~"^[A-Z]+-[[:digit:]]+$") print $i}'
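The awk version only reports a whitespace-separated field that is entirely an ID, so something like (PA-232) or PA-232, would be skipped. If IDs can be embedded in other text, a sketch using POSIX awk's match(), which sets RSTART and RLENGTH, can scan each line for substrings instead:

```shell
#!/usr/bin/env bash
str="PA-232 message1 GX-1234 message2 PER-10 message3"

ids=$(awk '{
    s = $0
    # match() returns the start of the first match (0 if none) and sets RSTART/RLENGTH
    while (match(s, /[A-Z]+-[0-9]+/)) {
        print substr(s, RSTART, RLENGTH)
        s = substr(s, RSTART + RLENGTH)   # continue scanning after this match
    }
}' <<< "$str")
echo "$ids"
```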

Or a pure Bash regex + loop:

str="PA-232 message1 GX-1234 message2 PER-10 message3"
re='[A-Z]+-[[:digit:]]+'                # keep the regex in a variable to avoid quoting pitfalls in [[ ]]

rest=$str                               # work on a copy, since the loop consumes it
while [[ $rest =~ $re ]]; do
    echo "${BASH_REMATCH[0]}"           # the matched field
    rest=${rest#*"${BASH_REMATCH[0]}"}  # drop everything up to and including this match
done
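If, as in the awk answer, only whole whitespace-separated words should count, Bash can also split the string into an array and test each word against an anchored regex. A sketch; words, re and ids are illustrative names:

```shell
#!/usr/bin/env bash
str="PA-232 message1 GX-1234 message2 PER-10 message3"
re='^[A-Z]+-[[:digit:]]+$'      # anchored: the whole word must be an ID

# read -ra splits on $IFS (whitespace) into an array, without glob expansion
read -ra words <<< "$str"

ids=()
for w in "${words[@]}"; do
    if [[ $w =~ $re ]]; then
        ids+=("$w")
    fi
done
printf '%s\n' "${ids[@]}"
```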

Any of those prints:

PA-232
GX-1234
PER-10
