3

Is the bash operator =~ equivalent to a perl invocation?

filename="test-33.csv"
regex="([^.]+)(-\d{1,5})(\.csv)"

With bash test:

if [[ "$filename" =~ $regex ]]; then echo "it matches"; else echo "doesn't match"; fi
# doesn't match

if [[ "$filename" =~ ([^.]+)(-\d{1,5})(\.csv) ]]; then echo "matches"; else echo "doesn't match"; fi
# doesn't match

With perl:

result="$(perl -e "if ('$filename' =~ /$regex/) { exit 0;} else { exit 1;} ")"
if [[ result ]]; then echo "it matches"; else echo "doesn't match"; fi
# it matches

Is there anything I am missing for the bash =~ operator? Does this have something to do with the greedy vs non-greedy iterator ([^.]+)?

3 Answers

9

There are several different types of Regular Expression, each one adding more operators (and therefore requiring more characters to be escaped if they are to be considered literals).

The =~ operator is described in the documentation (see man bash on your system or online) like this,

An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered a POSIX extended regular expression and matched accordingly

An Extended Regular Expression (ERE) can be matched with grep -E (formerly egrep). Your example uses \d, which is Perl Compatible Regular Expression (PCRE) syntax. PCRE is largely a superset of ERE, and \d is not defined in ERE, so the expression will not work with =~. However, it can be trivially adapted by replacing \d with [[:digit:]]:

echo abc-123.csv | grep -E '([^.]+)(-\d{1,5})(\.csv)'             # ERE fails
echo abc-123.csv | grep -P '([^.]+)(-\d{1,5})(\.csv)'             # PCRE matches with GNU grep

echo abc-123.csv | grep -E '([^.]+)(-[[:digit:]]{1,5})(\.csv)'    # ERE matches modified expression

Since =~ uses the same ERE syntax as grep -E, we can write this:

if [[ "$filename" =~ ([^.]+)(-[[:digit:]]{1,5})(\.csv) ]]
then
    echo "matches"
else
    echo "doesn't match"
fi

Note that your ERE should probably be prefixed with ^ and suffixed with $, and the [^.]+ adapted to [^-.]+ to ensure that you can't match strings such as abc-def-12345678-123.csv.txt:

^[^-.]+-[[:digit:]]{1,5}\.csv$
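As a quick check of the anchored expression, here is a minimal sketch that runs it through =~ against a good name and the problematic one. The function name check_name and the variable re are made up for illustration. Storing the ERE in a variable and expanding it unquoted is a common way to avoid quoting surprises on the right-hand side of =~.

```shell
#!/usr/bin/env bash
# Illustrative helper; the pattern is the anchored ERE shown above.
re='^[^-.]+-[[:digit:]]{1,5}\.csv$'

check_name() {
    # $re is deliberately unquoted so it is treated as a regex, not a literal
    if [[ $1 =~ $re ]]; then
        echo "matches"
    else
        echo "doesn't match"
    fi
}

check_name "test-33.csv"                     # matches
check_name "abc-def-12345678-123.csv.txt"    # doesn't match
```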

If you're absolutely set on using a PCRE rather than an ERE you will have to use an external tool such as the GNU implementation of grep to perform the match. But this is less efficient, and the same advice about bounding applies here as is given above:

if echo "$filename" | grep -qP '([^.]+)(-\d{1,5})(\.csv)'
then
    echo "matches"
else
    echo "doesn't match"
fi
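For completeness, a sketch of the same external-tool approach with the bounded pattern. It assumes a GNU grep built with PCRE support (-P is not available in every grep). The function name pcre_check is hypothetical, and printf is used instead of echo only so that unusual filenames are passed through unchanged.

```shell
#!/usr/bin/env bash
# Assumes GNU grep with -P (PCRE) support; pcre_check is an illustrative name.
pcre_check() {
    # -q suppresses output, so only grep's exit status drives the if
    if printf '%s\n' "$1" | grep -qP '^[^-.]+-\d{1,5}\.csv$'
    then
        echo "matches"
    else
        echo "doesn't match"
    fi
}

pcre_check "test-33.csv"
pcre_check "abc-def-12345678-123.csv.txt"
```

With a PCRE-capable grep, the first call should report a match and the second should not.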

The POSIX reference for basic REs (RE or BRE) and EREs is at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html, and the reference for Perl REs (PCRE) is at https://www.pcre.org/original/doc/html/pcrepattern.html. Be warned that neither is the easiest of documentation to understand.

Finally, you ask,

Does this have something to do with the greedy vs non-greedy iterator ([^.]+)?

That isn't a greedy/non-greedy distinction. [^.]+ is greedy and means "one or more of anything except a dot (.)". EREs do not have non-greedy operators. In a PCRE, a quantifier such as * or + can be made non-greedy by following it with ?. For example, contrast a* and a*?: the first will match as many a characters as possible and the second as few as possible.

The ( … ) bracket is a grouping, not a greediness indicator.
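To illustrate what the groups are for, here is a small sketch that reads them back through bash's BASH_REMATCH array after a successful =~ match. The function name show_groups and the output labels are invented for the example; the ERE is the adapted one from earlier in this answer.

```shell
#!/usr/bin/env bash
# Illustrative helper: print what each ( ... ) group captured.
show_groups() {
    local re='([^.]+)(-[[:digit:]]{1,5})(\.csv)'
    if [[ $1 =~ $re ]]; then
        # BASH_REMATCH[0] is the whole match; [1]..[3] are the groups in order
        printf 'whole=%s stem=%s number=%s ext=%s\n' \
            "${BASH_REMATCH[0]}" "${BASH_REMATCH[1]}" \
            "${BASH_REMATCH[2]}" "${BASH_REMATCH[3]}"
    else
        echo "no match"
    fi
}

show_groups "test-33.csv"
# prints: whole=test-33.csv stem=test number=-33 ext=.csv
```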

2
  • "each one adding more operators (and therefore requiring more characters to be escaped" -- except that BRE has some characters that are special exactly when "escaped" by backslashes, mainly \{/\} and \(/\) (but not \+ or \|).
    – ilkkachu
    Commented Jan 23 at 18:13
  • 2
    @ilkkachu yes I know, but I'm trying to keep a complex subject as simple as I dare
    Commented Jan 23 at 18:14
4

The =~ operator in Bash uses the same regex syntax as grep -E (POSIX ERE). Perl regexes are not recognized by it. You need to do something like:

~$ [ $(echo "$filename" | grep -Po "$regex") ] && echo "it matches" || echo "does not match"
it matches

to get an equivalent result.

The grep options used are:

-o, --only-matching       show only the part of a line matching PATTERN
-P, --perl-regexp         PATTERN is a Perl regular expression

In the form used in your question, this looks like:

if [[ $(echo "$filename" | grep -Po "$regex") ]]; then echo "it matches"; else echo "does not match"; fi

This works too:

if [ $(echo "$filename" | grep -Po "$regex") ]; then echo "it matches"; else echo "does not match"; fi

You can also store the output in a variable first:

yyy@xxx:~$ filename="test-33.csv"
yyy@xxx:~$ regex="([^.]+)(-\d{1,5})(\.csv)"
yyy@xxx:~$ result=$(echo "$filename" | grep -Po "$regex")
yyy@xxx:~$ if [[ $result ]]; then echo "it matches"; else echo "does not match"; fi
it matches
yyy@xxx:~$
5
  • 3
    Instead of using [ to test for non-empty output from grep (leaving the substitution unquoted is bad), use grep -q: echo "$filename" | grep -qP "$regex" && echo match || echo no match
    Commented Jan 22 at 22:22
  • 1
    Would select this as the best answer. Could you please align your answer with the question... the form is if [[ expression ]]; then ...; else ...; fi
    – rellampec
    Commented Jan 23 at 10:08
  • Thank you ! It's done
    – hidigoudi
    Commented Jan 23 at 10:38
  • Thanks @hidigoudi. The other answer (edited) offers a more complete explanation, though I like that yours went straight to the target result. Here is a reference on bash if statements that explains the PCRE solution in the other answer. By the way, the -q (quiet) option lets grep's exit status drive the if condition.
    – rellampec
    Commented Jan 23 at 11:44
  • No problem, I understand, thank you for the vote up !
    – hidigoudi
    Commented Jan 23 at 12:47
0

Bash has extended glob patterns which get closer to regular expressions. Within [[...]] the == operator does glob-style pattern matching.

filename=test-33.csv
# one or more non-dots, a hyphen, a digit, optionally 4 more digits, the extension
pattern='+([^.])-[0-9]?([0-9])?([0-9])?([0-9])?([0-9]).csv'
[[ $filename == $pattern ]] && echo Y || echo N

If you're using the regex to filter a list of filenames, use the glob pattern in a for loop instead.

shopt -s extglob
for file in $pattern; do
    # do something with the file.
    echo "$file"
done

Notes

  • the shopt command is needed for the for loop: extended globbing is automatically enabled within [[...]] but not elsewhere.
  • $pattern is specifically unquoted in these code snippets so that it gets handled as a pattern not a literal string.
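As a rough illustration of how the pattern behaves, the sketch below runs it against a few names. The function name glob_check is made up for the example. Unlike a regex, a glob pattern always has to match the whole string, so no anchors are needed.

```shell
#!/usr/bin/env bash
shopt -s extglob   # not strictly needed for [[ ]], but harmless

pattern='+([^.])-[0-9]?([0-9])?([0-9])?([0-9])?([0-9]).csv'

glob_check() {
    # $pattern stays unquoted so it is treated as a pattern, not a literal
    if [[ $1 == $pattern ]]; then echo Y; else echo N; fi
}

glob_check "test-33.csv"       # Y
glob_check "test-123456.csv"   # N: six digits, at most five allowed
glob_check "test-33.csv.txt"   # N: trailing .txt is not covered by the pattern
```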
