Is the bash operator =~ equivalent to a perl invocation?


With bash test:

if [[ "$filename" =~ $regex ]]; then echo "it matches"; else echo "doesn't match"; fi
# doesn't match

if [[ "$filename" =~ ([^.]+)(-\d{1,5})(\.csv) ]]; then echo "matches"; else echo "doesn't match"; fi
# doesn't match

With perl:

result="$(perl -e "if ('$filename' =~ /$regex/) { exit 0;} else { exit 1;} ")"
if [[ result ]]; then echo "it matches"; else echo "doesn't match"; fi
# it matches

Is there anything I am missing for the bash =~ operator? Does this have something to do with the greedy vs non-greedy iterator ([^.]+)?


There are several different types of Regular Expression, each one adding more operators (and therefore requiring more characters to be escaped if they are to be considered literals).

The =~ operator is described in the documentation (see man bash on your system or online) like this:

An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered a POSIX extended regular expression and matched accordingly

An Extended Regular Expression (ERE) can be matched with grep -E (formerly egrep). Your example uses Perl Compatible Regular Expression (PCRE) syntax, which is largely a superset of ERE syntax. In particular, \d is a PCRE shorthand that has no special meaning in an ERE, so the expression will not work with =~. However, it can be trivially adapted by replacing \d with [[:digit:]]:

echo abc-123.csv | grep -E '([^.]+)(-\d{1,5})(\.csv)'             # ERE fails
echo abc-123.csv | grep -P '([^.]+)(-\d{1,5})(\.csv)'             # PCRE matches with GNU grep

echo abc-123.csv | grep -E '([^.]+)(-[[:digit:]]{1,5})(\.csv)'    # ERE matches modified expression

Since =~ and grep -E both match EREs, we can write this:

if [[ "$filename" =~ ([^.]+)(-[[:digit:]]{1,5})(\.csv) ]]
then
    echo "matches"
else
    echo "doesn't match"
fi
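Because the expression contains three parenthesised groups, it may help to see what bash does with them. The following is a minimal sketch, assuming a sample filename chosen only for illustration: after a successful =~ match, bash stores the whole match in BASH_REMATCH[0] and each group in the elements that follow. The variable names re, whole, stem, number and ext are arbitrary.

```bash
#!/usr/bin/env bash
# Sample value for illustration only.
filename='abc-123.csv'

# Keep the ERE in a variable and expand it unquoted on the right of =~;
# quoting the expansion would make bash treat it as a literal string.
re='([^.]+)(-[[:digit:]]{1,5})(\.csv)'

if [[ $filename =~ $re ]]
then
    whole=${BASH_REMATCH[0]}    # entire matched text
    stem=${BASH_REMATCH[1]}     # ([^.]+)
    number=${BASH_REMATCH[2]}   # (-[[:digit:]]{1,5})
    ext=${BASH_REMATCH[3]}      # (\.csv)
    echo "whole=$whole stem=$stem number=$number ext=$ext"
else
    echo "doesn't match"
fi
```

Storing the expression in a variable also sidesteps the question of how the shell parses backslashes and brackets written directly inside [[ ... ]].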

Note that your ERE should probably be prefixed with ^ and suffixed with $, and the [^.]+ adapted to [^-.]+, to ensure that you can't match strings such as abc-def-12345678-123.csv.txt:

^([^-.]+)(-[[:digit:]]{1,5})(\.csv)$
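To see the effect of the anchors and the narrower bracket expression, here is a minimal sketch that runs both forms against two sample names. The variable names loose and strict and the helper function check are made up for this illustration.

```bash
#!/usr/bin/env bash
loose='([^.]+)(-[[:digit:]]{1,5})(\.csv)'
strict='^([^-.]+)(-[[:digit:]]{1,5})(\.csv)$'

# Print "match" or "no match" for a string ($1) tested against an ERE ($2).
check() {
    if [[ $1 =~ $2 ]]; then echo "match"; else echo "no match"; fi
}

for name in abc-123.csv abc-def-12345678-123.csv.txt
do
    printf '%-30s loose: %-9s strict: %s\n' \
        "$name" "$(check "$name" "$loose")" "$(check "$name" "$strict")"
done
```

The unanchored form accepts the second name because a substring of it (abc-def-12345678-123.csv) satisfies the expression; the anchored form has to account for the whole string and so rejects it.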


If you're absolutely set on using a PCRE rather than an ERE you will have to use an external tool such as the GNU implementation of grep to perform the match. But this is less efficient, and the same advice about bounding applies here as is given above:

if echo "$filename" | grep -qP '([^.]+)(-\d{1,5})(\.csv)'
then
    echo "matches"
else
    echo "doesn't match"
fi

The POSIX reference for basic REs (RE or BRE) and EREs is at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html, and the reference for Perl REs (PCRE) is at https://www.pcre.org/original/doc/html/pcrepattern.html. Be warned that neither is the easiest of documentation to understand.

Finally, you ask,

Does this have something to do with the greedy vs non-greedy iterator ([^.]+)?

That isn't a greedy/non-greedy iterator. [^.]+ is greedy and means "one or more of anything except a dot (.)". EREs do not have non-greedy operators. In a PCRE, a quantifier such as * or + can be made non-greedy by following it with ?. For example, contrast a* and a*?: the first will match as many a characters as possible and the second will match as few as possible.

The ( … ) bracket is a grouping, not a greediness indicator.
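The contrast between a* and a*? can be checked directly. This is a rough sketch under a couple of assumptions: the sample string aaab is invented, and the PCRE half is run through perl only if perl happens to be installed, since bash itself has no non-greedy quantifier.

```bash
#!/usr/bin/env bash
text='aaab'

# ERE via =~ : a* is greedy, so it takes every leading "a".
re='^a*'
[[ $text =~ $re ]]
greedy=${BASH_REMATCH[0]}
echo "ERE  ^a*  matched: '$greedy'"

# PCRE via perl (skipped when perl is absent): a*? takes as few
# characters as it can, which here is none at all.
if command -v perl >/dev/null 2>&1
then
    lazy=$(perl -e '$ARGV[0] =~ /^(a*?)/ and print $1' "$text")
    echo "PCRE ^a*? matched: '$lazy'"
fi
```

The greedy form captures aaa, while the non-greedy form is satisfied with the empty string because nothing after it forces it to consume more.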

In bash, the =~ operator behaves like the GNU command grep -E: it matches POSIX extended regular expressions, and Perl regular expressions are not recognized. You need to do something like:

~$ [ -n "$(echo "$filename" | grep -Po "$regex")" ] && echo "it matches" || echo "does not match"
it matches

to have an equivalent.

About the grep options used:

-o, --only-matching       show only the part of a line matching PATTERN
-P, --perl-regexp         PATTERN is a Perl regular expression

With your original form this looks like:

if [[ $(echo "$filename" | grep -Po "$regex") ]]; then echo "it matches"; else echo "does not match"; fi

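This works too, as long as the command substitution is quoted so that a match containing whitespace cannot break the test:

if [ -n "$(echo "$filename" | grep -Po "$regex")" ]; then echo "it matches"; else echo "does not match"; fi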

You can also do:

yyy@xxx:~$ filename="test-33.csv"
yyy@xxx:~$ regex="([^.]+)(-\d{1,5})(\.csv)"
yyy@xxx:~$ result=$(echo "$filename" | grep -Po "$regex")
yyy@xxx:~$ if [[ $result ]]; then echo "it matches"; else echo "does not match"; fi
it matches

Bash has extended glob patterns, which get closer to regular expressions. Within [[...]], the == operator does glob-style pattern matching.

# one or more non-dots, a hyphen, a digit, optionally 4 more digits, the extension
[[ $filename == $pattern ]] && echo Y || echo N
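The assignment of pattern is not shown in the snippet above. As a sketch only, here is one extended glob that fits the comment's description; the exact pattern is an assumption reconstructed from that comment, and the helper function matches and the sample names are invented for illustration.

```bash
#!/usr/bin/env bash
# Assumed reconstruction of the pattern described by the comment:
#   +([^.])    one or more non-dots
#   -          a hyphen
#   [0-9]      a digit
#   ?([0-9])   an optional digit, repeated four times
#   .csv       the extension (a dot is literal in a glob)
pattern='+([^.])-[0-9]?([0-9])?([0-9])?([0-9])?([0-9]).csv'

# Succeeds when the whole of $1 matches the glob.
matches() {
    [[ $1 == $pattern ]]
}

for filename in abc-123.csv abc-123456.csv abc.csv
do
    if matches "$filename"; then echo "$filename: Y"; else echo "$filename: N"; fi
done
```

Unlike an unanchored regular expression, a glob has to match the entire string, so no explicit ^ or $ is needed here.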

If you're using the regex to filter a list of filenames, use the glob pattern in a for loop instead.

shopt -s extglob
for file in $pattern; do
    # do something with the file.
    echo "$file"
done


  • the shopt -s extglob command is needed for the for loop: extended globbing is automatically enabled within [[...]] but not elsewhere.
  • $pattern is specifically unquoted in these code snippets so that it gets handled as a pattern not a literal string.
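Both notes can be tried out in a scratch directory. This is a minimal sketch under stated assumptions: the pattern is a guess that fits the earlier description, the file names are invented, and nullglob is an extra option added here so that the loop runs zero times when nothing matches.

```bash
#!/usr/bin/env bash
# extglob is needed for the loop; nullglob (an addition in this sketch)
# makes an unmatched glob expand to nothing instead of to itself.
shopt -s extglob nullglob

# Assumed pattern, for illustration only.
pattern='+([^.])-[0-9]?([0-9])?([0-9])?([0-9])?([0-9]).csv'

# Work in a temporary directory with a few made-up file names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch abc-123.csv notes.txt abc-1234567.csv

found=()
for file in $pattern; do      # unquoted: expanded as a glob
    found+=("$file")
done
echo "glob matched: ${found[*]}"

# Quoted, the same text is compared as a literal string.
if [[ abc-123.csv == "$pattern" ]]; then quoted=Y; else quoted=N; fi
if [[ abc-123.csv == $pattern ]]; then unquoted=Y; else unquoted=N; fi
echo "quoted: $quoted, unquoted: $unquoted"

cd / && rm -r "$workdir"
```

Only abc-123.csv is picked up by the loop, and the quoted comparison fails because the file name is not literally equal to the pattern text.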

