There are several different types of Regular Expression, each one adding more operators (and therefore requiring more characters to be escaped if they are to be considered literals).
The =~ operator is described in the documentation (see man bash on your system or online) like this:
"An additional binary operator, =~, is available, with the same precedence as == and !=. When it is used, the string to the right of the operator is considered a POSIX extended regular expression and matched accordingly."
An Extended Regular Expression (ERE) can be matched with grep -E (formerly egrep). Your example is a Perl Compatible Regular Expression (PCRE): it uses \d, which belongs to the PCRE syntax (broadly a superset of ERE) and is not defined for EREs, so it will not work with =~. However, it can be trivially adapted by replacing \d with [[:digit:]]:
echo abc-123.csv | grep -E '([^.]+)(-\d{1,5})(\.csv)' # ERE fails
echo abc-123.csv | grep -P '([^.]+)(-\d{1,5})(\.csv)' # PCRE matches with GNU grep
echo abc-123.csv | grep -E '([^.]+)(-[[:digit:]]{1,5})(\.csv)' # ERE matches modified expression
So, given that grep -E and =~ both take an ERE, we can write this:
if [[ "$filename" =~ ([^.]+)(-[[:digit:]]{1,5})(\.csv) ]]
then
    echo "matches"
else
    echo "doesn't match"
fi
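Since the expression contains parenthesised groups, bash also records what each group matched in the BASH_REMATCH array after a successful =~ test. The following is a minimal sketch rather than a definitive implementation; the sample value of filename and the variable names re, whole, stem, number and ext are assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sample value, for illustration only
filename='abc-123.csv'

# Holding the ERE in a variable sidesteps shell quoting surprises;
# the variable must stay unquoted on the right-hand side of =~
re='([^.]+)(-[[:digit:]]{1,5})(\.csv)'

if [[ "$filename" =~ $re ]]
then
    whole=${BASH_REMATCH[0]}     # the entire matched text
    stem=${BASH_REMATCH[1]}      # first group:  ([^.]+)
    number=${BASH_REMATCH[2]}    # second group: (-[[:digit:]]{1,5})
    ext=${BASH_REMATCH[3]}       # third group:  (\.csv)
    printf '%s | %s | %s | %s\n' "$whole" "$stem" "$number" "$ext"
else
    echo "doesn't match"
fi
```

With the sample value above this prints abc-123.csv | abc | -123 | .csv.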
Note that your ERE should probably be prefixed with ^ and suffixed with $, and the [^.]+ adapted to [^-.]+, to ensure that you can't match strings such as abc-def-12345678-123.csv.txt:
^[^-.]+-[[:digit:]]{1,5}\.csv$
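To see the effect of the anchors and the tighter bracket expression, here is a small sketch that runs the anchored ERE against a few names. The function name matches_csv_name and the test names other than the one mentioned above are hypothetical, chosen only for this illustration.

```shell
#!/usr/bin/env bash
# matches_csv_name is a hypothetical helper used only in this sketch.
# It succeeds when its argument matches the anchored ERE in full.
matches_csv_name() {
    local re='^[^-.]+-[[:digit:]]{1,5}\.csv$'
    [[ "$1" =~ $re ]]
}

for name in abc-123.csv abc-def-12345678-123.csv.txt abc-123456.csv
do
    if matches_csv_name "$name"
    then
        echo "$name: matches"
    else
        echo "$name: doesn't match"
    fi
done
```

Only abc-123.csv should be reported as matching: the second name has extra hyphenated parts and a trailing .txt, and the third has six digits where at most five are allowed.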
If you're absolutely set on using a PCRE rather than an ERE you will have to use an external tool, such as the GNU implementation of grep, to perform the match. This is less efficient because it starts an extra process for every test, and the same advice about anchoring and tightening the expression applies here as is given above:
# printf avoids echo's special handling of values such as -n or backslashes
if printf '%s\n' "$filename" | grep -qP '([^.]+)(-\d{1,5})(\.csv)'
then
    echo "matches"
else
    echo "doesn't match"
fi
The POSIX reference for basic REs (RE or BRE) and EREs is at https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_chap09.html, and the reference for Perl REs (PCRE) is at https://www.pcre.org/original/doc/html/pcrepattern.html. Be warned that neither is the easiest of documentation to understand.
Finally, you ask, "Does this have something to do with the greedy vs non-greedy iterator ([^.]+)?"
That isn't a matter of greedy versus non-greedy matching. [^.]+ is greedy and means "one or more of anything except a dot (.)". EREs do not have non-greedy operators. In a PCRE, a quantifier such as * or + can be made non-greedy by following it with ?. For example, contrast a* and a*?; the first will match as many a characters as possible and the second will match as few as possible.
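The difference can be observed with a short sketch. It uses + rather than * so that empty matches don't hide the output, and the PCRE half is only attempted if the local grep accepts -P, which is an assumption (GNU grep built with PCRE support). The variable names sample, re and greedy are illustrative.

```shell
#!/usr/bin/env bash
sample='aaa'

# ERE in bash: + is always greedy, so the whole run of a's is matched
re='a+'
if [[ "$sample" =~ $re ]]
then
    greedy=${BASH_REMATCH[0]}
    echo "ERE a+ matched: $greedy"
fi

# PCRE comparison, skipped when this grep does not support -P
if printf 'a\n' | grep -qP 'a' 2>/dev/null
then
    printf '%s\n' "$sample" | grep -oP 'a+'     # greedy: a single match, aaa
    printf '%s\n' "$sample" | grep -oP 'a+?'    # non-greedy: three separate matches of one a
fi
```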
The ( … ) brackets are a grouping construct, not a greediness indicator.