10

I want to determine whether a multi-line string ends with a line containing a specified pattern.

This code fails: it doesn't match.

s=`echo hello && echo world && echo OK`
[[ "$s" =~ 'OK$' ]] && echo match

3 Answers

16

In bash 3.2 or above, unless compatibility with 3.1 is enabled (with the compat31 option or BASH_COMPAT=3.1), quoting regular expression operators (not only with \ but with any of the bash quoting operators: '...', "...", $'...', $"...") removes their special meaning.

[[ $var =~ 'OK$' ]]

matches only on strings that contain OK$ literally (that $ matches a literal $).

[[ $var =~ OK$ ]]

matches on strings that end in OK (that $ is the RE operator that matches at the end of the string).
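To see the two behaviours side by side, here is a minimal sketch using the string from the question. It assumes bash 3.2 or later with the default compatibility settings; the variable names quoted and unquoted are only for illustration.

```shell
#!/usr/bin/env bash
# Sketch: quoted vs. unquoted $ on the right-hand side of =~
# (assumes bash 3.2+ without compat31).
s=$(echo hello && echo world && echo OK)   # hello\nworld\nOK, trailing newline stripped

quoted=no
unquoted=no
[[ $s =~ 'OK$' ]] && quoted=yes    # quoted: searches for the literal text OK$
[[ $s =~ OK$ ]] && unquoted=yes    # unquoted: $ anchors at the end of the string

echo "quoted: $quoted, unquoted: $unquoted"   # prints "quoted: no, unquoted: yes"
```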

That also applies to regexps stored in variables or the result of some substitution.

[[ $var =~ $regexp ]]   # $var matches $regexp
[[ $var =~ "$string" ]] # $var contains $string

Note that it can become awkward because there are some characters that you need to quote for the shell syntax (like blanks, <, >, &, and parentheses when not matched). For instance, if you want to match against the .{3} <> [)}]& regexp (3 characters followed by " <> ", then either a ) or a }, then a &), you need something like:

[[ $var =~ .{3}" <> "[}\)]\& ]]

If in doubt about which characters need quoting, you can always use a temporary variable. That also makes the code compatible with bash31, zsh or ksh93:

pattern='.{3} <> [})]&'
[[ $var =~ $pattern ]] # remember *not* to quote $pattern here

That's also the only way (short of using the compat31 option (or BASH_COMPAT=3.1)) you can make use of the non-POSIX extended operators of your system's regexps.

For instance, for \< to be treated as the word boundary which it is in many regexp engines, you need:

pattern='\<word\>'
[[ $var =~ $pattern ]]

Doing:

[[ $var =~ \<word\> ]]

won't work, as bash treats those \ as shell quoting operators and strips them before passing <word> to the regexp library.
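As a quick check of the temporary-variable approach, the sketch below runs the .{3} <> [})]& pattern from earlier against three made-up sample strings. The sample strings and the results array are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Sketch: regexp kept in a variable and expanded unquoted, so every
# operator keeps its special meaning.
pattern='.{3} <> [})]&'
results=()

for var in 'abc <> }&' 'abc <> ]&' 'ab <> )&'; do
  if [[ $var =~ $pattern ]]; then   # $pattern deliberately left unquoted
    results+=("match")
  else
    results+=("no match")
  fi
  echo "$var: ${results[${#results[@]}-1]}"
done
# Only the first sample matches: the second has ] instead of ) or },
# and the third has just two characters before " <> ".
```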

Note that it's a lot worse in ksh93 where:

[[ $var =~ "x.*$" ]]

for instance will match on whatever-xa* but not whatever-xfoo. The quoting above removes the special meaning of *, but not of . or $.

The zsh behaviour is simpler: quoting doesn't change the meaning of regexp operators there (like in bash31) which makes for a more predictable behaviour (it can also use PCRE regexps instead of ERE (with set -o rematchpcre)).

yash doesn't have a [[...]] construct, but its [ builtin has a =~ operator (also in zsh). And of course, [ being a normal command, quoting can't affect the way regexp operators are interpreted.


Also note that strictly speaking, your $s doesn't contain 3 lines, but 2 full lines followed by an unterminated line. It contains hello\nworld\nOK. In the OK$ extended regular expression, the $ operator would only match at the end of the string.

In a 3-full-lines string, like hello\nworld\nOK\n (which you wouldn't be able to obtain with command substitution as command substitution strips all trailing newline characters), the $ would match after the \n, so OK$ wouldn't match on it.

With zsh -o rematchpcre however, the $ matches both at the end of the string and before a newline at the end of the string, if there is one, because zsh doesn't pass the PCRE_DOLLAR_ENDONLY flag to pcre_compile. That could be seen as a bad idea: variables in shells generally do not contain a trailing newline character, and when they do, we generally want it considered as data.
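If the string may carry a trailing newline, one way to test only its last line in bash is to isolate that line first. The following is a rough sketch, not a definitive implementation; last_line_matches is a hypothetical helper name, and it assumes at most one trailing newline should be ignored.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the last line of STRING matches ERE.
# Usage: last_line_matches STRING ERE
last_line_matches() {
  local s="$1" re="$2"
  s=${s%$'\n'}              # drop at most one trailing newline
  local last=${s##*$'\n'}   # keep what follows the last remaining newline
  [[ $last =~ $re ]]        # $re unquoted so its operators stay special
}

full=$'hello\nworld\nOK\n'  # three full lines, which $(...) could not produce

# Plain match: $ only matches after the final newline, so this fails.
[[ $full =~ OK$ ]] && echo "plain: match" || echo "plain: no match"

# Helper: the last line is isolated first, so this succeeds.
last_line_matches "$full" 'OK$' && echo "helper: match" || echo "helper: no match"
```

Stripping a single newline rather than all of them keeps any further trailing blank lines as data, in line with the remark above.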

3

At least in bash, quoting the RHS forces it to be treated as a string comparison:

$ s=$(printf 'hello\nworld\nOK\n')
$ echo "$s"
hello
world
OK
$ [[ "$s" =~ OK$ ]] && echo "match" || echo "no match"
match

whereas

$ s=$(printf 'hello\nworld\nOK$\n')
$ echo "$s"
hello
world
OK$
$ [[ "$s" =~ 'OK$' ]] && echo "match" || echo "no match"
match
2
  • Wow, Such a quirk! I do use bash.
    – gzc
    Commented Jun 17, 2017 at 15:42
  • The second case should output "no match".
    – gzc
    Commented Jun 17, 2017 at 15:42
1

Little known fact: case does this, too.

case "$(printf 'hello\nworld\nOK\n')" in
  *$'\nOK') echo "match";;
  *) echo "no match";;
esac

The $'...' "C-style" string is a Bash extension (which provides a context where backslash escape codes like \n are available in shell strings), but for portability, you can say

*"
OK") echo "match";;

to get a completely POSIX-compatible shell script.

The patterns available in a case statement are shell glob patterns, not proper regular expressions, though.
