I recently had trouble with some regex on the command line, and found that different numbers of backslashes can be used to match a single backslash. The number depends on the quoting used for the regex (none, single quotes, double quotes). See the following bash session for what I mean:

$ echo "#ab\\cd" > file
$ grep -E ab\cd file
$ grep -E ab\\cd file
$ grep -E ab\\\cd file
$ grep -E ab\\\\cd file
#ab\cd
$ grep -E ab\\\\\cd file
#ab\cd
$ grep -E ab\\\\\\cd file
#ab\cd
$ grep -E ab\\\\\\\cd file
#ab\cd
$ grep -E ab\\\\\\\\cd file

$ grep -E "ab\cd" file
$ grep -E "ab\\cd" file
$ grep -E "ab\\\cd" file
#ab\cd
$ grep -E "ab\\\\cd" file
#ab\cd
$ grep -E "ab\\\\\cd" file
#ab\cd
$ grep -E "ab\\\\\\cd" file
#ab\cd
$ grep -E "ab\\\\\\\cd" file

$ grep -E 'ab\cd' file
$ grep -E 'ab\\cd' file
#ab\cd
$ grep -E 'ab\\\cd' file
#ab\cd
$ grep -E 'ab\\\\cd' file

This means that:

  • with no quotes, I can match a backslash with 4-7 actual backslashes
  • with double quotes, I can match a backslash with 3-6 actual backslashes
  • with single quotes, I can match a backslash with 2-3 actual backslashes
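
The experiment above can also be run in a loop. This is a minimal sketch, assuming a POSIX shell such as bash and a grep that treats the undefined ERE escape \c as a plain c (GNU grep does; recent versions also print a "stray \" warning, which the sketch discards). matching_counts is a hypothetical helper name, and eval is used so that the shell parses each constructed command line exactly as if it had been typed.

```shell
# Hypothetical helper (not from the question): for one quoting style, try
# 1 to 8 backslashes between "ab" and "cd" and print the counts that match.
matching_counts() {
  mode=$1 target=$2 bs='' hits=''
  for n in 1 2 3 4 5 6 7 8; do
    bs="$bs\\"                          # append one literal backslash
    case $mode in
      unquoted) cmd="grep -qE ab${bs}cd \"\$target\"" ;;
      double)   cmd="grep -qE \"ab${bs}cd\" \"\$target\"" ;;
      single)   cmd="grep -qE 'ab${bs}cd' \"\$target\"" ;;
    esac
    # eval re-parses the command line, so the quoting rules really apply;
    # stderr is dropped to hide grep's "stray \" warnings.
    if eval "$cmd" 2>/dev/null; then hits="$hits $n"; fi
  done
  printf '%s:%s\n' "$mode" "$hits"
}

testfile=$(mktemp)
printf '%s\n' '#ab\cd' > "$testfile"    # same content as the question's file
for style in unquoted double single; do
  matching_counts "$style" "$testfile"
done
rm -f "$testfile"
```

With GNU grep this should print "unquoted: 4 5 6 7", "double: 3 4 5 6" and "single: 2 3", the same ranges as in the list.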

I understand that one extra backslash is ignored by the shell (from the bash man page):

"A non-quoted backslash (\) is the escape character. It preserves the literal value of the next character that follows"

This does not apply to the single-quoted examples, because no escaping is done in single quotes.

And one additional backslash is ignored by grep: "\c" is just an escaped "c", which is the same as "c" because "c" has no special meaning in a regex.
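
The shell's share of the work can be checked in isolation with printf '%s\n', which prints each argument exactly as it receives it, one per line. A small sketch with three backslashes in each quoting style:

```shell
# printf '%s\n' prints every argument verbatim, so this shows what the
# shell hands on to a command such as grep for each quoting style.
printf '%s\n' ab\\\cd "ab\\\cd" 'ab\\\cd'
# ab\cd     (no quotes)
# ab\\cd    (double quotes)
# ab\\\cd   (single quotes)
```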

This explains the behaviour of the example with single quotes, but I don't really understand the other two examples, especially why there is a difference between non-quoted and double-quoted strings.

Again, a quote from the bash man page:

"Enclosing characters in double quotes preserves the literal value of all characters within the quotes, with the exception of $, `, \, and, when history expansion is enabled, !."

I tried the same with GNU awk (e.g. awk /ab\cd/{print} file), with the same results.

Perl, however, shows different results (using e.g. perl -ne "/ab\\cd/"\&\&print file):

  • with no quotes, I can match a backslash with 4-5 actual backslashes
  • with double quotes, I can match a backslash with 3-4 actual backslashes
  • with single quotes, I can match a backslash with 2 actual backslashes

Can anyone explain the difference between non-quoted and double-quoted regex strings on the command line for grep and awk? I'm not that interested in an explanation of Perl's behaviour, since I usually don't use Perl one-liners.

2 Answers

For the unquoted example, each \\ pair passes one backslash to grep, so 4 backslashes pass two to grep, which grep reads as a single literal backslash. 6 backslashes pass three to grep, which grep reads as one backslash followed by \c, and \c is equal to c. One additional backslash (5 or 7 in total) does not change anything, because the shell translates the resulting \c into c. Eight backslashes in the shell are four in grep, which grep reads as two literal backslashes, so this does not match anymore.
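
The shell half of this counting can be made visible by printing the argument instead of running grep. A sketch for 4 to 8 unquoted backslashes; the comments give the printed line and, following the explanation above, how grep then reads it:

```shell
# Show what grep would receive for 4..8 unquoted backslashes, one per line.
printf '%s\n' ab\\\\cd ab\\\\\cd ab\\\\\\cd ab\\\\\\\cd ab\\\\\\\\cd
# ab\\cd    4: grep reads \\ as one literal backslash   -> match
# ab\\cd    5: the shell turned the last \c into c      -> match
# ab\\\cd   6: grep reads \\ as \ and \c as c           -> match
# ab\\\cd   7: same as 6                                -> match
# ab\\\\cd  8: grep now wants two backslashes           -> no match
```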

For the example in double quotes, note what follows your second quote from the bash manpage:

The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline.

I.e. when you give an odd number of backslashes, the sequence ends in \c, which would be equal to c in the unquoted case; but inside double quotes that backslash loses its special meaning, so \c is passed to grep as is. That is why the range of "possible" backslashes (i.e. those that make up a pattern matching your example file) slides down by one.
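
The same printing trick for double quotes shows the slide: an odd count now leaves \c in the argument instead of turning it into c. A sketch for 3 to 7 backslashes:

```shell
# Show what grep would receive for 3..7 double-quoted backslashes.
printf '%s\n' "ab\\\cd" "ab\\\\cd" "ab\\\\\cd" "ab\\\\\\cd" "ab\\\\\\\cd"
# ab\\cd    3: \\ -> \, and \c is passed on unchanged  -> match
# ab\\cd    4                                          -> match
# ab\\\cd   5: \\\\ -> \\, and \c unchanged            -> match
# ab\\\cd   6                                          -> match
# ab\\\\cd  7: grep now wants two backslashes          -> no match
```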

  • ... and then there are some oddities: for example, printf "\ntest" will insert a newline before "test", even though "\n" should have been translated to "n" by the shell, as it is within double quotes... (so the expected result for "\ntest" should be "ntest"). We should get into the habit of writing printf "\\ntest" or printf '\ntest', but somehow I see a lot of scripts relying on the oddity instead. Commented May 28, 2018 at 13:35
  • @OlivierDulac: according to the dash manual page: The backslash inside double quotes is historically weird, and serves to quote only the following characters: $ ` " \ <newline>. Otherwise it remains literal.
    – MoonSweep
    Commented Mar 3, 2023 at 16:13
  • @MoonSweep: indeed, at the time of the above comment I thought most "\X" would be translated to "X", but a lot of the time the "\" is kept as-is (and then printf sees "\n" and prints a newline, as it should). I was confused about "\X" vs '\X', thinking they would almost always behave differently. Commented Mar 3, 2023 at 16:27
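
The point settled in these comments can be checked directly: printing the double-quoted string with %s shows that the backslash survives the shell, and using the same string as the format shows that it is printf that turns \n into a newline. A sketch:

```shell
# The shell leaves \n alone inside double quotes, so printf gets a backslash.
printf '%s\n' "\ntest"    # prints: \ntest
# As a format string, printf itself interprets \n as a newline.
printf "\ntest\n"         # prints an empty line, then: test
```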

This link describes bash quotes and escaping.

Your question deals with the first three sections.

  • Per-character escaping
  • Weak quoting "double quotes"
  • Strong quoting 'single quotes'
  • ANSI C like string quoting
  • I18N/L10N quoting (Internationalization and Localization).

Below is a chart of how bash passes the strings on to grep, and how grep further interprets them internally.

Let's first look at echo "#ab\\cd" > file.
In the weak-quoted ("") "#ab\\cd", the \\ is an escaped \, which echo receives as a single literal \ and writes to file. So file contains #ab\cd.
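
The same file can be written without relying on echo, whose backslash handling differs between shells (bash's builtin leaves backslashes alone by default, while dash's echo interprets sequences such as \c). A sketch that uses printf '%s\n', which prints its argument verbatim, and a temporary file from mktemp in place of file:

```shell
# "#ab\\cd" reaches printf as #ab\cd (the shell turns \\ into \ in double quotes).
file=$(mktemp)
printf '%s\n' "#ab\\cd" > "$file"
cat "$file"    # prints: #ab\cd
```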

Now, to your commands: the chart below may help to see what actually goes on with each call. The * marks the ones which match the file contents. It is really just a matter of applying bash's escape rules, as described on the web page, paying particular attention to daniel kullmann's answer, where he refers to the escaping behaviour in a weak-quoting situation.

The backslash retains its special meaning only when followed by one of the following characters: $, `, ", \, or newline.

                              bash passes    grep further
                              to grep        resolves to
grep -E ab\cd file            abcd           abcd
grep -E ab\\cd file           ab\cd          abcd
grep -E ab\\\cd file          ab\cd          abcd
grep -E ab\\\\cd file         ab\\cd         ab\cd    *
grep -E ab\\\\\cd file        ab\\cd         ab\cd    *
grep -E ab\\\\\\cd file       ab\\\cd        ab\cd    *
grep -E ab\\\\\\\cd file      ab\\\cd        ab\cd    *
grep -E ab\\\\\\\\cd file     ab\\\\cd       ab\\cd

grep -E "ab\cd" file          ab\cd          abcd
grep -E "ab\\cd" file         ab\cd          abcd
grep -E "ab\\\cd" file        ab\\cd         ab\cd    *
grep -E "ab\\\\cd" file       ab\\cd         ab\cd    *
grep -E "ab\\\\\cd" file      ab\\\cd        ab\cd    *
grep -E "ab\\\\\\cd" file     ab\\\cd        ab\cd    *
grep -E "ab\\\\\\\cd" file    ab\\\\cd       ab\\cd

grep -E 'ab\cd' file          ab\cd          abcd
grep -E 'ab\\cd' file         ab\\cd         ab\cd    *
grep -E 'ab\\\cd' file        ab\\\cd        ab\cd    *
grep -E 'ab\\\\cd' file       ab\\\\cd       ab\\cd
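
The middle column of the chart can also be generated rather than worked out by hand: eval lets the shell parse each command-line word, and printf '%s' shows the result. The last column is approximated with sed by turning every \X into X, which is the same simplification the chart makes for grep. show_row is a hypothetical helper; only feed eval trusted input like this.

```shell
# Hypothetical helper: given a word as typed on the command line, print it,
# the argument the shell passes on, and that argument with each \X -> X.
show_row() {
  passed=$(eval "printf '%s' $1")                           # shell step
  resolved=$(printf '%s' "$passed" | sed 's/\\\(.\)/\1/g')  # approximate grep step
  printf '%-22s %-10s %s\n' "$1" "$passed" "$resolved"
}

# Quoted here-doc delimiter: the rows below are read without any expansion.
while IFS= read -r word; do
  show_row "$word"
done <<'EOF'
ab\\\\\cd
"ab\\\cd"
'ab\\\cd'
EOF
```

Five unquoted backslashes come out as ab\\cd, the same as four, because the shell turns the final \c into c.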
