Shell is: GNU bash, version 5.1.16(1)-release (x86_64-pc-linux-gnu)

In the current working directory, there are two files:

  • a file named abc.txt
  • a file named 'a'bc.txt (created with touch \'a\'bc.txt)

I run the following command:

echo 'a'*

The output is:

abc.txt

The GNU bash manual specifies that quote removal is processed AFTER pathname expansion.

Therefore, I expected this command to match 'a'bc.txt but NOT to match abc.txt.

I expected the above command to proceed as follows:

  • at the pathname expansion stage, try to match any file with a filename that starts with 'a' ('a' taken as a literal string), and THUS match 'a'bc.txt
  • at the quote removal stage, remove the single quotes ' in 'a' without impacting the results of the pathname expansion that took place at the previous step.

There's obviously something I don't understand here.

I could NOT find any documentation or answers to this specific question.

  • I can't point to a specific place in the manual, and I fear I won't explain this clearly. For echo 'a'*, even before the quote removal stage, the shell doesn't see the quotes as literal single quote characters. The quoted string 'a' is one character long, not three. To get literal single quotes, you have to quote them or escape them. Commented Mar 9 at 1:56
  • Single and double quotes quote each other. So "'a'"* and "'"a"'"* both match: the difference is that the first contains a quoted, literal a, while in the second the a is unquoted and would be expanded if it were a wildcard like ? or *. Commented Mar 9 at 9:52
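To make the comments above concrete, here is a minimal sketch that recreates the two files from the question in a scratch directory and tries three spellings of the pattern. The variable names (demo_dir, plain, dq, esc) are mine, not from the post, and it assumes default shell options (no nullglob or failglob).

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory so the demo does not touch real files.
demo_dir=$(mktemp -d) || exit 1
cd "$demo_dir" || exit 1

# The two files from the question.
touch abc.txt \'a\'bc.txt

# 'a'* : the quotes only mark the a as literal, so the pattern is a*.
plain=$(echo 'a'*)

# "'a'"* : the double quotes make the single quotes literal pattern characters.
dq=$(echo "'a'"*)

# \'a\'* : backslashes have the same effect on the single quotes.
esc=$(echo \'a\'*)

echo "$plain"   # abc.txt
echo "$dq"      # 'a'bc.txt
echo "$esc"     # 'a'bc.txt

# Clean up the scratch directory.
cd / && rm -rf "$demo_dir"
```

Only the spellings in which the single quotes are themselves quoted or escaped reach the file whose name contains them.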

1 Answer

The way you describe it would make it impossible to match filenames that start with, e.g., a *. As it currently works, one can write '*'*, where the first asterisk is quoted and therefore literal, while the second retains its special meaning of matching anything. If the quotes (and, symmetrically, backslashes) themselves had to match characters in the filename, that would be impossible.
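A quick way to check this, as a rough sketch: the file names and variable names below are made up for illustration, and default shell options are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory with one file whose name starts with a literal *.
scratch=$(mktemp -d) || exit 1
cd "$scratch" || exit 1
touch '*star.txt' plain.txt

# Quoted first asterisk is literal; the second one is the wildcard.
quoted_first=$(echo '*'*)

# A backslash works the same way as the quotes.
escaped_first=$(echo \**)

# A bare * matches every (non-hidden) file; count the matches.
set -- *
unquoted_count=$#

echo "$quoted_first"     # *star.txt
echo "$escaped_first"    # *star.txt
echo "$unquoted_count"   # 2

cd / && rm -rf "$scratch"
```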


I'm not sure how the internal implementation of the shell works, or what the history behind the phrase "quote removal" is, but I find it best to consider the state of being quoted a (hidden) property of the characters, instead of thinking of the quote characters as still being there as separate entities after the command line was initially processed.

So, when you write '*'*, you get **, where the first asterisk carries the hidden "quoted" mark and the second does not. Then, if you want to match a literal quote, you need to quote or escape that. E.g. "'*'"* would give '*'*, i.e. a quoted quote-asterisk-quote followed by a normally-special asterisk. (I think I've heard that some early shell implementation used the 8th bit of the byte to mark quoted characters, but of course that only works with 7-bit charsets.)
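The two cases in the previous paragraph can be tried side by side. This is only a sketch: the three file names and the variable names are invented for the demo, and default shell options are assumed.

```shell
#!/usr/bin/env bash
# Hypothetical scratch directory; file names chosen so each pattern hits exactly one.
work=$(mktemp -d) || exit 1
cd "$work" || exit 1
touch "'*'one.txt" '*two.txt' three.txt

# "'*'"* : quote, asterisk, quote are all quoted (literal); the last * is a wildcard.
literal_quotes=$(echo "'*'"*)

# '*'* : only the first asterisk is literal; the quotes match nothing in the name.
literal_star=$(echo '*'*)

echo "$literal_quotes"   # '*'one.txt
echo "$literal_star"     # *two.txt

cd / && rm -rf "$work"
```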

Or, if you like, just think of the quotes as special characters that are only used for determining if another character is quoted or not, and not for matching against characters in the target string.
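The same behaviour can be seen without creating any files, since case patterns follow the same quoting rules as pathname expansion. A small sketch (starts_with_a is a hypothetical helper name, not something from the question):

```shell
#!/usr/bin/env bash
# starts_with_a: hypothetical helper that tests a string against the pattern 'a'*.
starts_with_a() {
    case "$1" in
        'a'*) echo "match" ;;      # quotes mark the a as literal; they match nothing themselves
        *)    echo "no match" ;;
    esac
}

r_plain=$(starts_with_a abc.txt)
r_quoted=$(starts_with_a "'a'bc.txt")

echo "$r_plain"    # match
echo "$r_quoted"   # no match
```

The string that actually begins with a single quote is the one that fails, because the quotes in the pattern are never compared against the target.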

  • My initial understanding: Until quote removal happens, 'string' remains a blackbox. Upon quote removal, the shell analyzes string (kept in between ' ') and treats it as a literal. My new understanding: The treatment of string as a literal is enforced throughout the whole expansion process until quote removal. 'a'bc.txt is not matched because the 'a' is not treated as a literal sequence to be matched against. 'a' indicates that a should be treated as a literal. The quotes are removed after they have served their purpose in controlling the interpretation of this string. Commented Mar 9 at 16:08
