
I am attempting to write a simple bash parser. I am following the steps in this wiki. One of the assumptions I make is that I can do a pass over the entire input string and remove all single and double quotes, provided I escape the characters correctly. When run in bash, the two strings should yield the same output.

This section of my parser takes any given string and removes single and double quotes from it (except quotes that are escaped and interpreted as literals). Both strings should still yield the same result when executed in bash.

My parser converts the Original command into the My Parse form shown below. The original works, but My Parse does not.

# Original
$ node -p "console.log($(echo \"hello world\"))"
hello world


# My Parse: Escape everything within double quotes except command substitution
                                  v                      v
$ node -p \c\o\n\s\o\l\e\.\l\o\g\($(echo \"hello world\")\)
[eval]:1
console.log("hello
            ^^^^^^

SyntaxError: Invalid or unexpected token

I have several ideas about why my parsing is wrong.

  1. I am not understanding some fundamental aspect of how command substitution inside double quotes works. My understanding is command substitution occurs first, then quotes are processed.

  2. I am not understanding some fundamental aspect of how command substitution is actually output. My understanding is $(echo \"hello world\") should yield a string "hello world" not commands "hello and world"

  3. There is some special-ness with the echo command (potentially because it is variadic). I am actually getting lucky that this works in the original scenario, but actually, changing the command inside the command substitution could break it...

  4. There is a problem with my node / javascript code. This is pretty simple js, so I don't think this is it...

One last interesting thing: It works when I wrap the command substitution in double quotes. Maybe to ask this whole question differently, how could I write the same input as below without the double quotes (excluding the escaped ones).

# Escape everything but keep command substitution in double quotes
                                  v                       v
$ node -p \c\o\n\s\o\l\e\.\l\o\g\("$(echo \"hello world\")"\)
hello world

Note: This question is somewhat a follow up to this question about escaping double quotes

  • Parsing a $(cmd) command substitution is not simple; it needs a recursive parser to do it right. Most shells do it wrong, and even bash fails to do it right in some cases. Currently bosh is the only shell with no known bug in parsing command substitutions. So do not believe that this is a simple task to implement...
    – schily
    Commented Sep 17, 2020 at 22:53
  • Totally agree it is not an easy task! I will look into bosh, thank you :)
    – falky
    Commented Sep 17, 2020 at 23:05
  • The recent source is inside schilytools and the important code is in name.c inside the function match_cmd(). This calls a recursive parser that stops at the first superfluous ). BTW: the parser itself is in the function cmd(). The binary syntax tree that results from calling the parser is then converted back into text and used as the next "word" from the input. The execution of the code later happens in the file macro.c.
    – schily
    Commented Sep 18, 2020 at 10:31

3 Answers


This is Word Splitting in action. Before we start, peruse Shell Expansions in the bash manual, paying attention to the order in which they are performed.

Looking at node -p "console.log($(echo \"hello world\"))"

  • brace expansion? no

  • tilde expansion? no

  • parameter expansion? none here

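  • command substitution? yes, conceptually leaving you with the following, where the inner quotes are now literal characters rather than shell syntax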

      node -p "console.log("hello world")"
    
  • arithmetic expansion? no

  • process substitution? no

  • word splitting? the argument to -p is in quotes, so no.

  • filename expansion? no

  • quote removal is done

bash spawns node passing it 2 arguments, -p and console.log("hello world")


Now look at node -p console.log($(echo \"hello world\"))

  • after command substitution, we have node -p console.log("hello world")

  • when we get to word splitting, the argument to -p has no quotes to protect it. For the current command, bash has 4 tokens:

      node -p console.log("hello world")
      ^^^^ ^^ ^^^^^^^^^^^^^^^^^^ ^^^^^^^
    

bash spawns node passing it 3 arguments: -p, console.log("hello and world"). The script that node evaluates is console.log("hello, which is clearly a JavaScript syntax error, and you see what happens.
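
The difference in argument counts can be checked directly. This is a minimal sketch, assuming the default IFS. `count_args` is a hypothetical helper (not part of bash or node) that stands in for node and reports how many arguments it received:

```shell
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Original form: the substitution sits inside double quotes, so its output is not split.
quoted=$(count_args -p "console.log($(echo \"hello world\"))")

# Fully unquoted form: the substitution's output is split at the space.
unquoted=$(count_args -p console.log\($(echo \"hello world\")\))

# Everything backslash-escaped, but the substitution itself kept in double quotes.
partial=$(count_args -p console.log\("$(echo \"hello world\")"\))

echo "$quoted $unquoted $partial"   # prints: 2 3 2
```

Only the form with a bare, unquoted substitution produces a third argument, which matches the failure in the question.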


Lots more detail at Security implications of forgetting to quote a variable in bash/POSIX shells

  • Thank you @glen-jackman. This isolates the problem and was what I thought may be going on. However, when I execute node -p "console.log("hello world")" I still get the same syntax error... And yet the original command still works. Is there something else coming into play with the command substitution that I am missing?
    – falky
    Commented Sep 17, 2020 at 23:31
  • Double quotes do not nest. That's why you escaped the "inner" quotes in your question. You could use one pair of single quotes and one pair of double quotes. You are approaching "quoting hell" but not there yet. Commented Sep 18, 2020 at 0:42
  • I marked this as the answer. My key learning: When you are trying to parse out all quotes, if you have command substitution inside double quotes, first escape everything inside the double quotes except the command substitution, then perform the substitution as if the substitution was wrapped in double quotes. Then escape those quotes. e.g. open "$(echo ab c).txt" --> open "$(echo ab c)"\.\t\x\t --> open "ab c"\.\t\x\t --> open \a\b\ \c\.\t\x\t
    – falky
    Commented Sep 18, 2020 at 19:43
  • You'll want to do some research about word splitting and pathname expansion -- these are generally the effects you want to prevent by using quotes. Commented Sep 18, 2020 at 20:05

I don't know what "node -p" in your example is supposed to do. But trying to reproduce your example results in the following:

[user@c0n1 ~]# echo "hello world"
hello world
[user@c0n1 ~]# echo $(\"hello world\")
-bash: "hello: command not found

The white space in the last example breaks the string inside the quotes, and bash thinks that "hello (with its leading quote character) is a command. Protecting the white space with a backslash confirms that explanation:

[user@c0n1 ~]# echo $(\"hello\ world\")
-bash: "hello world": command not found

If you are trying to produce a string inside command substitution, you need a command to go with it, like this:

[user@c0n1 ~]# echo $(echo \"hello world\")
"hello world"
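
To see that the quote characters produced by the substitution are plain data rather than shell syntax, the output can be captured in a variable and inspected. A small sketch (the variable name `out` is only for illustration):

```shell
# Capture the substitution's output; the escaped quotes become literal characters.
out=$(echo \"hello world\")

# Expanding inside double quotes prevents word splitting, so the value prints intact.
printf '%s\n' "$out"      # "hello world"

# The length counts the two quote characters as part of the string.
printf '%s\n' "${#out}"   # 13
```

Because the quotes are part of the value, they do nothing to protect the space once the value is expanded without surrounding double quotes.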
  • I agree with your points here, however, I don't think it quite answers my question. As you can see in my code blocks I am putting echo inside the command substitution e.g. $(echo \"hello world\")
    – falky
    Commented Sep 17, 2020 at 22:58
  • node -p means evaluate the given script and print the result. e.g. node -p "console.log('hello world')" and node -p \c\o\n\s\o\l\e\.\l\o\g\(\'\h\e\l\l\o\ \w\o\r\l\d\'\) both output hello world
    – falky
    Commented Sep 17, 2020 at 23:00

Posting this answer as a note: @glen-jackman's answer was great and inspired me.

My bash script looks like this:

cmds="git commit -m \"$(date)\""

echo $cmds
$cmds

When I ran the script, I got this:

git commit -m "Fri Jan 13 10:27:05 CST 2023"
error: pathspec 'Jan' did not match any file(s) known to git
error: pathspec '13' did not match any file(s) known to git
error: pathspec '10:27:05' did not match any file(s) known to git
error: pathspec 'CST' did not match any file(s) known to git
error: pathspec '2023"' did not match any file(s) known to git

With set -x enabled to debug the script, I got this:

++ date
+ cmds='git commit -m "Fri Jan 13 10:34:30 CST 2023"'
+ echo git commit -m '"Fri' Jan 13 10:34:30 CST '2023"'
git commit -m "Fri Jan 13 10:34:30 CST 2023"
+ git commit -m '"Fri' Jan 13 10:34:30 CST '2023"'
error: pathspec 'Jan' did not match any file(s) known to git
error: pathspec '13' did not match any file(s) known to git
error: pathspec '10:34:30' did not match any file(s) known to git
error: pathspec 'CST' did not match any file(s) known to git
error: pathspec '2023"' did not match any file(s) known to git

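The command that actually runs is git commit -m '"Fri' Jan 13 10:34:30 CST '2023"', but what I expected was git commit -m "Fri Jan 13 10:34:30 CST 2023".

This is Word Splitting in action: the quotes stored inside $cmds are literal characters, not shell syntax, so they do not protect the spaces.

My solution is to use eval. The modified script looks like this: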

cmds="git commit -m \"$(date)\""

echo $cmds
eval $cmds

It worked. The output looks like this:

++ date
+ cmds='git commit -m "Fri Jan 13 10:43:56 CST 2023"'
+ echo git commit -m '"Fri' Jan 13 10:43:56 CST '2023"'
git commit -m "Fri Jan 13 10:43:56 CST 2023"
+ eval git commit -m '"Fri' Jan 13 10:43:56 CST '2023"'
++ git commit -m 'Fri Jan 13 10:43:56 CST 2023'
[main b76daa9] Fri Jan 13 10:43:56 CST 2023
 2 files changed, 21 insertions(+), 22 deletions(-)
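
An alternative that avoids re-parsing altogether is to keep the command as a list of words rather than one flat string. The sketch below uses the positional parameters so that it also runs in plain sh; in bash an array (cmd=(...) expanded as "${cmd[@]}") works the same way. `show_args` is a hypothetical stand-in for git, and the date is a fixed sample rather than a live $(date) call:

```shell
# show_args is a hypothetical stand-in for git: it prints each argument on its own line.
show_args() { printf '<%s>\n' "$@"; }

# Fixed sample value; the real script would use "$(date)" here.
msg='Fri Jan 13 10:43:56 CST 2023'

# Store the command as separate words instead of one string with embedded quotes.
set -- show_args commit -m "$msg"

# "$@" expands to exactly those words, with no further splitting or re-parsing.
result=$("$@")
printf '%s\n' "$result"
```

Here the message stays a single argument even though it contains spaces, and nothing in it is ever interpreted as shell syntax.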
