How do ssh remote command line arguments get parsed

Question

I've seen the questions and answers about needing to double-escape the arguments to remote ssh commands. My question is: Exactly where and when does the second parsing get done?

If I run the following:

$ ssh otherhost pstree -a -p

I see the following in the output:

  |-sshd,3736
  |   `-sshd,1102
  |       `-sshd,1109
  |           `-pstree,1112 -a -p

The parent process of the remote command (pstree) is sshd. There doesn't appear to be any shell there that would parse the command line arguments to the remote command, so it doesn't seem as if double quoting or escaping should be necessary (but it definitely is). If I instead ssh there first, get a login shell, and then run pstree -a -p, I see the following in the output:

  ├─sshd,3736
  │   └─sshd,3733
  │       └─sshd,3735
  │           └─bash,3737
  │               └─pstree,4130 -a -p

So clearly there's a bash shell there that does the command line parsing in that case. But in the case where I run a remote command directly, there doesn't seem to be a shell, so why is double quoting necessary?

Gilles 'SO- stop being evil' · Accepted Answer · 2020-12-19 19:55:05Z

There is always a remote shell. In the SSH protocol, the client sends the server a string to execute. The SSH command line client takes its command line arguments and concatenates them with a space between the arguments. The server takes that string, runs the user's login shell and passes it that string. (More precisely: the server runs the program that is registered as the user's shell in the user database, passing it two command line arguments: -c and the string sent by the client. The shell is not invoked as a login shell: the server does not set the zeroth argument to a string beginning with -.)
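
The two parsing passes can be reproduced without any network. The following is a minimal sketch, assuming a hypothetical function `fake_ssh` that stands in for the client/server pair: it joins its arguments with spaces and hands the resulting string to `sh -c`, which plays the role of the user's shell on the server.

```shell
#!/bin/sh
# fake_ssh is a hypothetical stand-in for the ssh client/server pair.
# It ignores the host, joins the remaining arguments with single spaces
# (as the client does), and passes the resulting string to `sh -c`
# (as the server does with the user's shell). No network is involved.
fake_ssh() {
    shift                     # drop the host name
    sh -c "$*"                # "$*" joins the arguments with spaces
}

# Quoted once: the local shell passes 'a   b' as one argument, but the
# second shell re-splits the joined string, so echo receives "a" and "b"
# and prints them separated by a single space.
fake_ssh otherhost echo 'a   b'

# Quoted twice: the inner quotes survive the local shell and are
# interpreted by the second shell, so echo receives one argument and the
# three spaces are preserved.
fake_ssh otherhost echo "'a   b'"
```

With the real ssh client the effect should be the same: whatever survives local parsing is flattened into one string and parsed again on the other side.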

It is impossible to bypass the remote shell. The protocol has no way to send an array of strings that the server could use directly as an argv array. And the SSH server will not bypass the remote shell, because the shell itself may be a security restriction: using a restricted program as the user's shell is a way to provide a restricted account that is only allowed to run certain commands (e.g. an rsync-only account or a git-only account).
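
To illustrate why the shell can act as a restriction: since the server always runs the registered program with `-c` and the client's string, that program gets to decide what the string means. Below is a rough sketch of such a program, written as a function so it can be tried locally. The name `restricted_shell` and the allow-list are made up for illustration; this is not how any particular rsync-only or git-only setup is implemented.

```shell
#!/bin/sh
# Hypothetical restricted "shell". If it were registered as a user's shell,
# the server would invoke it as:  restricted_shell -c 'string sent by client'
restricted_shell() {
    # Anything other than exactly "-c STRING" (e.g. an interactive login)
    # is refused.
    if [ "$#" -ne 2 ] || [ "$1" != "-c" ]; then
        echo "interactive login not permitted" >&2
        return 1
    fi
    # Only exact, whole-string matches are allowed; the string is never
    # handed to a general-purpose shell for parsing.
    case "$2" in
        date|uptime)
            "$2" ;;
        *)
            echo "command not permitted: $2" >&2
            return 1 ;;
    esac
}

restricted_shell -c 'date'                          # allowed: runs date
restricted_shell -c 'date; echo pwned' || echo "rejected"
```

Because the whole string must match an allow-list entry, appending `; echo pwned` does not smuggle in a second command.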

You may not see the shell in pstree because it may be already gone. Many shells have an optimization where if they detect that they are about to do “run this external command, wait for it to complete, and exit with the command's status”, then the shell runs “execve of this external command” instead. This is what's happening in your first example. Contrast the following three commands:

ssh otherhost pstree -a -p
ssh otherhost 'pstree -a -p'
ssh otherhost 'pstree -a -p; true'

The first two are identical: the client sends exactly the same data to the server. The third one sends a shell command which defeats the shell's exec optimization.
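
The exec optimization can also be observed locally, without pstree. This is a sketch under the assumption that `sh` behaves like the user's shell on the server; the helper names `probe` and `shell_stayed` are invented here. An outer `sh -c` expands `$$` to its own PID and passes it to an inner shell, which prints its parent's PID next to it. If the two numbers match, the outer shell was still around as the parent; if they differ, it replaced itself via exec.

```shell
#!/bin/sh
# probe SUFFIX: start an outer `sh -c` whose command string is a single
# inner shell invocation followed by SUFFIX. The outer shell expands $$
# to its own PID and passes it along; the inner shell prints its parent's
# PID followed by that value.
probe() {
    sh -c 'sh -c "echo \$PPID \$1" inner $$'"$1"
}

# shell_stayed SUFFIX: succeeds if the outer shell was still the inner
# command's parent, i.e. it forked instead of replacing itself via exec.
shell_stayed() {
    set -- $(probe "$1")      # now $1 = inner's parent PID, $2 = outer shell's PID
    [ "$1" = "$2" ]
}

if shell_stayed ''; then
    echo "single command: outer shell stayed as parent"
else
    echo "single command: outer shell replaced itself via exec"
fi

if shell_stayed '; true'; then
    echo "with '; true': outer shell stayed as parent"
else
    echo "with '; true': outer shell replaced itself via exec"
fi
```

The result of the first check depends on which shell `sh` is and on its version, since the optimization is an implementation detail. The second check should report that the shell stayed, because the shell still has `true` left to run after the inner command.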

ha! can't believe you beat me to answering my own question. I figured it out halfway through posting the question and figured I should just go through with asking and answering it myself. — onlynone, Commented Jan 3, 2018 at 21:22
"The server takes that string, runs the user's login shell and passes it that string." On my machine, man ssh seems to contradict this. It says, "If a command is specified, it is executed on the remote host instead of a login shell." — Maxpm, Commented Dec 17, 2020 at 19:33
@Maxpm “Login shell” can mean two different things: in the man page, it's a way of invoking a shell with the zeroth argument beginning with -, which tells shells to do their login-shell behavior such as reading .profile. In my answer, where I've added a clarification, I mean the shell that's registered as the user's shell in the user database (/etc/passwd or equivalent). — Gilles 'SO- stop being evil', Commented Dec 19, 2020 at 19:56
@Gilles'SO-stopbeingevil' Aha! Thanks, that clarification helps. — Maxpm, Commented Dec 19, 2020 at 22:54

onlynone · Accepted Answer · 2018-01-03 21:17:22Z

I think I figured it out:

$ ssh otherhost pstree -a -p -s '$$'
init,1         
  `-sshd,3736
      `-sshd,11998
          `-sshd,12000
              `-pstree,12001 -a -p -s 12001

The arguments to pstree mean: show command line arguments (-a), show pids (-p), and show just the parent processes of the given pid (-s). The '$$' is a special shell parameter that bash replaces with its own pid when it evaluates the command line. It's quoted once to stop it from being expanded by my local shell, but it is deliberately not quoted or escaped a second time, so that the remote shell does expand it.
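
The three possible quoting levels for `$$` can be compared side by side. This is a local sketch only: `remote` is a hypothetical stand-in for `ssh otherhost` that joins its arguments with spaces and gives the string to a second `sh`, so the "remote" pid here is just the pid of that second shell.

```shell
#!/bin/sh
# remote is a hypothetical local stand-in for `ssh otherhost`: it joins its
# arguments with spaces and hands the string to a second shell.
remote() { sh -c "$*"; }

a=$(remote echo $$)          # unquoted: the local shell expands $$
b=$(remote echo '$$')        # quoted once: the second shell expands $$
c=$(remote echo "'\$\$'")    # quoted twice: neither shell expands it

echo "local shell pid:  $$"
echo "unquoted:         $a"
echo "quoted once:      $b"
echo "quoted twice:     $c"
```

The unquoted form reports the local shell's pid, the once-quoted form reports the pid of the second shell, and the twice-quoted form arrives at echo as the literal text `$$`.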

As we can see, it is replaced with 12001, so that's the pid of the shell. We can also see from the output line pstree,12001 that the process with pid 12001 is pstree itself. So pstree is the shell?

What I gather is going on is that bash is invoked and parses the command line arguments, but then calls exec to replace itself with the command being run.

It seems that it only does this in the case of a single remote command:

$ ssh otherhost pstree -a -p -s '$$' \; echo hi
init,1         
  `-sshd,3736
      `-sshd,17687
          `-sshd,17690
              `-bash,17691 -c pstree -a -p -s $$ ; echo hi
                  `-pstree,17692 -a -p -s 17691
hi

In this case, I'm requesting two commands be run: pstree followed by echo. And we can see here that bash does in fact show up in the process tree as a parent of pstree.

Yup ! + 1. It exemplifies what Gilles put more formally first and exemplified second. Perhaps giving him credit for his early answer is in order ? — Cbhihe, Commented Jan 4, 2018 at 12:14

Community · Accepted Answer · 2021-10-07 07:34:52Z

Supporting what the other answers have said, I looked up the code that invokes commands on the remote, https://github.com/openssh/openssh-portable/blob/4f29309c4cb19bcb1774931db84cacc414f17d29/session.c#L1660...

    /*
     * Execute the command using the user's shell.  This uses the -c
     * option to execute the command.
     */
    argv[0] = (char *) shell0;
    argv[1] = "-c";
    argv[2] = (char *) command;
    argv[3] = NULL;
    execve(shell, argv, env);
    perror(shell);
    exit(1);

... which, as you can see, unconditionally invokes shell with first argument -c and second argument command. Earlier, the shell variable was set to the user's login shell as recorded in /etc/passwd. command is an argument to this function, and ultimately is set to a string read verbatim off the wire (see session_exec_req in the same file). So, the server does not interpret the command at all, but a shell is always invoked on the remote.
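
As a rough shell-level analogue of that excerpt (a sketch, not a description of what sshd does internally): look up the shell field of a user's passwd entry, then run that program with `-c` and the command as a single untouched string. The helper names `login_shell_of` and `run_like_server` and the sample entry for `alice` are made up for illustration, and the real server queries the system's user database rather than parsing a file.

```shell
#!/bin/sh
# login_shell_of USER [FILE]: print the shell field (7th colon-separated
# field) of USER's entry in a passwd-format file. Hypothetical helper.
login_shell_of() {
    awk -F: -v user="$1" '$1 == user { print $7; exit }' "${2:-/etc/passwd}"
}

# run_like_server SHELL COMMAND: mirror the argv built in the excerpt,
# i.e. SHELL -c COMMAND, with COMMAND passed through as one string that
# only the invoked shell parses.
run_like_server() {
    "$1" -c "$2"
}

# Demo with a made-up passwd entry so the example does not depend on
# real accounts on this machine.
sample=$(mktemp)
printf 'alice:x:1000:1000:Alice:/home/alice:/bin/sh\n' > "$sample"
shell=$(login_shell_of alice "$sample")
rm -f "$sample"

# The invoked shell reports its own name, showing who parsed the string.
run_like_server "$shell" 'echo "parsed by: $0"'
```

Unlike the C code, this sketch does not set the zeroth argument separately from the program path; it only shows that the command string reaches the shell as a single `-c` argument.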

However, the relevant part of the SSH protocol specification does not appear to require this behavior; it only says

 byte      SSH_MSG_CHANNEL_REQUEST
 uint32    recipient channel
 string    "exec"
 boolean   want reply
 string    command

This message will request that the server start the execution of the given command. The 'command' string may contain a path. Normal precautions MUST be taken to prevent the execution of unauthorized commands.

This is probably because not all operating systems have the concept of a command-line shell. For instance, it wouldn't have been crazy for a Classic MacOS ssh server to feed "exec" command strings to the AppleScript interpreter instead.
