23

Say if I wrote a program with the following line:

int main(int argc, char** argv)

Now it knows what command line arguments are passed to it by checking the content of argv.

Can the program detect how many spaces there are between the arguments? For example, when I type these in bash:

ibug@linux:~ $ ./myprog aaa bbb
ibug@linux:~ $ ./myprog       aaa      bbb

The environment is a modern Linux (like Ubuntu 16.04), but I suppose the answer should apply to any POSIX-compliant system.

8
  • 22
    Just for curiosity, why would your program need to know that?
    – nxnev
    Commented Mar 15, 2018 at 6:18
  • 2
    @nxnev I used to write some Windows programs and I know it's possible there, so I wonder if there's something similar in Linux (or Unix).
    – iBug
    Commented Mar 15, 2018 at 6:18
  • 9
    I vaguely remember in CP/M that programs had to parse their own command lines - this meant that every C runtime had to implement a shell parser. And they all did it slightly differently. Commented Mar 15, 2018 at 9:41
  • 3
    @iBug There is, but you need to quote the arguments when invoking the command. That’s how it’s done on POSIX (and similar) shells. Commented Mar 15, 2018 at 12:30
  • 3
    @iBug, ...Windows has the same design that Toby mentions from CP/M above. UNIX doesn't do that -- from the called process's perspective, there is no command line involved in running it. Commented Mar 16, 2018 at 14:14

6 Answers

58

In general, no. Command line parsing is done by the shell, which does not make the unparsed line available to the called program. In fact, your program might be executed from another program which created the argv not by parsing a string but by constructing an array of arguments programmatically.

9
  • 9
    You may want to mention execve(2).
    – iBug
    Commented Mar 15, 2018 at 6:17
  • 3
    You're right, as a lame excuse I can say that I'm currently using a phone and looking up man pages is a bit tedious :-) Commented Mar 15, 2018 at 6:22
  • 1
    This is the relevant section of POSIX. Commented Mar 15, 2018 at 6:50
  • 1
    @Hans-MartinMosner: Termux...? ;-)
    – DevSolar
    Commented Mar 15, 2018 at 14:58
  • 9
    "in general" was meant as a safeguard against citing a special convoluted case where it is possible - for example, a suid root process might be able to inspect the calling shell's memory and find the unparsed command line string. Commented Mar 16, 2018 at 8:13
39

It's not meaningful to talk of "spaces between arguments"; that's a shell concept.

A shell's job is to take whole lines of input and form them into arrays of arguments to start commands with. This may involve parsing quoted strings, expanding variables, file wildcards and tilde expressions, and more. The command is started with a standard exec system call, which accepts a vector of strings.

Other ways exist to create a vector of strings. Many programs fork and exec their own sub-processes with predetermined command invocations - in which case, there's never such a thing as a "command line". Similarly, a graphical (desktop) shell might start a process when a user drags a file icon and drops it on a command widget - again, there's no textual line to have characters "between" arguments.

As far as the invoked command is concerned, what goes on in a shell or other parent/precursor process is private and hidden - we only see the array of strings that standard C specifies that main() can accept.

2
  • 1
    Good answer - it's important to point this out for Unix newbies, who often assume that, if they run tar cf texts.tar *.txt then the tar program gets two arguments and has to expand the second one (*.txt) itself. Many people don't realize how it really works until they start writing their own scripts/programs that handle arguments. Commented Mar 16, 2018 at 7:24
  • There are some OSes (hello, CP/M!) that do require each program to do its own globbing; even though there are libraries for that, you end up with a system full of programs linked with different libraries (or just different versions) and it becomes really hard to predict what each command will do. I hope I never end up in that world again... Commented Nov 12, 2023 at 11:46
16

No, this is not possible, unless the spaces are part of an argument.

The command accesses the individual arguments from an array (in one form or another depending on programming language) and the actual command line may be saved to a history file (if typed at an interactive prompt in a shell that has history files), but is never passed on to the command in any form.

All commands on Unix are in the end executed by one of the exec() family of functions. These take the command name and a list or array of arguments. None of them takes a command line as typed at the shell prompt. The system() function does, but its string argument is later executed by execve(), which, again, takes an array of arguments rather than a command line string.

9
  • 2
    @LightnessRacesinOrbit I put that in there just in case there was some confusion about "spaces between arguments". Putting spaces in quotes between hello and world is literally spaces between the two arguments.
    – Kusalananda
    Commented Mar 15, 2018 at 12:39
  • 5
    @Kusalananda - Well, no... Putting spaces in quotes between hello and world is literally supplying the second of three arguments.
    – Jeremy
    Commented Mar 15, 2018 at 13:12
  • @Jeremy As I said, in case there was any confusion about what was meant by "between the arguments". Yes, as a second argument between the other two if you will.
    – Kusalananda
    Commented Mar 15, 2018 at 13:13
  • Your examples were fine, and instructive.
    – Jeremy
    Commented Mar 15, 2018 at 13:22
  • 1
    Well, guys, the examples were an obvious source of confusion and misunderstanding. I have deleted them as they did not add to the value of the answer.
    – Kusalananda
    Commented Mar 15, 2018 at 13:23
9

In general, it is not possible, as several other answers explain.

However, Unix shells are ordinary programs: they interpret the command line and glob it (that is, expand the command) before doing fork & execve for it. See this explanation about bash shell operations. You could write your own shell (or you could patch some existing free software shell, e.g. GNU bash) and use it as your shell (or even your login shell, see passwd(5) & shells(5)).

For example, you might have your own shell program put the full command line in some environment variable (imagine MY_COMMAND_LINE for example), or use any other kind of inter-process communication to transmit the command line from the shell to the child process.

I don't understand why you would want to do that, but you might code a shell behaving in such a way (but I recommend not doing so).

BTW, a program could be started by some program which is not a shell (but which does fork(2) then execve(2), or just an execve to start a program in its current process). In that case there is no command line at all, and your program could be started without a command...

Notice that you might have some (specialized) Linux system without any shell installed. This is weird and unusual, but possible. You'll then need to write a specialized init program starting other programs as needed - without using any shell but by doing fork & execve system calls.

Read also Operating Systems : Three easy pieces and don't forget that execve is practically always a system call (on Linux, they are listed in syscalls(2), see also intro(2)) which reinitializes the virtual address space (and some other things) of the process doing it.

5
  • This is the best answer. I assume (without having looked that up) that argv[0] for the program name and the remaining elements for the arguments are POSIX specifications and cannot be changed. A runtime environment could specify argv[-1] for the command line, I assume ... Commented Mar 17, 2018 at 13:29
  • No, it could not. Read more carefully execve documentation. You cannot use argv[-1], it is undefined behavior to use it. Commented Mar 17, 2018 at 13:31
  • Yeah, good point (also the hint that we have a syscall) -- the idea is a bit contrived. All three components of the runtime (shell, stdlib and OS) would need to collaborate. The shell needs to call a special non-POSIX execvepluscmd function with an extra parameter (or argv convention), the syscall constructs an argument vector for main that contains a pointer to the command line before the pointer to the program name, and then pass the address of the pointer to the program name as argv when calling the program's main... Commented Mar 17, 2018 at 14:11
  • No need to re-write the shell, just use the quotes. This feature has been available since the Bourne shell, sh, so it is not new. Commented Mar 18, 2018 at 11:12
  • Using quotes requires changing the command line, and the OP doesn't want that. Commented Mar 18, 2018 at 16:14
3

You can always tell your shell to tell applications what shell code led to their execution. For instance, with zsh, by passing that information in the $SHELL_CODE environment variable using the preexec() hook (printenv used as an example, you'd use getenv("SHELL_CODE") in your program):

$ preexec() export SHELL_CODE=$1
$ printenv SHELL_CODE
printenv SHELL_CODE
$ printenv  SHELL_CODE
printenv  SHELL_CODE
$ $(echo printenv SHELL_CODE)
$(echo printenv SHELL_CODE)
$ for i in SHELL_CODE; do printenv "$i"; done
for i in SHELL_CODE; do printenv "$i"; done
$ printenv SHELL_CODE; : other command
printenv SHELL_CODE; : other command
$ f() printenv SHELL_CODE
$ f
f

All those would execute printenv as:

execve("/usr/bin/printenv", ["printenv", "SHELL_CODE"], 
       ["PATH=...", ..., "SHELL_CODE=..."]);

This allows printenv to retrieve the zsh code that led to the execution of printenv with those arguments. What you would want to do with that information is not clear to me.

With bash, the feature closest to zsh's preexec() would be using its $BASH_COMMAND in a DEBUG trap, but note that bash does some level of rewriting in that (and in particular refactors some of the whitespace used as delimiter) and that's applied to every (well, some) command run, not the whole command line as entered at the prompt (see also the functrace option).

$ trap 'export SHELL_CODE="$BASH_COMMAND"' DEBUG
$ printenv SHELL_CODE
printenv SHELL_CODE
$ printenv $(echo 'SHELL_CODE')
printenv $(echo 'SHELL_CODE')
$ for i in SHELL_CODE; do printenv "$i"; done; : other command
printenv "$i"
$ printf '%s\n' "$(printenv "SHELL_CODE")"
printf '%s\n' "$(printenv "SHELL_CODE")"
$ set -o functrace
$ printf '%s\n' "$(printenv "SHELL_CODE")"
printenv "SHELL_CODE"
$ print${-+env  }    $(echo     'SHELL_CODE')
print${-+env  } $(echo     'SHELL_CODE')

See how some of the spaces that are delimiters in the shell language syntax have been squeezed into 1 and how the full command line is not always passed to the command. So probably not useful in your case.

Note that I would not advise doing this kind of thing, as you're potentially leaking sensitive information to every command as in:

echo very_secret | wc -c | untrustedcmd

would leak that secret to both wc and untrustedcmd.

Of course, you could do that kind of thing for languages other than the shell. For instance, in C, you could use a macro that exports the C code that executes a command to the environment:

#include <unistd.h>
#include <stdlib.h>
#include <sys/wait.h>
#define WRAP(x) (setenv("C_CODE", #x, 1), x)

int main(int argc, char *argv[])
{
  if (!fork()) WRAP(execlp("printenv", "printenv", "C_CODE", NULL));
  wait(NULL);
  if (!fork()) WRAP(0 + execlp("printenv",   "printenv", "C_CODE", NULL));
  wait(NULL);
  if (argc > 1 && !fork()) WRAP(execvp(argv[1], &argv[1]));
  wait(NULL);
  return 0;
}

Example:

$ ./a.out printenv C_CODE
execlp("printenv", "printenv", "C_CODE", NULL)
0 + execlp("printenv", "printenv", "C_CODE", NULL)
execvp(argv[1], &argv[1])

See how some spaces were condensed by the C pre-processor like in the bash case. In most if not all languages, the amount of space used in delimiters makes no difference, so it's not surprising that the compiler/interpreter takes some liberty with them here.

2
  • When I was testing this, BASH_COMMAND didn't contain the original whitespace separating arguments, so this wasn't usable for the OP's literal request. Does this answer include any demonstration either way for that particular use case? Commented Mar 17, 2018 at 21:25
  • @CharlesDuffy, I just wanted to indicate the closest equivalent of zsh's preexec() in bash (as that's the shell the OP was referring to) and point out that it could not be used for that specific use case, but I agree it was not very clear. See edit. This answer is intended to be more generic about how to pass the source code (here in zsh/bash/C) that caused the execution to the command being executed (not something that is useful, but I hope that while doing so, and especially with the examples, I demonstrate that it's not very useful) Commented Mar 18, 2018 at 8:33
0

I will just add what is missing in the other answers.

No

See other answers

Maybe, sort of

There is nothing that can be done in the program, but there is something that can be done in the shell when you run the program.

You need to use quotes. So instead of

./myprog      aaa      bbb

you need to do one of these

./myprog "     aaa      bbb"
./myprog '     aaa      bbb'

This will pass a single argument to the program, with all of the spaces. There is a difference between the two: the second is literal, exactly the string as it appears (except that a ' cannot appear inside it and has to be written as '\''). The first one will interpret some characters, but will not split the result into several arguments. See shell quoting for more information. So there is no need to rewrite the shell; the shell designers have already thought of that. However, because it is now one argument, you will have to do more parsing within the program.

Option 2

Pass the data in via stdin. This is the normal way to get large amounts of data into a command. e.g.

./myprog << EOF
    aaa      bbb
EOF

or

./myprog
Tell me what you want to tell me:
aaaa bbb
ctrl-d

(The line "Tell me what you want to tell me:" is output of the program.)

4
  • Technically, the shell code: ./myprog␣"␣␣␣␣␣aaa␣␣␣␣␣␣bbb" executes (generally in a child process) the file stored in ./myprog and passes it two arguments: ./myprog and ␣␣␣␣␣aaa␣␣␣␣␣␣bbb (argv[0] and argv[1], argc being 2) and as in the OP's, the space that separates those two arguments is not passed in any way to myprog. Commented Mar 18, 2018 at 12:53
  • But you are changing the command, and the OP doesn't want to change it. Commented Mar 18, 2018 at 18:18
  • @BasileStarynkevitch Following your comment, I read the question again. You are making an assumption. Nowhere does the OP say that they don't want to change the way the program is run. Maybe this is true, but they had nothing to say on it. Therefore this answer may be what they need. Commented Mar 18, 2018 at 22:07
  • The OP asks explicitly about spaces between arguments, not about one single argument containing spaces. Commented Mar 19, 2018 at 5:16
