119

Typical Unix/Linux programs accept the command line inputs as an argument count (int argc) and an argument vector (char *argv[]). The first element of argv is the program name - followed by the actual arguments.

Why is the program name passed to the executable as an argument? Are there any examples of programs using their own name (maybe some kind of exec situation)?

19
  • 6
    like mv and cp ?
    – Archemar
    Commented Oct 12, 2016 at 5:26
  • 9
    On Debian, sh is a symlink to dash. It behaves differently depending on whether it is called as sh or as dash.
    – Motte001
    Commented Oct 12, 2016 at 15:48
  • 21
    @AlexejMagura If you use something like busybox (common on rescue-discs and such), then pretty much everything (cp, mv, rm, ls, ...) is a symbolic link to busybox. Commented Oct 12, 2016 at 16:20
  • 11
    I'm finding this really hard to ignore, so I'll say it: you probably mean "GNU" programs (gcc, bash, gunzip, most of the rest of the OS...), as Linux is just the kernel.
    – wizzwizz4
    Commented Oct 12, 2016 at 17:50
  • 11
    @wizzwizz4 What's wrong with "Typical Unix/Linux programs"? I read it like "Typical programs running on Unix/Linux". That's much better than your restriction to certain GNU programs. Dennis Ritchie was certainly not using any GNU programs. BTW the Hurd kernel is an example of a GNU program which does not have a main function...
    – rudimeier
    Commented Oct 13, 2016 at 11:49

11 Answers

133

To begin with, note that argv[0] is not necessarily the program name. It is what the caller puts into argv[0] of the execve system call (e.g. see this question on Stack Overflow). (All other variants of exec are not system calls but interfaces to execve.)

Suppose, for instance, the following (using execl):

execl("/var/tmp/mybackdoor", "top", NULL);

/var/tmp/mybackdoor is what is executed but argv[0] is set to top, and this is what ps or (the real) top would display. See this answer on U&L SE for more on this.

Setting all of this aside: Before the advent of fancy filesystems like /proc, argv[0] was the only way for a process to learn about its own name. What would that be good for?

  • Several programs customize their behavior depending on the name by which they were called (usually by symbolic or hard links, for example BusyBox's utilities; several more examples are provided in other answers to this question).
  • Moreover, services, daemons and other programs that log through syslog often prepend their name to the log entries; without this, event tracking would become next to infeasible.
6
  • 18
    Examples of such programs are bunzip2, bzcat and bzip2; the first two are symlinks to the third.
    – Ruslan
    Commented Oct 12, 2016 at 7:37
  • 5
    @Ruslan Interestingly zcat is not a symlink. They seem to avoid the downsides of this technique using a shell script instead. But they fail to print a complete --help output because somebody who added options to gzip forgot to maintain zcat too.
    – rudimeier
    Commented Oct 12, 2016 at 9:17
  • 3
    For as long as I can remember, the GNU coding standards have discouraged the use of argv[0] to change program behavior (section "Standards for Interfaces Generally" in the current version). gunzip is a historical exception.
    – user41515
    Commented Oct 12, 2016 at 11:58
  • 19
    busybox is another excellent example. It can be called by 308 different names to invoke different commands: busybox.net/downloads/BusyBox.html#commands Commented Oct 12, 2016 at 13:24
  • 2
    Many, many more programs also inject their argv[0] in their usage/help output instead of hard-coding their name. Some in full, some just the basename.
    – spectras
    Commented Oct 15, 2016 at 14:13
67

Plenty:

  • Bash runs in POSIX mode when argv[0] is sh. It runs as a login shell when argv[0] begins with -.
  • Vim behaves differently when run as vi, view, evim, eview, ex, vimdiff, etc.
  • Busybox, as already mentioned.
  • In systems with systemd as init, shutdown, reboot, etc. are symlinks to systemctl.
  • and so on.
5
  • 7
    Another one is sendmail and mail. Every single unix MTA comes with a symlink for those two commands, and is designed to emulate the original's behaviour when called as such, meaning that any unix program that needs to send mail knows exactly how they can do so. Commented Oct 12, 2016 at 9:51
  • 4
    Another common case: test and [. When you call the former, it raises an error if the last argument is ]. (On current Debian stable these commands are two different programs, but previous versions and macOS still use the same program.) And tex, latex and so on: the binary is the same, but depending on how it was called, it chooses the proper configuration file. init is similar. Commented Oct 12, 2016 at 19:25
  • 4
    Related, [ considers it an error if the last argument is not ].
    – chepner
    Commented Oct 13, 2016 at 15:49
  • 1
    I guess this answers the second question, but not the first. I very much doubt some OS designer sat down and said »Hey, it would be cool if I had the same program doing different things just based on its executable name. I guess I'll include the name in its argument array, then.«
    – Joey
    Commented Oct 18, 2016 at 5:52
  • @Joey Yes, the wording is intended to convey that (Q: "Are there any ...?" A: "Plenty: ...")
    – muru
    Commented Oct 18, 2016 at 6:34
37

Historically, argv is just an array of pointers to the "words" of the commandline, so it makes sense to start with the first "word", which happens to be the name of the program.

And there are quite a few programs that behave differently according to which name is used to call them, so you can just create different links to them and get different "commands". The most extreme example I can think of is busybox, which acts like several dozen different "commands" depending on how it is called.

Edit: References for Unix 1st edition, as requested

One can see e.g. from the main function of cc that argc and argv were already used. The shell copies arguments to the parbuf inside the newarg part of the loop, while treating the command itself in the same way as the arguments. (Of course, later on it executes only the first argument, which is the name of the command). It looks like execv and relatives didn't exist then.

4
  • 1
    please add references that back this up.
    – Lesmana
    Commented Oct 12, 2016 at 10:24
  • From a quick skimming, exec takes the name of the command to execute and a zero-terminated array of char pointers (best seen at minnie.tuhs.org/cgi-bin/utree.pl?file=V1/u0.s, where exec takes references to label 2 and label 1, and at label 2: appears etc/init\0, and at label 1: appears a reference to label 2, and a terminating zero), which is basically what execve does today minus envp.
    – ninjalj
    Commented Oct 13, 2016 at 19:49
  • 1
    execv and execl have existed "forever" (i.e., since the early to mid 1970s) — execv was a system call and execl was a library function that called it.   execve didn't exist then because the environment didn't exist then.   The other members of the family were added later. Commented Oct 14, 2016 at 5:41
  • @G-Man Can you point me to execv in the v1 source I linked? Just curious.
    – dirkt
    Commented Oct 14, 2016 at 6:56
24

Use cases:

You can use the program name to change the program behavior.

For example you could create some symlinks to the actual binary.

One famous example of this technique is the busybox project, which installs a single binary and many symlinks to it (ls, cp, mv, etc.). It does this to save storage space, because its targets are small embedded devices.

This is also used in setarch from util-linux:

$ ls -l /usr/bin/ | grep setarch
lrwxrwxrwx 1 root root           7 2015-11-05 02:15 i386 -> setarch
lrwxrwxrwx 1 root root           7 2015-11-05 02:15 linux32 -> setarch
lrwxrwxrwx 1 root root           7 2015-11-05 02:15 linux64 -> setarch
-rwxr-xr-x 1 root root       14680 2015-10-22 16:54 setarch
lrwxrwxrwx 1 root root           7 2015-11-05 02:15 x86_64 -> setarch

Here they are using this technique basically to avoid many duplicate source files or just to keep the sources more readable.

Another use case would be a program which needs to load some modules or data at runtime. Having the program path lets you load modules from a path relative to the program location.

Moreover many programs print error messages including the program name.

Why:

  1. Because it's POSIX convention (man 3p execve):

argv is an array of argument strings passed to the new program. By convention, the first of these strings should contain the filename associated with the file being executed.

  2. It's in the C standard (at least C99 and C11):

If the value of argc is greater than zero, the string pointed to by argv[0] represents the program name; argv[0][0] shall be the null character if the program name is not available from the host environment.

Note the C Standard says "program name" not "filename".

5
  • 3
    Doesn't this break if you reach the symlink from another symlink?
    – user541686
    Commented Oct 12, 2016 at 6:55
  • 3
    @Mehrdad, Yes that's the downside and can be confusing for the user.
    – rudimeier
    Commented Oct 12, 2016 at 7:30
    @rudimeier: Your 'Why' items are not really reasons, they're just a "homunculus", i.e. they just beg the question of why the standards require this to be the case.
    – einpoklum
    Commented Oct 15, 2016 at 11:10
    @einpoklum OP's question was: Why is the program name passed to the executable? I answered: Because the POSIX and C standards tell us to do so. Why do you think that's not really a reason? If the docs I've quoted did not exist, then probably many programs would not pass the program name.
    – rudimeier
    Commented Oct 15, 2016 at 23:35
  • 1
    The OP is effectively asking "WHY do the POSIX and C standards say to do this?" Granted the wording was at an abstracted level, but it seems clear. Realistically, the only way to know is to ask the originators. Commented Oct 17, 2016 at 0:30
23

In addition to programs altering their behaviour depending on how they were called, I find argv[0] useful in printing the usage of a program, like so:

printf("Usage: %s [arguments]\n", argv[0]);

This causes the usage message to always use the name through which it was called. If the program is renamed, its usage message changes with it. It even includes the path name it was called with:

# cat foo.c 
#include <stdio.h>
int main(int argc, char **argv) { printf("Usage: %s [arguments]\n", argv[0]); }
# gcc -Wall -o foo foo.c
# mv foo /usr/bin 
# cd /usr/bin 
# ln -s foo bar
# foo
Usage: foo [arguments]
# bar
Usage: bar [arguments]
# ./foo
Usage: ./foo [arguments]
# /usr/bin/foo
Usage: /usr/bin/foo [arguments]

It's a nice touch, especially for small special-purpose tools/scripts that might live all over the place.

This seems common practice in GNU tools as well, see ls for example:

% ls --qq
ls: unrecognized option '--qq'
Try 'ls --help' for more information.
% /bin/ls --qq
/bin/ls: unrecognized option '--qq'
Try '/bin/ls --help' for more information.
1
  • 3
    +1. I was going to suggest the same. Strange that so many people focus on changing behaviour and fail to mention probably the most obvious and much more widespread usage.
    – The Vee
    Commented Oct 13, 2016 at 11:51
6

One runs a program by typing: program_name0 arg1 arg2 arg3 ....

So the shell has to split the line into tokens anyway, and the first token is already the program name. As a bonus, the indices are the same on the program side and on the shell side.

I think this was just a convenience trick (at the very beginning), and, as you can see in other answers, it also turned out to be very handy, so the tradition was continued and fixed as an API.

5

Basically, argv includes the program name so that you can write error messages like prgm: file: No such file or directory, which would be implemented with something like this:

    fprintf( stderr, "%s: %s: No such file or directory\n", argv[0], argv[1] );
3

Another example of an application of this is this program, which replaces itself with... itself, until you type something that isn't y.

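#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

extern char **environ;

int main (int argc, char** argv) {

  /* The counter argument is required; without it argv[1] is NULL. */
  if (argc < 2) {
    fprintf(stderr, "Usage: %s count\n", argc > 0 ? argv[0] : "res");
    return 1;
  }

  printf("arg: %s\n", argv[1]);
  int count = atoi(argv[1]);

  if ( getchar() == 'y' ) {

    ++count;

    char buf[20];
    snprintf(buf, sizeof buf, "%d", count);

    /* Reuse our own argv[0] both as the path to run and as the new argv[0]. */
    char* newargv[3];
    newargv[0] = argv[0];
    newargv[1] = buf;
    newargv[2] = NULL;

    /* Passing environ rather than NULL is more portable. */
    execve(argv[0], newargv, environ);
    perror("execve");   /* only reached if execve failed */
  }

  return count;
}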

Obviously, kind of a contrived if interesting example, but I think this may have real uses -- for example, a self-updating binary, which rewrites its own memory space with a new version of itself that it downloaded or changed.

Example:

$ ./res 1
arg: 1
y
arg: 2
y
arg: 3
y
arg: 4
y
arg: 5
y
arg: 6
y
arg: 7
n

7 | $

Source, and some more info.

1
1

The path to the program is argv[0], so that the program can retrieve configuration files etc. from its install directory.
This would be impossible without argv[0].

2
  • 2
    That's not a particularly good explanation - there's no reason we couldn't have standardised on something like (char *path_to_program, char **argv, int argc) for example
    – moopet
    Commented Oct 17, 2016 at 10:46
  • Afaik, most programs pull configuration from a standard location (~/.<program>, /etc/<program>, $XDG_CONFIG_HOME) and either take a parameter to change it or have a compile-time option that bakes in a constant to the binary. Commented Oct 17, 2016 at 18:32
1

ccache behaves this way in order to imitate different calls to compiler binaries. ccache is a compilation cache - the whole point is never to compile the same source code twice but instead return the object code from cache if possible.

From the ccache man page, "there are two ways to use ccache. You can either prefix your compilation commands with ccache or you can let ccache masquerade as the compiler by creating a symbolic link (named as the compiler) to ccache. The first method is most convenient if you just want to try out ccache or wish to use it for some specific projects. The second method is most useful for when you wish to use ccache for all your compilations."

The symlinks method involves running these commands:

cp ccache /usr/local/bin/
ln -s ccache /usr/local/bin/gcc
ln -s ccache /usr/local/bin/g++
ln -s ccache /usr/local/bin/cc
ln -s ccache /usr/local/bin/c++
... etc ...

... the effect of which is to allow ccache to snag any commands which would otherwise have gone to the compilers, thus allowing ccache to return a cached file or pass the command on to the actual compiler.

0

The following is a practical example to illustrate the usefulness of having the program name as argv[0] ($0 in bash):

To accomplish a certain task, I often need to read the manual pages. But this is often tiresome, and a lot of time is wasted when reading the man pages over and over again any time I need to use the same command or need to do the same thing in the future.

So I started making small, textual notes about each of the commands I use the most, and viewing them in an outliner (emacs' outline-mode):

tcpdump help file

Using an outliner helps to collapse or show large portions of text, making navigation easier, and accessing desired information quicker. When used properly, it makes reading and maintaining documentation much more efficient.

To get help on a specific command, for example strace, instead of doing man strace, I did strace.help

In the beginning, I wrote aliases that looked like this:

alias strace.help='emacs ~/help/strace'
alias tcpdump.help='emacs ~/help/tcpdump'
alias ps.help='emacs ~/help/ps'
...

help aliases

But then I thought: wait, this is really stupid. Why should I have x aliases that all look the same? Isn't this crying out for refactoring? So I rewrote the aliases into a single alias-creating bash function, using eval inside a for loop, so the aliases were created dynamically, but that still seemed wrong...

Then I remembered the argv[0] trick that I saw while exploring the source code of some of the most common unix/linux commands: write a single program that behaves differently depending on how it's called.

So I wrote a generic command.help script that looked like this:

#!/bin/bash
commandname=$(filename.path.basename "$0")
# make sure to strip the .help suffix
commandname=$(filename.ext.remove "$commandname")
if [[ -f ~/.bash_lib/help/$commandname ]]
then
    $EDITOR ~/.bash_lib/help/$commandname
else
    echo "no custom help available for $commandname"
fi

filename.path.basename and filename.ext.remove are short bash utils and showing their code has no relevance here.

command.help source code

I put that script in my $PATH, then I created multiple links to command.help, with different names:

strace.help,
tcpdump.help,
ps.help
...

creating links

so now, anytime I add a new help note for a command, say zip, I only need to create a new link to command.help under the name zip.help, and it will magically open the help file for the zip command.

