Parsing command-line arguments with getopt in C++

Question

I did not find anything to parse command-line arguments in C++ (which was surprising), so I fell back to getopt() which has served me well in C (I am currently not interested in using some new third-party library).

Following is some code I have put together (it is an unfinished implementation of the UNIX wc program, but that's not the point of this review):

#include <iostream>
#include <cstdlib>

/* And why in the world are these not prefixed with 'c'? */
#include <unistd.h>
#include <getopt.h>

static constexpr const option long_options[] {
    { "bytes", no_argument, nullptr, 'c' },
    { "lines", no_argument, nullptr, 'l' },
    { "max-line-length", no_argument, nullptr, 'f' },
    { "words", no_argument, nullptr, 'w' },
    { "help", no_argument, nullptr, 'h' },
    { nullptr, 0, nullptr, 0 }
};

static const char* argv0 {"<unknown>"};

static bool cflag {};   /* byte count flag. */
static bool lflag {};   /* line count flag. */
static bool fflag {};   /* max line length flag. */
static bool wflag {};   /* word count flag. */

static void help()
{
    std::cout << "\nUSAGE\n\twc [OPTION]... [FILE]...\n\n"
         "NAME\n\twc - word, line, and byte count.\n\n"
         "DESCRIPTION\n\tPrint newline, word, and byte counts for each FILE, and a total line if more\n"
         "\tthan one FILE is specified. A word is a non-zero-length sequence of printable\n"
         "\tcharacters delimited by white space. When no FILE, or when FILE is -, read\n"
         "\tstandard input.\n"
         "\tIf FILE is a directory, wc behaves the same as if all the directory's files\n"
         "\twere listed separately.\n\n"
         "OPTION:\n"
         "\t-c, --bytes                 print the byte counts.\n"
         "\t-l, --lines                 print the newline counts.\n"
         "\t-f, --max-line-length       print the maximum display width.\n"
         "\t-w, --words                 print the word counts.\n"
         "\t-h, --help                  display this help and exit.\n\n";
    std::exit(EXIT_SUCCESS);
}

static void err_msg()
{
    std::cerr << "The syntax of the command is incorrect.\n"
                 "Try " << argv0 << " -h for more information.\n";
    std::exit(EXIT_FAILURE);
}

static void parse_options(int argc, char *argv[])
{
    int c {};

    while ((c = getopt_long(argc, argv, "clfwh", long_options, nullptr)) != -1) {
        switch (c) {
            case 'c':
                cflag = true;
                break;

            case 'l':
                lflag = true;
                break;

            case 'f':
                fflag = true;
                break;

            case 'w':
                wflag = true;
                break;

            case 'h':
                help();
                break;

            /* case '?' */
            default:
                err_msg();
                break;
        }
    }
}

int main(int argc, char *argv[])
{
    parse_options(argc, argv);
    
    /* This test is superflous. */
    if (argv[0]) {
        argv0 = argv[0];
    }

    /* Read from stdin in case optind == argc. */
    for (int i {optind}; i < argc; ++i) {
        std::cout << argv[i] << "\n";
    }
}

I will be using this to to implement wc, and probably more shell utilities in the near future.

Review Request:

Bad/outdated/deprecated practices, any simplifications, style, et cetera.

Are the calls to std::exit() justifiable?

indi · Accepted Answer · 2024-05-08 20:31:47Z

Are the calls to std::exit() justifiable?

No. Never use std::exit().

/* And why in the world are these not prefixed with 'c'? */
#include <unistd.h>
#include <getopt.h>

Why would they be?

(Most) standard C library headers are available in C++ by dropping the .h and prefixing with c. But those are not standard C library headers; they are part of the POSIX library. There is no C++ equivalent of the POSIX library.

static constexpr const option long_options[] {
    { "bytes", no_argument, nullptr, 'c' },
    { "lines", no_argument, nullptr, 'l' },
    { "max-line-length", no_argument, nullptr, 'f' },
    { "words", no_argument, nullptr, 'w' },
    { "help", no_argument, nullptr, 'h' },
    { nullptr, 0, nullptr, 0 }
};

static const char* argv0 {"<unknown>"};

static bool cflag {};   /* byte count flag. */
static bool lflag {};   /* line count flag. */
static bool fflag {};   /* max line length flag. */
static bool wflag {};   /* word count flag. */

Why is all this stuff global?

The options description array only needs to be in parse_options. There is no benefit to making it global.

argv0 appears intended to store whatever was in argv[0], and retain that even if argv[0] is changed, so you can always get the program name. But I would suggest that thinking is wrong-headed.

Let’s just look at the (currently) one place argv0 is used, for example: err_msg(). Basically, the code structure is this:

char const* argv0 = "some default value";

auto err_msg()
{
    // use argv0
}

auto parse_options(int argc, char* argv[])
{
    err_msg();
}

auto main(int argc, char* argv[]) -> int
{
    parse_options(argc, argv);
    
    // set argv
}

Now, set aside for now the silliness of setting argv0 after the call in which it is used. Instead, focus on the question of why a global variable is needed for this.

Couldn’t you do:

auto err_msg(std::string_view argv0)
{
    // use argv0
}

auto parse_options(int argc, char* argv[])
{
    err_msg(argv[0]);
}

auto main(int argc, char* argv[]) -> int
{
    parse_options(argc, argv);
    
    // set argv
}

No more need for the global.

As for all those global flags, I presume the intention is for the rest of the program to use them to know which flags have been set. Except, again, global variables are both a bad idea, and unnecessary. Just return an options object:

struct options
{
    bool cflag = false;
    bool lflag = false;
    bool fflag = false;
    bool wflag = false;
};

// This, or use a proper type that can report errors better:
enum class parse_options_error
{
    help_requested,
    unknown_option
};

auto parse_options(int argc, char* argv[]) -> std::expected<options, parse_options_error>
{
    auto opts = options{};

    while (true)
    {
        static constexpr ::option long_options[] =
        {
            { "bytes", no_argument, nullptr, 'c' },
            { "lines", no_argument, nullptr, 'l' },
            { "max-line-length", no_argument, nullptr, 'f' },
            { "words", no_argument, nullptr, 'w' },
            { "help", no_argument, nullptr, 'h' },
            { nullptr, 0, nullptr, 0 }
        };

        auto const c = ::getopt_long(argc, argv, "clfwh", long_options, nullptr);
        if (c == -1)
            break;

        switch (c)
        {
        case 'c':
            opts.cflag = true;
            break;
        case 'l':
            opts.lflag = true;
            break;
        case 'f':
            opts.fflag = true;
            break;
        case 'w':
            opts.wflag = true;
            break;
        case 'h':
            return std::unexpected{parse_options_error::help_requested};
        /* case '?' */
        default:
            return std::unexpected{parse_options_error::unknown_option};
        }
    }

    return opts;
}

auto main(int argc, char* argv[]) -> int
{
    auto const opts = parse_options(argc, argv);
    if (not opts)
    {
        if (opts.error() == parse_options_error::help_requested)
        {
            help(std::cout);
            return 0;
        }

        if (opts.error() == parse_options_error::unknown_option)
        {
            err_msg(std::cerr, argv[0]);
            return EXIT_FAILURE;
        }
    }

    // Options were parsed successfully...
}

What is the benefit of doing it this way, with no globals? You can test it. YOU SHOULD ALWAYS TEST YOUR CODE.

Right now, your code as is with all the globals is functionally untestable. Because it can’t be tested, it can’t be trusted. Maybe it works; maybe it doesn’t. This is faith-based programming: bang code out then pray it works. No real-world project worth taking seriously would accept that.

Untested code is garbage code. That does not necessarily mean the code is bad; it may be the most brilliant, most perfect piece of code ever written. But without testing, it cannot be trusted, and it cannot be used anywhere where we need to trust that the code actually works. Thus, it is useless; it is garbage.

static void help()
{
    std::cout << "\nUSAGE\n\twc [OPTION]... [FILE]...\n\n"
         "NAME\n\twc - word, line, and byte count.\n\n"
         "DESCRIPTION\n\tPrint newline, word, and byte counts for each FILE, and a total line if more\n"
         "\tthan one FILE is specified. A word is a non-zero-length sequence of printable\n"
         "\tcharacters delimited by white space. When no FILE, or when FILE is -, read\n"
         "\tstandard input.\n"
         "\tIf FILE is a directory, wc behaves the same as if all the directory's files\n"
         "\twere listed separately.\n\n"
         "OPTION:\n"
         "\t-c, --bytes                 print the byte counts.\n"
         "\t-l, --lines                 print the newline counts.\n"
         "\t-f, --max-line-length       print the maximum display width.\n"
         "\t-w, --words                 print the word counts.\n"
         "\t-h, --help                  display this help and exit.\n\n";
    std::exit(EXIT_SUCCESS);
}

This function is functionally untestable.

Rather than hard-coding std::cout, you should take the output stream as a parameter. You might want to display the help message to std::cerr in some cases. Even if not, using an output parameter allows the function to be tested.

Writing the output string like this is… not great. It’s not all that readable. This kind of thing is what raw string literals are for:

auto help(std::ostream& out)
{
    out <<
R"(USAGE
    wc [OPTION]... [FILE]...

NAME
    wc - word, line, and byte count.

DESCRIPTION
    Print newline, word, and byte counts for each FILE, and a total line if more
    than one FILE is specified. A word is a non-zero-length sequence of printable
    characters delimited by white space. When no FILE, or when FILE is -, read
    standard input.
    If FILE is a directory, wc behaves the same as if all the directory's files
    were listed separately.

OPTION:
    -c, --bytes                 print the byte counts.
    -l, --lines                 print the newline counts.
    -f, --max-line-length       print the maximum display width.
    -w, --words                 print the word counts.
    -h, --help                  display this help and exit.

)";
}

And, of course, don’t use std::exit(). Not only is std::exit() bad on its own, it really makes the function untestable.

static void err_msg()
{
    std::cerr << "The syntax of the command is incorrect.\n"
                 "Try " << argv0 << " -h for more information.\n";
    std::exit(EXIT_FAILURE);
}

Again, to be testable, this should not be using globals. Just pass the output stream and program name as arguments.

And again, no std::exit().

static void parse_options(int argc, char *argv[])

In C++, we keep the type modifiers with the type. In other words:

char *argv[]: this is C-style
char* argv[]: this is C++-style

    int c {};

    while ((c = getopt_long(argc, argv, "clfwh", long_options, nullptr)) != -1) {
        // ...
    }

I’m not a fan of this structure for a number of reasons:

It requires declaring the return object ahead of time.
The return value also leaks out of the while scope.
It hides an assignment in the middle of an expression.

Don’t do multiple things in a single expression. Keep things simple and clear:

    while (true)
    {
        auto const c = ::getopt_long(/*...*/);
        if (c == -1)
            break; // or return as the case may be

        // ...
    }

Now, as written, parse_options() does a lot more than just parse options. It parses the options and then, in some cases, actually carries out other operations, like printing error messages, displaying the help, and even ending the program.

I expect when I see a function with a name like parse_options() it should… parse options. Not terminate the whole freakin’ program if the mood strikes it.

One function ⇔ one responsibility.

A parse function should parse. Nothing more. Nothing less. It should return the results of that parsing, successful or otherwise. Some other function should decide what to do with the parsed result, whether that be displaying the help or exiting or whatever.

In summary:

Don’t use globals. They make testing functionally impossible.
Live by the single responsibility principle. Functions should have single, simple jobs, and do nothing else but that job.
Test your code. Write code to be tested from the start, and test it.
Never use std::exit(). It is a C library function, for C programs. It is not for C++.

I originally had a struct with the flags and had everything local, but then I looked at my previous reviews for shell utilities (C, not C++) and read the source code of some GNU coreutils, and made everything global. :) — Harith, Commented May 8 at 20:38
auto main(int argc, char* argv[]) -> int ==> I didn't know Python's syntax had made its way into C++. How's this better than int main(...)? — Harith, Commented May 8 at 20:44
TBH, the command line options do have rather a global significance for a tool like this, and it's not like the code is untestable, as you can still run the program to see if it gives the proper output. And, seriously, what would you test with a function that just prints an error message? Would you duplicate the message in the test suite to check that the function gives the correct output..? — ilkkachu, Commented May 9 at 7:44
@Harith, the thing about globals is that they're bad if you ever extend the code so that you might need more than one instance if the value. In general, it's mostly a good idea to consider that beforehand, rather than taking the risk of needing extensive overhauls over the code base later. But you can take into consideration how likely that is going to be. And if your goal is to implement the set of standard shell utilities (a rather fixed set), the chances of needing to overhaul wc into a reusable library function are pretty close to nil. — ilkkachu, Commented May 9 at 18:33
I don’t really use header-only libraries, and particularly not for testing. But Catch2 is fine. Boost.Test is what I usually use, and works as header-only. There’s also μt, doctest, and many more. Even GoogleTest would work if you must. Really, any testing library is fine. One can quibble over which one is better, but most are basically the same for simple purposes, and in any case, even the worst testing library ever made is infinitely better than no testing library at all. So… it doesn’t really matter which you pick. — indi, Commented May 10 at 19:13

Stack Exchange Network

Parsing command-line arguments with getopt in C++

Review Request:

1 Answer 1

Not the answer you're looking for? Browse other questions tagged
c++
parsing
console
posix
or ask your own question.

Hot Network Questions

Parsing command-line arguments with getopt in C++

Review Request:

1 Answer 1

Not the answer you're looking for? Browse other questions tagged c++parsingconsoleposix or ask your own question.

Related

Hot Network Questions

Not the answer you're looking for? Browse other questions tagged
c++
parsing
console
posix
or ask your own question.