\$\begingroup\$

It's often convenient to use C-style printf format strings when writing C++. I often find the modifiers much simpler to use than C++ I/O manipulators, and if I'm cribbing from existing C code, it helps to be able to re-use the existing format strings.

Here's my take on creating a new std::string given a format and the corresponding arguments. I've given it an annotation so GCC can check the argument types agree when the format is a literal, but I don't know how to help other compilers.

I've provided a more memory-efficient version for C++17, which provides read/write access to the underlying array of a string. My copy of CPP Reference still says "Modifying the character array accessed through data() has undefined behavior", but the Web version has been edited (May 2017) to indicate that it's only the const version that has that constraint.

For earlier standards (I require minimum C++11), we may need to allocate a temporary array, as we can't write to a string's data. Unfortunately, this requires an extra allocation and copy.

#include <string>

std::string format(const char *fmt, ...)
#ifdef __GNUC__
    __attribute__ ((format (printf, 1, 2)))
#endif
    ;
// Implementation

#include <cstdio>
#include <cstdarg>

#if __cplusplus < 201703L
#include <memory>
#endif

std::string format(const char *fmt, ...)
{
    char buf[256];

    va_list args;
    va_start(args, fmt);
    const auto r = std::vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);

    if (r < 0)
        // conversion failed
        return {};

    const std::size_t len = static_cast<std::size_t>(r);
    if (len < sizeof buf)
        // we fit in the buffer
        return { buf, len };

#if __cplusplus >= 201703L
    // C++17: Create a string and write to its underlying array
    std::string s(len, '\0');
    va_start(args, fmt);
    std::vsnprintf(s.data(), len+1, fmt, args);
    va_end(args);

    return s;
#else
    // C++11 or C++14: We need to allocate scratch memory
    auto vbuf = std::unique_ptr<char[]>(new char[len+1]);
    va_start(args, fmt);
    std::vsnprintf(vbuf.get(), len+1, fmt, args);
    va_end(args);

    return { vbuf.get(), len };
#endif
}
// Test program

#include <iostream>
int main()
{
    std::clog << "'" << format("a")
              << "'" << std::endl;
    std::clog << "'" << format("%#x", 1337)
              << "'" << std::endl;
    std::clog << "'" << format("--%c--", 0) // an embedded NUL
              << "'" << std::endl;
    std::clog << "'" << format("%300s++%6.2f", "**", 0.0).substr(300)
              << "'" << std::endl;
}

void provoke_warnings()
{
    // warning: zero-length gnu_printf format string
    // [-Wformat-zero-length]
    std::clog << "'" << format("") << "'" << std::endl;

    // warning: format ‘%c’ expects argument of type ‘int’, but
    // argument 2 has type ‘const char*’ [-Wformat=]
    std::clog << "'" << format("%c", "bar") << "'" << std::endl;
}

I've compiled the code with both C++17 and C++11 compilers, and verified them both under Valgrind using this test program.

I'd welcome any comments on the code itself or on my testing.

\$\endgroup\$
  • 2
    \$\begingroup\$ A common response would be "use braces even for single-statement if blocks", but I'm reluctant to tell that to one of the leading c++ contributors :) \$\endgroup\$
    – Martin R
    Commented Feb 9, 2018 at 14:49
  • \$\begingroup\$ @Martin: My personal preference is not to use braces for a single return statement. But I have to admit I'm not even consistent these days... \$\endgroup\$ Commented Feb 9, 2018 at 14:51
  • 1
    \$\begingroup\$ I'm hoping someone will pick up on that literal 256 and offer advice for a better number and a name for it. \$\endgroup\$ Commented Feb 9, 2018 at 14:52
  • \$\begingroup\$ StrCat from Google's Abseil library is supposed to be pretty efficient: github.com/abseil/abseil-cpp/blob/master/absl/strings/str_cat.h \$\endgroup\$ Commented Feb 9, 2018 at 15:29
  • 4
    \$\begingroup\$ @MartinR I am not reluctant to tell anybody to use braces even for single line statements. Though the bugs caused by this are rare nowadays (because of reduced macro usage) they are now doubly hard to spot. So chance of an error is small, but when you get an error it is extremely hard to spot. Using braces avoids the possibility of an error caused by multi line macros. So my advice is always use braces. \$\endgroup\$ Commented Feb 9, 2018 at 18:37

4 Answers

\$\begingroup\$

The design is sound.

Despite what naysayers may say, your solution, based on the venerable printf and C variadic arguments, has a few overwhelming advantages over ostream and C++ variadic templates:

  • performance: ostream has terrible formatting performance by design,
  • footprint: any variadic template solution must be carefully designed to avoid the bloat resulting from instantiating one variant for each and every combination of arguments; reaching zero-cost is only possible if the function can be fully inlined without increasing the call-site footprint (possibly by delegating to a non-template core).

Your design sidesteps those two pitfalls, which is great.

Furthermore, your use of the format attribute ensures a compile-time check of the format vs the arguments. Unlike the contenders presented, it will diagnose at compile-time that the number of arguments matches (on top of their types), avoiding the necessity for runtime errors.


Nitpick

I really encourage you to place braces systematically around if-blocks. There's little reason not to, and it'll prevent the occasional slip-up.


Weaknesses

There are two weaknesses in the design:

  • no variant allowing the user to specify the buffer,
  • a very limited set of accepted types.

The first is an issue for composition and reuse.

  • Composition: if I wish to create a larger string by calling in a sub-function, it will create several intermediate buffers which may negate the performance advantage the solution has in terms of raw-formatting,
  • Reuse: the user may already have a sufficiently large buffer available.

Unfortunately, the C++ standard library does not allow one to pass an existing buffer to a string (sigh) and is generally pretty lacking in raw buffers (sigh), so you'll have to roll your own.

I would do so in two steps: an abstract base class Write which exposes a way to write bytes in slices and a ready-made implementation based on std::unique_ptr<char, free> + size + capacity (not vector, because it zeroes the memory when resizing...).
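As a much smaller step in the same direction, here is a minimal sketch of a variant that appends to a string the caller already owns, so that composing several calls reuses one buffer. The name append_format is hypothetical (it is not in the original post), and using std::string as the buffer is a simplification of the Write abstraction described above.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: appends the formatted text to `out`, reusing
// whatever capacity `out` already has. Returns the number of characters
// appended, or -1 if the conversion failed.
int append_format(std::string& out, const char* fmt, ...)
#ifdef __GNUC__
    __attribute__ ((format (printf, 2, 3)))
#endif
    ;

int append_format(std::string& out, const char* fmt, ...)
{
    va_list args;

    // First pass: measure only. A null buffer with size 0 is permitted.
    va_start(args, fmt);
    const int r = std::vsnprintf(nullptr, 0, fmt, args);
    va_end(args);
    if (r < 0) {
        return -1;
    }

    const std::size_t old_size = out.size();
    const std::size_t len = static_cast<std::size_t>(r);

    // One spare character, so vsnprintf's terminating NUL lands inside
    // the string rather than one past its end.
    out.resize(old_size + len + 1);

    // Second pass: format directly after the existing contents.
    va_start(args, fmt);
    std::vsnprintf(&out[old_size], len + 1, fmt, args);
    va_end(args);

    out.resize(old_size + len);   // drop the spare NUL
    return r;
}
```

A caller that knows roughly how much output to expect can call out.reserve() once and then chain several append_format calls without further allocations. The price is that every call formats twice: once to measure, once to write.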

The second is an issue for extension, and performance. In order to format their own types, users are encouraged to "pre-format" their own types into strings, which will result in needless temporary allocations.

There is unfortunately no simple way to solve this issue; it's a fundamental limitation of printf-based solutions. It will be up to the users of the solution to decide whether the cost of temporary allocations is worth bearing, or not, on a per-call-site basis.
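To make the "pre-format" cost concrete, here is a small sketch. Point, to_string and describe are hypothetical names chosen for illustration, and std::snprintf stands in for the format function under review so that the example is self-contained.

```cpp
#include <cstdio>
#include <string>

struct Point { int x; int y; };   // hypothetical user-defined type

// printf has no conversion for Point, so the user first renders it
// into a temporary string...
std::string to_string(const Point& p)
{
    char buf[64];   // ample for two ints plus punctuation
    const int n = std::snprintf(buf, sizeof buf, "(%d, %d)", p.x, p.y);
    if (n < 0) {
        return {};
    }
    return buf;
}

// ...and then passes that temporary through %s. Each Point costs one
// intermediate std::string before the final result is even built.
std::string describe(const Point& a, const Point& b)
{
    const std::string sa = to_string(a);
    const std::string sb = to_string(b);
    char buf[160];  // ample for the two rendered points plus the fixed text
    std::snprintf(buf, sizeof buf, "from %s to %s", sa.c_str(), sb.c_str());
    return buf;
}
```

With short strings the temporaries may fit in the small-string buffer and never touch the heap, but that depends on the standard library implementation and cannot be relied upon.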

\$\endgroup\$
  • 2
    \$\begingroup\$ "performance: ostream has terrible formatting performance by design": does it? If you are using std::cout, don't forget to unbind C++ streams from C streams (that is costly), also untie std::cin from std::cout, and don't manually flush. If you follow these three rules you will see that the performance of C++ streams is comparable. Fail on any of the above and performance does crash. \$\endgroup\$ Commented Feb 10, 2018 at 17:45
  • \$\begingroup\$ The advantage C++ has is that it does not need to parse a format string at runtime; instead it can use the type information and generate the appropriate calls at compile time. Though C++ streams do have extensive locale support not covered by C streams, which reduces this advantage. \$\endgroup\$ Commented Feb 10, 2018 at 17:45
  • \$\begingroup\$ Having said all that I will caveat that printf is raw speed faster, but terrible! \$\endgroup\$ Commented Feb 10, 2018 at 17:51
  • \$\begingroup\$ @MartinYork I was talking about std::ostream only (and notably std::ostringstream, which would be used here for formatting); std::cout/std::cin is a whole other can of worms indeed (find tips here). You'd expect a properly designed stream library to outperform printf due to the lack of runtime parsing; the fact that it doesn't is a testimony to its irreducible performance overhead. On the other hand, format strings are nice, as they quickly give an overview of the result... \$\endgroup\$ Commented Feb 10, 2018 at 18:03
  • 1
    \$\begingroup\$ @TobySpeight: I regularly use if (...) { return x; } as a one-liner (literally). Just because there are braces does not necessarily require inserting new lines. \$\endgroup\$ Commented Feb 12, 2018 at 10:00
\$\begingroup\$

I agree with you that printf-style formatting is in many respects better than C++'s manipulators: more concise, more varied, etc. That said, I feel it's a step backwards to port it into C++ without upgrading the C logic behind it. In my opinion, your code falls short for two reasons: lack of automatic memory management, and lack of type safety.

About type safety, I believe we must now use variadic templates when dealing with an unknown number of arguments. C-style variadic functions have too many shortcomings for us to continue using them. With variadic templates, we're able to verify that format directives and arguments match in number and type in a more reliable way (although I agree that when you can rely on a format string the risk is manageable even with C-style varargs).

About memory management: I believe we must avoid explicit memory management whenever we can. std::stringstream is a good way to do that when one wants to build a string progressively from heterogeneous arguments.

So here's an embryo of what I would consider more modern C++:

#include <string>
#include <sstream>
#include <iostream>
#include <type_traits>
#include <stdexcept>

// base case of recursion, no more arguments
void format_impl(std::stringstream& ss, const char* format) { 
    while (*format) {
        if (*format == '%' && *++format != '%') // %% == % (not a format directive)
            throw std::invalid_argument("not enough arguments !\n");
        ss << *format++;
    }
}

template <typename Arg, typename... Args>
void format_impl(std::stringstream& ss, const char* format, Arg arg, Args... args) {
    while (*format) {
        if (*format == '%' && *++format != '%') {
            auto current_format_qualifier = *format;
            switch(current_format_qualifier) {
                case 'd':
                    if (!std::is_integral<Arg>::value)
                        throw std::invalid_argument("%d requires an integral argument");
                    break;
                // etc.
            }
            // it's true you'd have to handle many more format qualifiers, but on a safer basis
            ss << arg; // arg type is deduced
            return format_impl(ss, ++format, args...); // one arg less
        }
        ss << *format++;
    } // the format string is exhausted and we still have args : throw
    throw std::invalid_argument("Too many arguments\n");
}

template <typename... Args>
std::string format(const char* fmt, Args... args) {
    std::stringstream ss;
    format_impl(ss, fmt, args...);
    return ss.str();
}

int main() {
    auto display = format("My name is %s and I'm %d years old", "John", 22);
    std::cout << display;
}
\$\endgroup\$
  • 3
    \$\begingroup\$ I know I'm lazy, but there's a lot of work beyond what you've shown, to get all the conversion flags handled properly. Perhaps a halfway house would be to use a variadic template to validate the arguments, but then use the varargs implementation I posted to do the actual formatting? Then we could just skip over any digits and -, +, #, ., hh, h, l, L, z, t (and handle * specially), rather than having to implement the whole of a printf implementation? \$\endgroup\$ Commented Feb 9, 2018 at 15:20
  • \$\begingroup\$ I don't see why you claim my code has no automatic memory management - that was my main motivation! I changed std::make_unique<char[]>(len+1) into the C++11 equivalent for portability, but I'm pretty sure there's no opportunity for leaks (and tested in Valgrind). If you can point to an actual memory management issue with what I wrote, I would appreciate it, because I genuinely don't understand the criticism there. P.S. I'm leaving for the weekend now, so you have until Monday! \$\endgroup\$ Commented Feb 9, 2018 at 15:23
  • 3
    \$\begingroup\$ @TobySpeight: automatic memory management might not be the accurate term. I meant explicit memory management. With stringstream, you don't even have to pronounce the n word. I would argue that, even if you claim not to understand my criticism, the fact that you felt better hiding new behind a string construction and used it directly only for "legacy" C++11 code is revealing ;-) Have a nice week-end! \$\endgroup\$
    – papagaga
    Commented Feb 9, 2018 at 15:40
  • 3
    \$\begingroup\$ There is one glaring issue with your proposed "corrective" solution, though: its performance compared to printf is abysmal. Add in the lack of compile-time error if the format doesn't match the arguments, and I end up preferring the printf-based solution. \$\endgroup\$ Commented Feb 10, 2018 at 12:25
  • 3
    \$\begingroup\$ @papagaga: Picking a (not so) random benchmark (github.com/fmtlib/fmt), it denotes a performance loss of 42% moving from printf to std::ostream: I stand by abysmal. If the format chain isn't known at compile-time, then indeed any printf based solution should not be used... however I would note that raw ostream cannot be used either in this case. Advising the OP to re-implement the full formatting engine of printf is not really practical (though it'd be an awesome project). \$\endgroup\$ Commented Feb 10, 2018 at 15:32
\$\begingroup\$

I agree that the old C way of specifying string formatting was so much better than the current C++ way.

But I also really like (and can't do without) the type safety that was introduced by the C++ streams. So any new feature we add must maintain this. And I think this is where your code falls down for me.

Why are you using the old C variable argument parsing?

std::string format(const char *fmt, ...)
{
    char buf[256];

    va_list args;
    va_start(args, fmt);

Why not use the C++ variable argument template parameters?

template<typename... Args>
std::string format(const char *fmt, Args const&...)
{
    char buf[256];

Another thing I like about the C++ way is that it basically does all the work at compile time rather than run time. Run-time parsing of the format string seems a bad idea when we could do it at compile time. I'm not sure how to solve that, but that's my thought.

Also, I don't like that you build a string and return that as the result. This means we are doing extra work. Why not use the format string to simply apply the correct output formatters to the stream?
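One small piece of that work can be pushed to compile time even in C++11. The sketch below is an illustration only, not a full solution: count_conversions is a hypothetical helper that counts the conversion specifications in a literal format string, so a static_assert can compare the count with the number of arguments. It ignores the fact that a * width or precision consumes an extra argument, and it does not look at argument types at all.

```cpp
#include <cstddef>

// Hypothetical helper: counts printf-style conversions in `s`, treating
// "%%" as a literal percent sign. Written as a single recursive return
// expression so that it is a valid C++11 constexpr function.
constexpr std::size_t count_conversions(const char* s)
{
    return *s == '\0'   ? 0
         : *s != '%'    ? count_conversions(s + 1)
         : s[1] == '\0' ? 0                              // stray '%' at the end
         : s[1] == '%'  ? count_conversions(s + 2)       // "%%" is not a conversion
         :                1 + count_conversions(s + 2);
}

// Evaluated entirely by the compiler; a mismatch is a build error.
static_assert(count_conversions("My name is %s and I'm %d years old") == 2,
              "expected two conversions");
static_assert(count_conversions("100%% of %d") == 1,
              "expected one conversion");
```

Because the function recurses once per character, very long format strings may run into the compiler's constexpr recursion limit; a C++14 loop would avoid that.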

class Format
{
    public:
        template<typename... Args>
        Format(char const* fmt, Args const&... args)
            : /* Magic */
        {}

        friend std::ostream& operator<<(std::ostream& s, Format const& formatData) {
            // More Magic
            return s;
        }
};

int main()
{
    std::cout << Format("My stuff %5.3f\n", 12.233) << " Still more\n";
} 

This would result in the equivalent of:

    std::cout << std::string_view("My stuff %5.3f\n" + 0,  9)  // the string up to %
              << std::fixed << std::setw(5) << std::setprecision(3) << 12.233
              << std::string_view("My stuff %5.3f\n" + 14, 1)  // the string after the formatter
              << " Still more\n";
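To give a feel for what a tiny part of the "Magic" might look like, here is a heavily reduced sketch. FormatDouble is a hypothetical stand-in for the Format class above: it accepts exactly one double and exactly one %[width][.precision]f conversion, and it throws for anything else. Flags, length modifiers and every other conversion are deliberately left out.

```cpp
#include <cctype>
#include <cstddef>
#include <iomanip>
#include <ostream>
#include <stdexcept>
#include <string>

// Hypothetical, minimal stand-in for Format: one double, one %f conversion.
class FormatDouble
{
    std::string prefix_;     // text before the conversion
    std::string suffix_;     // text after the conversion
    int width_ = 0;
    int precision_ = 6;      // printf's default precision for %f
    double value_;

    static bool is_digit(char c)
    {
        return std::isdigit(static_cast<unsigned char>(c)) != 0;
    }

public:
    FormatDouble(const char* fmt, double value)
        : value_(value)
    {
        const std::string f(fmt);
        const std::size_t pct = f.find('%');
        if (pct == std::string::npos) {
            throw std::invalid_argument("no conversion in format");
        }
        prefix_ = f.substr(0, pct);

        std::size_t i = pct + 1;
        while (i < f.size() && is_digit(f[i])) {
            width_ = width_ * 10 + (f[i++] - '0');
        }
        if (i < f.size() && f[i] == '.') {
            ++i;
            precision_ = 0;
            while (i < f.size() && is_digit(f[i])) {
                precision_ = precision_ * 10 + (f[i++] - '0');
            }
        }
        if (i >= f.size() || f[i] != 'f') {
            throw std::invalid_argument("only %[width][.precision]f is supported");
        }
        suffix_ = f.substr(i + 1);
    }

    friend std::ostream& operator<<(std::ostream& os, const FormatDouble& fd)
    {
        // Restore the caller's flags and precision afterwards;
        // the width resets itself after each formatted insertion.
        const std::ios_base::fmtflags saved_flags = os.flags();
        const std::streamsize saved_precision = os.precision();
        os << fd.prefix_
           << std::fixed << std::setw(fd.width_) << std::setprecision(fd.precision_)
           << fd.value_
           << fd.suffix_;
        os.flags(saved_flags);
        os.precision(saved_precision);
        return os;
    }
};
```

Note that the format string is still parsed at run time here, in the constructor; only the final output goes through the stream's own formatters, with no intermediate std::string holding the result.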

Now I know writing all this code to do this is actually very daunting (and a large project beyond what you wanted to do). Especially since you can move all the difficult work out to std::vsnprintf().

But I bet there is also an extensive set of unit tests that exist in current C compiler implementations that we could steal. Then we could set it up as a community project and get people to implement parts and slowly get all the unit tests to work.

\$\endgroup\$
  • 2
    \$\begingroup\$ Though there are already quite a few such libraries. tinyformat comes to mind. \$\endgroup\$ Commented Feb 9, 2018 at 21:16
  • 1
    \$\begingroup\$ second code block has missing > or unterminated template <typename T> \$\endgroup\$
    – cat
    Commented Feb 10, 2018 at 0:04
  • \$\begingroup\$ I tried writing variable template arguments as an experiment - GCC won't let me use __attribute__ ((format)) unless it's a C-style variadic function. That means I'd have to reimplement a much larger formatter (which is possible, but not what I want to do). One possibility (that I touched on in a comment to papagaga's review) would be to check types in a C++ fashion prior to formatting with vsnprintf(). I might try that and report back with it. \$\endgroup\$ Commented Feb 12, 2018 at 9:47
  • \$\begingroup\$ @TobySpeight: I get it. I was looking at the C++ formatting code this weekend. There is so much that C printf() does that is not covered by C++ iostreams that I am surprised that there have not been bigger blow-ups about it. I figure that most people who want formatted input/output just stuck with their C code and wrapped it in a C++ layer. \$\endgroup\$ Commented Feb 12, 2018 at 17:31
  • 1
    \$\begingroup\$ @falkb: I created my own: github.com/Loki-Astari/ThorsIOUtil \$\endgroup\$ Commented Jan 17, 2020 at 19:46
\$\begingroup\$

Have you considered using string::begin() with the property that the string is stored contiguously? According to cppreference.com, this property has been true since C++11. With this property, you could implement your function with

std::string s(len, '\0');
va_start(args, fmt);
std::vsnprintf(&(*s.begin()), len+1, fmt, args);
va_end(args);

and this is valid in C++11 and beyond. This works for me in Visual Studio 2012.
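Put together with the stack-buffer fast path from the question, the whole function might look like the sketch below. The name format_cxx11 is hypothetical. One deliberate difference from the snippet above: the string is created one character longer than needed and trimmed afterwards, which sidesteps the question of whether vsnprintf may write its terminating NUL at s[len]. That is a conservative assumption on my part rather than something the answer requires.

```cpp
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical single-path version for C++11 and later.
std::string format_cxx11(const char* fmt, ...)
#ifdef __GNUC__
    __attribute__ ((format (printf, 1, 2)))
#endif
    ;

std::string format_cxx11(const char* fmt, ...)
{
    char buf[256];

    va_list args;
    va_start(args, fmt);
    const int r = std::vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);

    if (r < 0) {
        // conversion failed
        return {};
    }

    const std::size_t len = static_cast<std::size_t>(r);
    if (len < sizeof buf) {
        // we fit in the stack buffer
        return std::string(buf, len);
    }

    // Too long for the stack buffer: format straight into the string's
    // own (contiguous since C++11) storage, with one spare character
    // for the terminating NUL.
    std::string s(len + 1, '\0');
    va_start(args, fmt);
    std::vsnprintf(&s[0], len + 1, fmt, args);
    va_end(args);

    s.resize(len);   // drop the spare NUL
    return s;
}
```

Compared with the unique_ptr path in the question, this saves the copy from scratch memory into the result, at the cost of zero-filling the string before it is overwritten.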

\$\endgroup\$
  • \$\begingroup\$ Good idea - it saves copying, and for longer strings, saves an allocation, too. \$\endgroup\$ Commented Sep 14, 2018 at 9:14
