1

Suppose that you have the following code:

#include <iostream>

template <typename T>
class Example
{
  public:
    Example() = default;
    Example(const T &_first_ele, const T &_second_ele) : first_(_first_ele), second_(_second_ele) { }

    friend std::ostream &operator<<(std::ostream &os, const Example &a)
    {
      return (os << a.first_ << " " << a.second_);
    }

  private:
    T first_;
    T second_;
};

int main()
{
  Example example_(3.45, 24.6); // deduced as Example<double> in C++17; write Example<double> explicitly up to C++14
  std::cout << example_ << "\n";
}

Is this the only way to overload the operator<<?

friend std::ostream &operator<<(std::ostream &os, const Example &a)
{
  return (os << a.first_ << " " << a.second_);
}

In terms of performance, is it the best way to overload it or are there better options to do this implementation?

9
  • Do you think there are better options? Think about what it is doing, and ask which parts you don't need. Are there any? Commented Dec 21, 2019 at 17:56
  • 1
    What would that do differently? How would it look? Give it a go and time it to see whether it's faster. Commented Dec 21, 2019 at 18:03
  • 3
    What performance concern do you have here? The cost of the external I/O is usually much higher than that of your code, and when that’s not true the iostreams library isn’t particularly fast anyway. Commented Dec 22, 2019 at 0:09
  • 4
    @EmanueleOggiano: "Emanuele Oggiano is looking for a canonical answer." It's kind of hard to provide a "canonical answer" when it's not clear what the question actually is. What do you mean by "way to overload operator<<"? What things are we allowed to change and what things aren't we? What performance concerns do you have with the existing code, and what is the basis of those concerns? Commented Dec 26, 2019 at 0:26
  • 1
    Depends on how you define "better" - without that (by definition) no canonical answer is possible. There is no requirement that an operator<<() be a friend, as long as the class provides accessible getters to the members. If those getters are inlined in code (and the implementation actually inlines those getters, which it is also not actually required to do - since inlining is a hint to the compiler, not a directive), there will be few measurable differences
    – Peter
    Commented Dec 30, 2019 at 2:16

4 Answers

2

I believe that the comments have answered your question well enough. From a pure performance standpoint, there likely is no "better" way to overload the << operator for output streams because your function is likely not the bottleneck in the first place.

I will suggest that there is a "better" way to write the function itself that handles some corner cases.

Your << overload, as it exists now, will 'break' when trying to perform certain output formatting operations.

std::cout << std::setw(15) << std::left << example_ << "Fin\n";

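This does not left-align your entire Example output. Instead, only the first_ member is padded and left-aligned. The width set by std::setw applies only to the next formatted output operation and is then reset, and because you put your items in the stream one at a time, that next operation writes only first_, which is just a part of your class output. (std::left itself stays in effect until it is changed.)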

The easiest fix is to build a string first and then write that string to your output stream in a single operation. Something like this:

friend std::ostream &operator<<(std::ostream &os, const Example &a)
{
    // Requires <string>; std::to_string only accepts arithmetic types.
    std::string tmp = std::to_string(a.first_) + " " + std::to_string(a.second_);
    return (os << tmp);
}

A few things are worth noting here. First, in this specific example you will get trailing zeros, because you have no control over how std::to_string() formats its values. That may mean writing type-specific conversion functions to do the trimming for you. You may also be able to use std::string_view to gain back some efficiency (again, it likely doesn't matter, as the function itself is probably still not your bottleneck), but I have no experience with it.

By putting all of the object's information into the stream at once, that left-align will now align the full output of your object.
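As a variation on the same idea, the text can be built in a temporary std::ostringstream instead of with std::to_string(). This is only a sketch, not the approach from the answer itself: it works for any T that is itself streamable and avoids the trailing zeros, but the temporary stream does not inherit formatting state such as the precision set on os. The helper padded_example() is a hypothetical name used here purely for illustration.

```cpp
#include <iomanip>
#include <iostream>
#include <sstream>
#include <string>

template <typename T>
class Example
{
  public:
    Example(const T &first, const T &second) : first_(first), second_(second) { }

    friend std::ostream &operator<<(std::ostream &os, const Example &a)
    {
      // Format both members into a temporary buffer first ...
      std::ostringstream tmp;
      tmp << a.first_ << " " << a.second_;
      // ... then insert the whole text in one operation, so that a width
      // set on `os` pads the complete object rather than only first_.
      return os << tmp.str();
    }

  private:
    T first_;
    T second_;
};

// Hypothetical helper: applies setw/left to a whole Example<double>.
std::string padded_example()
{
  std::ostringstream out;
  out << std::setw(15) << std::left << Example<double>(3.45, 24.6) << "Fin";
  return out.str();
}
```

Here the nine characters of "3.45 24.6" are padded on the right to a width of 15 before "Fin" is appended.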

There is also the argument about friend vs. non-friend. If the necessary getters exist, I would argue that non-friend is the way to go. Friends are useful, but also break encapsulation since they are non-member functions with special access. This gets way into opinion territory, but I don't write simple getters unless I feel that they are necessary, and I don't count << overloads as necessary.
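For comparison, a non-friend overload could look roughly like the sketch below. It assumes the class exposes getters; the names ExampleWithGetters, first(), second() and to_text() are hypothetical and do not appear in the question.

```cpp
#include <iostream>
#include <sstream>
#include <string>

template <typename T>
class ExampleWithGetters
{
  public:
    ExampleWithGetters(const T &first, const T &second) : first_(first), second_(second) { }

    // Read-only accessors; these are what make a non-friend operator<< possible.
    const T &first() const { return first_; }
    const T &second() const { return second_; }

  private:
    T first_;
    T second_;
};

// Non-member, non-friend: uses only the public interface of the class.
template <typename T>
std::ostream &operator<<(std::ostream &os, const ExampleWithGetters<T> &a)
{
    return os << a.first() << " " << a.second();
}

// Hypothetical helper that renders an object to a string for inspection.
std::string to_text(const ExampleWithGetters<double> &a)
{
    std::ostringstream out;
    out << a;
    return out.str();
}
```

The trade-off is the one described above: the class keeps its private data closed to the operator, at the cost of adding getters that may not otherwise be needed.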

1
  • 1
    If you can downvote, you can tell me why. I'm not averse to learning.
    – sweenish
    Commented Dec 31, 2019 at 14:37
0

As I understand it, the question leaves two points ambiguous:

  1. Whether you are specifically aiming at templated classes.
    I will assume the answer is YES.

  2. Whether there are better ways to overload the ostream operator<< (as compared to the friend-way), as posted in the title of the question (and assuming "better" refers to performance), or there are other ways, as posted in the body ("Is this the only way..."?)
    I will assume the first, as it encompasses the second.

I can think of at least three ways to overload the ostream operator<<:

  1. The friend way, as you posted.
  2. The non-friend way, with an auto (trailing decltype) return type.
  3. The non-friend way, with a std::ostream & return type.

They are exemplified at the bottom. I ran several tests (the code used is shown below), and from all of them I concluded that:

  1. Having compiled/linked with optimizations enabled (-O3), and looping 10000 times over each std::cout, all 3 methods provide essentially the same performance.

  2. Having compiled/linked in debug mode, without looping:

    t1 ~ 2.5-3.5 * t2
    t2 ~ 1.02-1.2 * t3

    I.e., method 1 is much slower than methods 2 and 3, which perform similarly.

I don't know whether these conclusions apply across systems, nor whether you would see behavior closer to result 1 (most likely) or result 2 (under particular conditions).


Code to define the three methods to overload operator<<
(I have removed default constructors, as they are irrelevant here).

Method 1 (as in the OP):

template <typename T>
class Example
{
  public:
    Example(const T &_first_ele, const T &_second_ele) : first_(_first_ele), second_(_second_ele) { }

    friend std::ostream &operator<<(std::ostream &os, const Example &a)
    {
      return (os << a.first_ << " " << a.second_);
    }

  private:
    T first_;
    T second_;
};

Method 2:

template <typename T>
class Example2
{
  public:
    Example2(const T &_first_ele, const T &_second_ele) : first_(_first_ele), second_(_second_ele) { }

    void print(std::ostream &os) const
    {
        os << this->first_ << " " << this->second_;
        return;
    }

  private:
    T first_;
    T second_;
};
template<typename T>
auto operator<<(std::ostream& os, const T& a) -> decltype(a.print(os), os)
{
    a.print(os);
    return os;
}

Method 3:

template <typename T>
class Example3
{
  public:
    Example3(const T &_first_ele, const T &_second_ele) : first_(_first_ele), second_(_second_ele) { }

    void print(std::ostream &os) const
    {
        os << this->first_ << " " << this->second_;
        return;
    }

  private:
    T first_;
    T second_;
};
// Note 1: If this overload exists, overload resolution prefers it over the generic
// auto ... -> decltype template above, because it is more specialized.
// If it does not exist, the code still compiles and the generic template is used.
template <typename T>
std::ostream &operator<<(std::ostream &os, const Example3<T> &a)
{
    a.print(os);
    return os;
}
// Note 2: Explicit instantiation is not needed here.
//template std::ostream &operator<<(std::ostream &os, const Example3<double> &a);
//template std::ostream &operator<<(std::ostream &os, const Example3<int> &a);

Code used to test performance
(everything was placed in a single source file with

#include <iostream>
#include <chrono>

at the top):

int main()
{
    std::chrono::steady_clock::time_point begin = std::chrono::steady_clock::now();
    std::chrono::steady_clock::time_point end = std::chrono::steady_clock::now();
    const int nout = 10000;

    Example example_(3.45, 24.6); // deduced as Example<double> in C++17; write it explicitly up to C++14
    begin = std::chrono::steady_clock::now();
    for (int i = 0 ; i < nout ; i++ )
        std::cout << example_ << "\n";
    end = std::chrono::steady_clock::now();
    const double lapse1 = std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
    std::cout << "Time difference = " << lapse1 << "[us]" << std::endl;

    Example2 example2a_(3.5, 2.6); // deduced as Example2<double> in C++17; write it explicitly up to C++14
    begin = std::chrono::steady_clock::now();
    for (int i = 0 ; i < nout ; i++ )
        std::cout << example2a_ << "\n";
    end = std::chrono::steady_clock::now();
    const double lapse2a = std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
    std::cout << "Time difference = " << lapse2a << "[us]" << std::endl;

    Example2 example2b_(3, 2); // deduced as Example2<int> in C++17; write it explicitly up to C++14
    begin = std::chrono::steady_clock::now();
    for (int i = 0 ; i < nout ; i++ )
        std::cout << example2b_ << "\n";
    end = std::chrono::steady_clock::now();
    const double lapse2b = std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
    std::cout << "Time difference = " << lapse2b << "[us]" << std::endl;

    Example3 example3a_(3.4, 2.5); // deduced as Example3<double> in C++17; write it explicitly up to C++14
    begin = std::chrono::steady_clock::now();
    for (int i = 0 ; i < nout ; i++ )
        std::cout << example3a_ << "\n";
    end = std::chrono::steady_clock::now();
    const double lapse3a = std::chrono::duration_cast<std::chrono::microseconds>(end - begin).count();
    std::cout << "Time difference = " << lapse3a << "[us]" << std::endl;

    std::cout << "Time difference lapse1 = " << lapse1 << "[us]" << std::endl;
    std::cout << "Time difference lapse2a = " << lapse2a << "[us]" << std::endl;
    std::cout << "Time difference lapse2b = " << lapse2b << "[us]" << std::endl;
    std::cout << "Time difference lapse3a = " << lapse3a << "[us]" << std::endl;

    return 0;
}
3
  • 1
    This is not a proper benchmark. 1. You need to iterate the benchmark in a loop to avoid measuring cache effects. 2. I don't know how you can get any useful precision printing microsecond counts. With optimizations enabled it should take just around 1us to 3us or so for each of the executions. 3. If you correct these things, then there is still an apparent difference, but that is simply because some of the test cases need to print longer strings. If you give all the example*_ variables the same values, then there is no clear difference anymore.
    – walnut
    Commented Jan 1, 2020 at 13:10
  • See godbolt.org/z/0TYZGe. The benchmark with original values, but looped and with nanoseconds output, is on the top and the one with equal values in all tests is at the bottom. Middle and right are GCC's and Clang's generated code executed, respectively.
    – walnut
    Commented Jan 1, 2020 at 13:11
  • @walnut - You are right! Accounting for the comment, I amended the code and answer. Commented Jan 1, 2020 at 14:31
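The points raised in these comments (loop the measurement, use a finer clock resolution, print identical values) could be combined in a helper along the following lines. This is a sketch rather than the amended code from the answer or the linked Godbolt example: time_stream_ns is a hypothetical name, and writing to an in-memory std::ostringstream instead of std::cout is an assumption made here to keep terminal I/O out of the measurement.

```cpp
#include <chrono>
#include <sstream>
#include <string>

// Hypothetical helper: streams `value` followed by '\n' into an in-memory
// buffer `iterations` times and returns the elapsed time in nanoseconds.
// The produced text is handed back through `sink` so that the work cannot
// be discarded and the output of different methods can be compared.
template <typename Printable>
long long time_stream_ns(const Printable &value, int iterations, std::string &sink)
{
    std::ostringstream out;
    const auto begin = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        out << value << '\n';
    const auto end = std::chrono::steady_clock::now();
    sink = out.str();
    return std::chrono::duration_cast<std::chrono::nanoseconds>(end - begin).count();
}
```

Calling it with objects of Example, Example2 and Example3 that all hold the same values, and checking that the resulting strings are identical, keeps the comparison limited to the operator<< variants themselves.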
-1

It's the obvious way to implement it. It's also probably the most efficient. Use it.

-3

The way you demonstrated in the question is the most basic way, and it is also found in various C++ books. Personally, I would not prefer it in my production code, mainly because:

  • You have to write the boilerplate code for the friend operator<< for each and every class.
  • When adding new class members, you may have to update each of those methods individually as well.

I would recommend the following approach, available since C++14:

Library

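#include <functional>  // std::function
#include <iostream>    // std::ostream
#include <string>      // std::string
#include <type_traits> // std::enable_if_t
// Add `is_iterable` trait as defined in https://stackoverflow.com/a/53967057/514235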
template<typename Derived>
struct ostream
{
  static std::function<std::ostream&(std::ostream&, const Derived&)> s_fOstream;

  static auto& Output (std::ostream& os, const char value[]) { return os << value; }
  static auto& Output (std::ostream& os, const std::string& value) { return os << value; }
  template<typename T>
  static
  std::enable_if_t<is_iterable<T>::value, std::ostream&>
  Output (std::ostream& os, const T& collection)
  {
    os << "{";
    for(const auto& value : collection)
      os << value << ", ";
    return os << "}";
  }
  template<typename T>
  static
  std::enable_if_t<not is_iterable<T>::value, std::ostream&>
  Output (std::ostream& os, const T& value) { return os << value; }

  template<typename T, typename... Args>
  static
  void Attach (const T& separator, const char names[], const Args&... args)
  {
    static auto ExecuteOnlyOneTime = s_fOstream =
    [&separator, names, args...] (std::ostream& os, const Derived& derived) -> std::ostream&
    {
      os << "(" << names << ") =" << separator << "(" << separator;
      int unused[] = { (Output(os, (derived.*args)) << separator, 0) ... }; (void) unused;
      return os << ")";
    };
  }

  friend std::ostream& operator<< (std::ostream& os, const Derived& derived)
  {
    return s_fOstream(os, derived);
  }
};

template<typename Derived>
std::function<std::ostream&(std::ostream&, const Derived&)> ostream<Derived>::s_fOstream;

Usage

Inherit from the above class in those classes for which you want the operator<< facility. The friend is automatically included in each such class's definition via the ostream base, so no extra work is needed. E.g.:

class MyClass : public ostream<MyClass> {...};

Preferably in their constructors, you may Attach() the member variables which are to be printed. e.g.

// Use better displaying with `NAMED` macro
// Note that the content of `Attach()` will effectively execute only once per class
// The second argument is the `names` label that Attach(), as declared above, expects
MyClass () { MyClass::Attach("\n----\n", "x, y", &MyClass::x, &MyClass::y); }

Example

From what you shared,

#include "Util_ostream.hpp"

template<typename T>
class Example : public ostream<Example<T>> // .... change 1
{
public:
  Example(const T &_first_ele, const T &_second_ele) : first_(_first_ele), second_(_second_ele)
  {
    Example::Attach(" ", "first_, second_", &Example::first_, &Example::second_); // .... change 2 (2nd argument: the `names` label Attach() expects)
  }

private:
  T first_;
  T second_;
};
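The inheritance part of this idea can be seen in isolation in the stripped-down sketch below, which leaves out the std::function and Attach() machinery. Printable, Point and point_text() are hypothetical names, and requiring a print() member in the derived class is an assumption of this sketch, not part of the library above.

```cpp
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical simplified base: any Derived that has a
// print(std::ostream &) const member gets operator<< via this friend.
template <typename Derived>
struct Printable
{
    friend std::ostream &operator<<(std::ostream &os, const Derived &d)
    {
        d.print(os);
        return os;
    }
};

class Point : public Printable<Point>
{
  public:
    Point(int x, int y) : x_(x), y_(y) { }

    // The only per-class code: how the members are written out.
    void print(std::ostream &os) const { os << "(" << x_ << ", " << y_ << ")"; }

  private:
    int x_;
    int y_;
};

// Hypothetical helper that renders a Point to a string.
std::string point_text(const Point &p)
{
    std::ostringstream out;
    out << p;
    return out.str();
}
```

The friend defined in the base is found through argument-dependent lookup because Printable<Point> is a base class of Point, which is the same mechanism the ostream<Derived> base relies on.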

Demo

This approach accesses each printed variable through a pointer instead of directly. This negligible indirection should never be a bottleneck from a performance perspective.
The demo is slightly more complex, for practical purposes.

Requirements

  • The intention here is to improve the readability and uniformity of printing the variables.
  • Every printable class should have its own separate ostream<T>, regardless of inheritance.
  • An object must either have operator<< defined or inherit ostream<T> for the code to compile.

Facilities

This is now shaping up as a good library component. Below are the add-on facilities I have added so far.

  • Using the ATTACH() macro, we can also print a variable in a certain way; variable printing can always be customised as needed by modifying the library code.
  • If the base class is printable, then we can simply pass a typecasted this; the rest will be taken care of.
  • Containers compatible with std::begin/end are now supported, which includes vector as well as map.

The code shown at the beginning is kept short for quick understanding. Those who are further interested may click on the demo link above.

5
  • "It may differ class to class depending on the number of variablese." And that difference will still have to be there, since each class will have to have a line that calls this Attach function. So that's not part of the boilerplate; the only actual boilerplate is the operator<< definition, the braces for the function, and the actual ostream<< bits. "Moreover, all these functions goes as a dead code in the production if not handled properly using guards." Compilers generally don't emit functions that don't get called. Commented Dec 26, 2019 at 5:47
  • "Preferably in their constructors, you may Attach() the member variables which are to be printed." Since your Attach function is now static, as well as the std::function itself, calling Attach from the class's constructor will simply result in a bunch of overwriting the function with the same function. Commented Dec 26, 2019 at 5:52
  • @NicolBolas, regarding "boilerplate" part, I give importance to the uniformity and readability. When the printing happens from a library code, it's assured to be uniformly printed. Moreover, having just one Attach() is much more intuitive than going through the friend function. Inheriting ostream improves the readability as it states that this class has the printing capability. Regarding your 2nd comment of overwriting, I think you have overlooked the lazy initialization in the code. The code inside Attach() will effectively run only once per class (& not per class object).
    – iammilind
    Commented Dec 26, 2019 at 6:02
  • 1
    I delete it because I've messed up editing :( "In the multilevel inheritance, it requires virtual inheritance" This is bad for performance. Don't pay for what you don't need. "we have to overload that first" That's a rather dangerous idea. Now what if we want to print member names? What if we want to print indices for one class and not print them for the other? What if we want to print some members in hex? This becomes messy faster than you can write friend operator<<. Commented Dec 26, 2019 at 7:38
  • virtual inheritance is not needed. Edited. @n.'pronouns'm. member printing in 1 way is already demo-ed in above link. With the latest revision, I have also added support for std::vector like containers (not map yet, but should be easy). For custom printing, one can always define their custom operator<< or modify the above library. On a side note, worrying about virtual things (inheritance or method call) usually is premature optimisation. Especially in this case, where the i/o operations are heavy.
    – iammilind
    Commented Dec 26, 2019 at 8:24