
If I want to pipe bytes of data into a C/C++ program on Linux like this:

cat my_file | ./my_app

but:

  1. We cannot assume the piped data originates from a file
  2. We wish to interpret the data as raw bytes (as opposed to strings)

what would be the fastest technique to read the pipe from the C/C++ application?

I have done a little research and found:

  • read()
  • std::cin.read()
  • popen()

but I am not sure if there is a better way, or which of the above would be better.

EDIT: There is a performance requirement on this, hence why I am asking for the technique with the smallest overhead.

  • Note popen() doesn't read anything from the pipe, but just gives you the necessary file descriptors you can use to call read(). Commented Feb 21, 2014 at 19:02
  • Not popen(). In C, use read(); in C++, probably read() again, but using std::cin would also work. Commented Feb 21, 2014 at 19:03
  • And why are you bothering about performance? Using the | in the shell and reading from std::cin would be fairly OK, flexible and robust. Commented Feb 21, 2014 at 19:04
  • FYI, cat my_file | ./my_app is slower than ./my_app <file, and less flexible (you get a descriptor you can't seek() on). Why are you specifying it that way? Commented Feb 21, 2014 at 19:07
  • @CharlesDuffy simply my misunderstanding that there were two ways and one is faster...
    – user997112
    Commented Feb 21, 2014 at 19:08

2 Answers


Why do you care that much about performance?

1 gigabyte from /dev/urandom can be piped into wc in about 1 minute (and wc is running only 15% of that time; the rest is spent waiting for data)! Just try time (head -1000000000c /dev/urandom|wc)

But the fastest way would be to use the read(2) syscall with a fairly large buffer (e.g. 64 KiB to 256 KiB).
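A minimal sketch of such a read(2) loop, assuming the goal is simply to consume every byte until the writer closes the pipe. The names DrainResult and drain_fd are hypothetical, and the byte sum only stands in for whatever processing the real application does. Note that read() returning 0 marks end of input, so no separate eof() check is needed.

```cpp
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <vector>
#include <unistd.h>  // read(2), STDIN_FILENO

// Hypothetical result type: byte count, a simple byte sum, and an error flag.
struct DrainResult {
    std::uint64_t bytes;
    std::uint64_t checksum;
    bool ok;
};

// Hypothetical helper: reads fd until end of input using one large buffer.
DrainResult drain_fd(int fd, std::size_t buf_size = 64 * 1024) {
    std::vector<unsigned char> buf(buf_size);
    DrainResult result{0, 0, true};
    for (;;) {
        const ssize_t n = ::read(fd, buf.data(), buf.size());
        if (n > 0) {
            // A short read is normal on a pipe; process what arrived.
            for (ssize_t i = 0; i < n; ++i) {
                result.checksum += buf[static_cast<std::size_t>(i)];
            }
            result.bytes += static_cast<std::uint64_t>(n);
        } else if (n == 0) {
            break;  // end of input: every writer has closed the pipe
        } else if (errno == EINTR) {
            continue;  // interrupted by a signal, retry
        } else {
            result.ok = false;  // real error, errno describes it
            break;
        }
    }
    return result;
}
```

From main() this would be called as drain_fd(STDIN_FILENO), or with a larger second argument such as 256 * 1024 when benchmarking buffer sizes.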

Of course, read Advanced Linux Programming and carefully read the man pages related to syscalls(2).

For inspiration, study the source code of the Linux kernel, GNU libc, and musl-libc. They are all open source projects, so feel free to contribute to them and to improve them.

But I bet that in practice using popen, or stdin, or reading from std::cin won't add much overhead.

You could also increase the stdio buffer with setvbuf(3).

See also this question.

(If you read from stdin, the file descriptor is STDIN_FILENO, which is 0.)

You might be interested in time(7), vdso(7), syscalls(2).

You certainly should read documentation of GCC and this draft report.

You could use machine learning techniques to optimize performance.

Look into the MILEPOST GCC and Ctuning projects. Consider joining the RefPerSys one. Also read Understanding Machine Learning: From Theory to Algorithms (ISBN 978-1-107-05713-5).

  • May I ask, how would I get the file descriptor for the second parameter?
    – user997112
    Commented Feb 21, 2014 at 19:22
  • Sorry, one last question: I am about to replace my std::cin.read() with the read() system call you suggested. However, I also have a while loop checking !std::cin.eof() so that I grab all bytes. To see the full benefit of the read() system call, is there a way I can replace the while(!std::cin.eof()) check?
    – user997112
    Commented Feb 21, 2014 at 19:29
  • I don't understand why you care that much about performance. But please read the man page of read(2). What is your application? Please edit your question to tell much more about it. Commented Feb 21, 2014 at 19:42
  • @user997112, don't mix buffered and unbuffered calls regarding the same file descriptor -- if you're using read() (which is unbuffered), you shouldn't be using any std::cin calls. Commented Feb 21, 2014 at 20:36

When you pipe data in like that, the piped input is the standard input. Just read from cin (or stdin) like a normal console program.

Just use std::cin.read(). There's no reason to deal with popen() or its ilk.


Just to clarify... there is no pipe-specific way to read the input. As far as your program is concerned, there's cin and that's it.

This question might help you out on the speed front though... Why is reading lines from stdin much slower in C++ than Python?

  • This has to be written with high performance in mind; that's why I am asking which method has the smallest overhead.
    – user997112
    Commented Feb 21, 2014 at 19:08
  • Minimizing overhead is very situationally dependent -- sometimes standard-library-provided buffering helps things, sometimes it hurts. I don't know that you could get a generic answer that's always going to be right; you'd probably be better off benchmarking. Commented Feb 21, 2014 at 19:09
  • Besides using std::cin.read(), what other possibilities do I have to benchmark against?
    – user997112
    Commented Feb 21, 2014 at 19:13
  • Well, std::cin.read() is the C++ way. There are also the C standard-library calls, and direct invocation of your OS syscalls. And your operating system and standard library will typically provide a lot of flags for tuning. Commented Feb 21, 2014 at 19:15
  • But unless you've already measured enough to know you have a problem, and you know it's the read process that's causing the problem, why are you here to start with? :) Commented Feb 21, 2014 at 19:15
