Fastest way to read a pipe from C/C++ program?

Question

If I want to pipe bytes of data into a C/C++ program on Linux like this:

cat my_file | ./my_app

but:

We cannot assume the piped data originates from a file
We wish to interpret the data as raw bytes (as opposed to strings)

what would be the fastest technique to read the pipe from the C/C++ application?

I have done a little research and found:

read()
std::cin.read()
popen()

but I am not sure if there is a better way, or which of the above would be better.

EDIT: There is a performance requirement on this, which is why I am asking for the technique with the smallest overhead.

Note popen() doesn't read anything from the pipe, but just gives you the necessary file descriptors you can use to call read(). — πάντα ῥεῖ, Commented Feb 21, 2014 at 19:02
Not popen(). In C, use read(); in C++, probably read() again, but using std::cin would also work. — Jonathan Leffler, Commented Feb 21, 2014 at 19:03
And why are you bothering about performance? Using the | in the shell and reading from std::cin would be fairly OK, flexible and robust. — πάντα ῥεῖ, Commented Feb 21, 2014 at 19:04
FYI, cat my_file | ./my_app is slower than ./my_app <file, and less flexible (you get a descriptor you can't seek() on). Why are you specifying it that way? — Charles Duffy, Commented Feb 21, 2014 at 19:07
@CharlesDuffy simply my misunderstanding that there were two ways and one is faster... — user997112, Commented Feb 21, 2014 at 19:08

Basile Starynkevitch · Accepted Answer · 2020-09-19 16:00:34Z

4

Why do you care that much about performance?

1 gigabyte from /dev/urandom can be piped into wc in 1 minute (and wc is running only 15% of the time, waiting for data the rest of it)! Just try time (head -c 1000000000 /dev/urandom | wc)

But the fastest way would be to use the read(2) syscall with a fairly large buffer (e.g. 64 KiB to 256 KiB).
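As a rough illustration of the read(2) approach, here is a minimal sketch that drains a descriptor with a 64 KiB buffer. The helper name drain_fd and its byte-counting behaviour are assumptions made for the example, not part of the answer; real code would process each chunk instead of only counting it.

```cpp
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <vector>
#include <unistd.h>   // read(2), STDIN_FILENO

// Hypothetical helper: reads `fd` until end of input and returns the number
// of bytes seen, or -1 on a read error (errno then holds the cause).
long long drain_fd(int fd, std::size_t buf_size = 64 * 1024) {
    assert(buf_size > 0);
    std::vector<unsigned char> buf(buf_size);
    long long total = 0;
    for (;;) {
        ssize_t n = ::read(fd, buf.data(), buf.size());
        if (n > 0) {
            // Process buf[0..n) here. On a pipe, n is often smaller than
            // the buffer size, so never assume a full buffer.
            total += n;
        } else if (n == 0) {
            return total;   // end of input: every writer has closed the pipe
        } else if (errno != EINTR) {
            return -1;      // genuine error
        }
        // errno == EINTR: a signal interrupted the call, so just retry.
    }
}
```

For piped input the descriptor is STDIN_FILENO, so a main() would call drain_fd(STDIN_FILENO). A return value of 0 from read() marks end of input, so no separate EOF test is needed in the loop.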

Of course, read Advanced Linux Programming, and read the syscalls(2) related man pages carefully.

Study for inspiration the source code of the Linux kernel, of GNU libc, of musl-libc. They all are open source projects, so feel free to contribute to them and to improve them.

But I bet that in practice using popen, or stdin, or reading from std::cin won't add much overhead.

You could also increase the stdio buffer with setvbuf(3).
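A minimal sketch of the setvbuf(3) route, assuming a 256 KiB buffer and a hypothetical helper named drain_stream. The buffer is supplied by the caller side (here a static array) rather than left to the library, on the assumption that an implementation may not honour the size hint when it allocates the buffer itself.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical helper: enlarges the stdio buffer of `in`, then counts the
// bytes read with fread. Returns -1 if setvbuf or a read fails.
// The static buffer is shared, so use this on one stream at a time.
long long drain_stream(std::FILE* in) {
    assert(in != nullptr);
    // The buffer must stay valid for as long as the stream uses it,
    // hence static storage rather than a local array.
    static char io_buf[256 * 1024];
    // setvbuf has to be called before any other operation on the stream.
    if (std::setvbuf(in, io_buf, _IOFBF, sizeof io_buf) != 0) {
        return -1;
    }
    std::vector<unsigned char> chunk(16 * 1024);
    long long total = 0;
    std::size_t n;
    while ((n = std::fread(chunk.data(), 1, chunk.size(), in)) > 0) {
        // Process chunk[0..n) here.
        total += static_cast<long long>(n);
    }
    return std::ferror(in) ? -1 : total;
}
```

For piped input the call would be drain_stream(stdin), made before anything else touches stdin. Whether the larger buffer actually helps depends on the workload, so it is worth measuring against the plain read(2) loop.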

You could use machine learning techniques to optimize performance.

Look into the MILEPOST GCC and Ctuning projects. Consider joining the RefPerSys one. And of course read Understanding Machine Learning: From Theory to Algorithms, ISBN 978-1-107-05713-5.

edited Sep 19, 2020 at 16:00

answered Feb 21, 2014 at 19:21

Basile Starynkevitch

May I ask, how would I get the file descriptor for the second parameter?
– user997112
Commented Feb 21, 2014 at 19:22
Sorry one last question- I am about to replace my std::cin.read() with the system call read() you suggested. However, I also have a while loop checking for !std::cin.eof() so that I grab all bytes. To see the full benefits of the read() system call is there a way I can replace the check for while(!std::cin.eof()) ?
– user997112
Commented Feb 21, 2014 at 19:29
I don't understand why you care that much about performance. But please read the man page of read(2). What is your application? Please edit your question to tell much more about it.
– Basile Starynkevitch
Commented Feb 21, 2014 at 19:42
@user997112, don't mix buffered and unbuffered calls regarding the same file descriptor -- if you're using read() (which is unbuffered), you shouldn't be using any std::cin calls.
– Charles Duffy
Commented Feb 21, 2014 at 20:36

Community · Accepted Answer · 2017-05-23 11:57:47Z

2

When you pipe data in like that, the piped input is the standard input. Just read from cin (or stdin) like a normal console program.

Just use std::cin.read(). There's no reason to deal with popen() or its ilk.

Just to clarify... there is no pipe-specific way to read the input. As far as your program is concerned, there's cin and that's it.

This question might help you out on the speed front though... Why is reading lines from stdin much slower in C++ than Python?
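For comparison, a minimal sketch of the std::cin.read() approach on raw bytes. The helper name drain_istream is an assumption for the example; it takes any std::istream so it can be exercised without a real pipe.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <vector>

// Hypothetical helper: counts the bytes available on `in` using unformatted
// reads. It cannot tell a read error from end of input; check in.bad()
// afterwards if that distinction matters.
long long drain_istream(std::istream& in, std::size_t buf_size = 64 * 1024) {
    assert(buf_size > 0);
    std::vector<char> buf(buf_size);
    long long total = 0;
    // read() sets failbit on a short final chunk, so the loop also accepts
    // a failed read that still delivered bytes (gcount() > 0) instead of
    // testing eof() up front.
    while (in.read(buf.data(), static_cast<std::streamsize>(buf.size())) ||
           in.gcount() > 0) {
        // Process buf[0..in.gcount()) here.
        total += in.gcount();
    }
    return total;
}
```

A main() would call drain_istream(std::cin). Calling std::ios::sync_with_stdio(false) before the first read is the tweak discussed in the linked question and is worth including when benchmarking this variant.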

edited May 23, 2017 at 11:57

CommunityBot

answered Feb 21, 2014 at 19:06

QuestionC

This has to be written with high performance in mind; that's why I am asking which method has the smallest overhead.
– user997112
Commented Feb 21, 2014 at 19:08
Minimizing overhead is very situationally dependent -- sometimes standard-library-provided buffering helps things, sometimes it hurts. I don't know that you could get a generic answer that's always going to be right; you'd probably be better off benchmarking.
– Charles Duffy
Commented Feb 21, 2014 at 19:09
Besides using std::cin.read(), what other possibilities do I have to benchmark against?
– user997112
Commented Feb 21, 2014 at 19:13
Well, std::cin.read() is the C++ way. There are also the C standard-library calls, and direct invocation of your OS syscalls. And your operating system and standard library will typically provide a lot of flags for tuning.
– Charles Duffy
Commented Feb 21, 2014 at 19:15
But unless you've already measured enough to know you have a problem, and you know it's the read process that's causing the problem, why are you here to start with? :)
– Charles Duffy
Commented Feb 21, 2014 at 19:15
