C/C++ best way to send a number of bytes to stdout

Question

Profiling my program shows that the function print() takes a lot of time. How can I send "raw" byte output directly to stdout instead of using fwrite, and make it faster? (I need to send all 9 bytes in print() to stdout at the same time.)

void print(){
    unsigned char temp[9];

    temp[0] = matrix[0][0];
    temp[1] = matrix[0][1];
    temp[2] = matrix[0][2];
    temp[3] = matrix[1][0];
    temp[4] = matrix[1][1];
    temp[5] = matrix[1][2];
    temp[6] = matrix[2][0];
    temp[7] = matrix[2][1];
    temp[8] = matrix[2][2];

    fwrite(temp,1,9,stdout);

}

matrix is defined globally as unsigned char matrix[3][3];

you can't. try printing less, or use buffered output like ostream — Iraimbilanja, Commented Feb 9, 2009 at 15:22

FreeMemory · Accepted Answer · 2009-02-09 15:24:07Z

10

IO is not an inexpensive operation. It is, in fact, a blocking operation: when you call write, the OS can suspend your process and let other, more CPU-bound processes run until the IO device you're writing to completes the operation.

The only lower-level function you can use (if you're developing on a *nix machine) is the raw write function, but even then performance will not be much better than it is now. Simply put: IO is expensive.


1

Slow? Not so. See my belated contribution below.
– Allan Stokes
Commented Apr 27, 2012 at 17:04
Note that you may need to call write multiple times, since the number of bytes actually written (the return value) is not guaranteed to equal the buffer size passed as an argument.
– Volodymyr Boiko
Commented Sep 13, 2019 at 23:45
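
The point in the comment above about short writes can be sketched as a retry loop. This is a minimal sketch for a POSIX system; write_all is a hypothetical helper name, not something from the answers.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: keep calling write() until all len bytes are out.
   Returns 0 on success, -1 on error (errno is left as set by write). */
static int write_all(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return -1;
        }
        p   += (size_t)n;          /* skip the bytes that were accepted */
        len -= (size_t)n;
    }
    return 0;
}
```

In the question's print(), this would be called as write_all(1, temp, 9) in place of the fwrite call.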


Allan Stokes · Accepted Answer · 2012-04-27 17:02:56Z

The top rated answer claims that IO is slow.

Here's a quick benchmark with a sufficiently large buffer to take the OS out of the critical performance path, but only if you're willing to receive your output in giant blurps. If latency to first byte is your problem, you need to run in "dribs" mode.

Write 10 million records from a nine byte array

Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1

   340ms   to /dev/null 
   710ms   to 90MB output file 
 15254ms   to 90MB output file in "dribs" mode

FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0

   450ms   to /dev/null 
   550ms   to 90MB output file on ZFS triple mirror
  1150ms   to 90MB output file on FFS system drive
 22154ms   to 90MB output file in "dribs" mode

There's nothing slow about IO if you can afford to buffer properly.

#include <stdio.h> 
#include <assert.h> 
#include <stdlib.h>
#include <string.h>

int main (int argc, char* argv[]) 
{
    int dribs = argc > 1 && 0==strcmp (argv[1], "dribs");
    int err;
    int i; 
    enum { BigBuf = 4*1024*1024 };
    char* outbuf = malloc (BigBuf); 
    assert (outbuf != NULL); 
    err = setvbuf (stdout, outbuf, _IOFBF, BigBuf); // full buffering
    assert (err == 0);

    enum { ArraySize = 9 };
    char temp[ArraySize] = {0};  // zero-filled so the bytes written are defined
    enum { Count = 10*1000*1000 }; 

    for (i = 0; i < Count; ++i) {
        fwrite (temp, 1, ArraySize, stdout);    
        if (dribs) fflush (stdout); 
    }
    fflush (stdout);  // seems to be needed after setting own buffer
    fclose (stdout);
    if (outbuf) { free (outbuf); outbuf = NULL; }
}
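
Applied to the question's print(), the same idea might look like the sketch below. The names init_output, outbuf and OutBufSize are assumptions made for illustration, and the 64 KiB size is an arbitrary choice (the benchmark used 4 MiB).

```c
#include <stdio.h>

unsigned char matrix[3][3];            /* as declared in the question */

enum { OutBufSize = 1 << 16 };         /* assumed size, tune as needed */
static char outbuf[OutBufSize];        /* must stay valid while stdout is open */

/* Call once, before anything is written to stdout. Returns 0 on success. */
int init_output(void)
{
    return setvbuf(stdout, outbuf, _IOFBF, sizeof outbuf);
}

/* Queues the 9 matrix bytes as one record; returns 1 on success. */
size_t print(void)
{
    return fwrite(matrix, sizeof matrix, 1, stdout);
}
```

With full buffering, the bytes reach the OS only when the buffer fills, when fflush(stdout) is called, or when the program exits normally, so this trades latency for throughput.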

falstro · Accepted Answer · 2009-02-09 15:35:41Z

3

The rawest form of output you can do is probably the write system call, like this:

write (1, matrix, 9);

1 is the file descriptor for standard out (0 is standard in, and 2 is standard error). Your standard out will only write as fast as whatever is reading it at the other end (i.e. the terminal, or the program you're piping into), which might be rather slow.

I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer it for you until it can be consumed by the other end. It's been a while, but I think it works like this

fcntl (1, F_SETFL, O_NONBLOCK);

YMMV though. Please correct me if I'm wrong on the syntax, as I said, it's been a while.


linux.die.net/man/2/fcntl O_NONBLOCK handles filesystem locks, not buffering.
– Basilevs
Commented Oct 12, 2009 at 11:57
1

Use stdout instead of magic number 1, more readable at least.
– hesham_EE
Commented Oct 1, 2014 at 19:08
1

@hesham_EE except stdout is a FILE* not a file descriptor, so that would be wrong. I think there are some standardized constants somewhere like FILENO_STDOUT or something. Not 100% sure though.
– falstro
Commented Oct 1, 2014 at 20:00
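
For reference, POSIX names the constant STDOUT_FILENO (declared in <unistd.h>), and fileno(stdout) returns the descriptor behind the FILE*. A minimal sketch, with print_raw as a hypothetical name:

```c
#include <stdio.h>
#include <unistd.h>

unsigned char matrix[3][3];            /* as declared in the question */

/* Sends the 9 matrix bytes with one write() call.
   Returns the number of bytes written, or -1 on error. */
ssize_t print_raw(void)
{
    /* Flush anything still queued in stdio first, so that mixing
       stdio output and raw write() keeps the bytes in order. */
    fflush(stdout);
    return write(STDOUT_FILENO, matrix, sizeof matrix);
}
```

The return value can be smaller than 9 when the descriptor is a pipe or is non-blocking, so callers that need all bytes delivered should check it.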


Darron · Accepted Answer · 2009-02-09 15:58:33Z

3

Perhaps your problem is not that fwrite() is slow, but that it is buffered. Try calling fflush(stdout) after the fwrite().

This all really depends on your definition of slow in this context.


jasedit · Accepted Answer · 2009-02-09 15:23:45Z

1

All printing is fairly slow, although iostreams are really slow for printing.

Your best bet would be to use printf, something along the lines of:

printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
  matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);


Note that he's talking about sending binary data, not string data.
– John Carter
Commented Feb 9, 2009 at 16:03


Ketan · Accepted Answer · 2009-02-09 16:14:35Z

1

As everyone has pointed out, IO in a tight inner loop is expensive. I have normally ended up doing a conditional cout of the matrix, based on some criteria, when I need to debug it.

If your app is a console app, try redirecting its output to a file; that will be a lot faster than refreshing the console, e.g. app.exe > matrixDump.txt


anonanon · Accepted Answer · 2009-02-09 15:26:06Z

0

What's wrong with:

fwrite(matrix,1,9,stdout);

The one-dimensional and the two-dimensional arrays have the same memory layout: nine contiguous bytes.
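
A quick way to check this claim is to compare the element-by-element copy from the question with the raw bytes of matrix. The sketch below uses sample values and a hypothetical function name, same_layout.

```c
#include <string.h>

unsigned char matrix[3][3] = { {1, 2, 3}, {4, 5, 6}, {7, 8, 9} };  /* sample data */

/* Returns 1 if copying matrix element by element (as print() does)
   yields exactly the bytes that matrix already holds in memory. */
int same_layout(void)
{
    unsigned char temp[9];
    int k = 0;

    for (int i = 0; i < 3; ++i)
        for (int j = 0; j < 3; ++j)
            temp[k++] = matrix[i][j];

    return sizeof matrix == sizeof temp
        && memcmp(matrix, temp, sizeof temp) == 0;
}
```

C stores arrays of arrays row by row with no padding between rows, which is why the temporary copy can be dropped.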


Daniel · Accepted Answer · 2009-02-09 16:02:22Z

0

Try running the program twice: once with output and once without. You will notice that the run without IO is faster overall. You could also fork the process (or create a thread), so that one does the computation while the other writes to the file (stdout).


Anton Stafeyev · Accepted Answer · 2020-04-04 15:43:45Z

First, don't print on every entry. In other words, don't do this:

for(int i = 0; i<100; i++){
    printf("Your stuff");
}

Instead, allocate a buffer on the stack or on the heap, store your information there, and then send the whole buffer to stdout in one go, like this:

char *buffer = malloc(100);       // needs <stdlib.h>
for(int i = 0; i<100; i++){
    buffer[i] = 1; // your 8-bit value goes here
}

// once you are done, print it to the console with
write(1, buffer, 100);            // needs <unistd.h>
free(buffer);

But in your case, just use write(1, temp, 9);
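
For the 9-byte records in the question, the collect-then-write idea could be sketched as follows. The names add_record, flush_batch and BatchRecords are hypothetical, and the batch size of 1024 records is an assumption rather than a measured optimum.

```c
#include <stddef.h>
#include <string.h>
#include <unistd.h>

enum { RecordSize = 9, BatchRecords = 1024 };   /* BatchRecords is an assumed value */

static unsigned char batch[RecordSize * BatchRecords];
static size_t used;                              /* bytes currently queued */

/* Writes everything queued so far. Returns 0 on success, -1 on error. */
int flush_batch(int fd)
{
    size_t off = 0;

    while (off < used) {
        ssize_t n = write(fd, batch + off, used - off);
        if (n < 0)
            return -1;
        off += (size_t)n;                        /* handle short writes */
    }
    used = 0;
    return 0;
}

/* Queues one 9-byte record, flushing first if the batch is full. */
int add_record(int fd, const unsigned char rec[RecordSize])
{
    if (used + RecordSize > sizeof batch && flush_batch(fd) != 0)
        return -1;
    memcpy(batch + used, rec, RecordSize);
    used += RecordSize;
    return 0;
}
```

print() would then call add_record(1, temp), and the program would call flush_batch(1) once before exiting so the last partial batch is not lost.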

Daniel Bişar · Accepted Answer · 2020-05-06 08:47:35Z

I am pretty sure you can increase output performance by increasing the buffer size, so that you need fewer fwrite calls. write might be faster, but I am not sure. Just try this:

> yes | dd of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s

vs

> yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s

The same applies to your code. Some tests over the last few days suggest that good buffer sizes are probably around 1 << 12 (= 4096) and 1 << 16 (= 65536) bytes.

vdsf · Accepted Answer · 2009-02-09 15:27:51Z

-1

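You can simply use the following (plain std::cout << temp would treat temp as a NUL-terminated string, which it is not):

std::cout.write(reinterpret_cast<const char*>(temp), sizeof temp);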

printf is more C-Style.

Yet, IO operations are costly, so use them wisely.
