3

I'm profiling my program, and the function print() is taking a lot of time. How can I send "raw" byte output directly to stdout instead of using fwrite, and make it faster? (I need to send all 9 bytes in print() to stdout at the same time.)

void print(){
    unsigned char temp[9];

    temp[0] = matrix[0][0];
    temp[1] = matrix[0][1];
    temp[2] = matrix[0][2];
    temp[3] = matrix[1][0];
    temp[4] = matrix[1][1];
    temp[5] = matrix[1][2];
    temp[6] = matrix[2][0];
    temp[7] = matrix[2][1];
    temp[8] = matrix[2][2];

    fwrite(temp,1,9,stdout);

}

matrix is defined globally as unsigned char matrix[3][3];

1
  • You can't. Try printing less, or use buffered output like ostream.
    – Iraimbilanja
    Commented Feb 9, 2009 at 15:22

11 Answers

10

IO is not an inexpensive operation. It is, in fact, a blocking operation: when you call write, the OS can preempt your process and let other CPU-bound processes run until the IO device you're writing to completes the operation.

The only lower-level function you can use (if you're developing on a *nix machine) is the raw write function, but even then your performance will not be much better than it is now. Simply put: IO is expensive.

2
  • 1
    Slow? Not so. See my belated contribution below. Commented Apr 27, 2012 at 17:04
  • Note that you may need to call write multiple times, since the number of bytes it actually writes (its return value) is not guaranteed to equal the size you passed in. Commented Sep 13, 2019 at 23:45
10

The top rated answer claims that IO is slow.

Here's a quick benchmark with a sufficiently large buffer to take the OS out of the critical performance path, but only if you're willing to receive your output in giant blurps. If latency to first byte is your problem, you need to run in "dribs" mode.

Write 10 million records from a nine byte array

Mint 12 AMD64 on 3GHz CoreDuo under gcc 4.6.1

   340ms   to /dev/null 
   710ms   to 90MB output file 
 15254ms   to 90MB output file in "dribs" mode 

FreeBSD 9 AMD64 on 2.4GHz CoreDuo under clang 3.0

   450ms   to /dev/null 
   550ms   to 90MB output file on ZFS triple mirror
  1150ms   to 90MB output file on FFS system drive
 22154ms   to 90MB output file in "dribs" mode

There's nothing slow about IO if you can afford to buffer properly.

#include <stdio.h>
#include <assert.h>
#include <stdlib.h>
#include <string.h>

int main (int argc, char* argv[])
{
    int dribs = argc > 1 && 0==strcmp (argv[1], "dribs");
    int err;
    int i;
    enum { BigBuf = 4*1024*1024 };
    char* outbuf = malloc (BigBuf);
    assert (outbuf != NULL);
    err = setvbuf (stdout, outbuf, _IOFBF, BigBuf); // full buffering (not line buffering)
    assert (err == 0);

    enum { ArraySize = 9 };
    char temp[ArraySize] = {0};  // contents don't matter, but avoid writing uninitialized bytes
    enum { Count = 10*1000*1000 };

    for (i = 0; i < Count; ++i) {
        fwrite (temp, 1, ArraySize, stdout);
        if (dribs) fflush (stdout);  // "dribs" mode: one write() per record
    }
    fclose (stdout);  // flushes the buffer; must happen before outbuf is freed
    free (outbuf);
    return 0;
}
0
3

The rawest form of output you can do is probably the write system call, like this:

write (1, matrix, 9);

1 is the file descriptor for standard out (0 is standard in, and 2 is standard error). Your standard out will only write as fast as whatever is reading it at the other end (i.e. the terminal, or the program you're piping into), which might be rather slow.

I'm not 100% sure, but you could try setting non-blocking IO on fd 1 (using fcntl) and hope the OS will buffer it for you until it can be consumed by the other end. It's been a while, but I think it works like this

fcntl (1, F_SETFL, fcntl (1, F_GETFL) | O_NONBLOCK);

YMMV though. Please correct me if I'm wrong on the syntax, as I said, it's been a while.

3
  • linux.die.net/man/2/fcntl O_NONBLOCK handles filesystem locks, not buffering.
    – Basilevs
    Commented Oct 12, 2009 at 11:57
  • 1
    Use stdout instead of magic number 1, more readable at least.
    – hesham_EE
    Commented Oct 1, 2014 at 19:08
  • 1
    @hesham_EE except stdout is a FILE*, not a file descriptor, so that would be wrong. I think there is a standardized constant for it, STDOUT_FILENO (in <unistd.h>), but I'm not 100% sure.
    – falstro
    Commented Oct 1, 2014 at 20:00
3

Perhaps your problem is not that fwrite() is slow, but that it is buffered. Try calling fflush(stdout) after the fwrite().

This all really depends on your definition of slow in this context.

1

All printing is fairly slow, but iostreams are especially slow.

Your best bet would be to use printf, something along the lines of:

printf("%c%c%c%c%c%c%c%c%c\n", matrix[0][0], matrix[0][1], matrix[0][2], matrix[1][0],
  matrix[1][1], matrix[1][2], matrix[2][0], matrix[2][1], matrix[2][2]);
1
  • Note that he's talking about sending binary data, not string data. Commented Feb 9, 2009 at 16:03
1

As everyone has pointed out, IO in a tight inner loop is expensive. When I've needed to debug a matrix, I have normally ended up printing it with cout only when some condition holds.

If your app is a console app, try redirecting its output to a file; that will be a lot faster than refreshing the console, e.g. app.exe > matrixDump.txt

0

What's wrong with:

fwrite(matrix,1,9,stdout);

Both the one- and the two-dimensional arrays have the same memory layout: nine contiguous bytes in row-major order.

0
0

Try running the program twice, once with output and once without. You will notice that overall, the run without the IO is the fastest. Also, you could fork the process (or create a thread), with one writing to a file (stdout) and the other doing the computation.

0

First, don't print on every entry. In other words, don't do this:

for(int i = 0; i<100; i++){
    printf("Your stuff");
}

Instead, allocate a buffer, either on the stack or on the heap, store your information there, and then write the whole buffer to stdout at once, like this:

#include <stdlib.h>
#include <unistd.h>

char *buffer = malloc(100);
for (int i = 0; i < 100; i++) {
    buffer[i] = 1; // your byte value goes here
}

// once you are done, print it to the console with
write(1, buffer, 100);
free(buffer);

But in your case, just use write(1, temp, 9);

0

I am pretty sure you can increase output performance by increasing the buffer size, so that fewer write calls are made. write might be faster, but I am not sure. Just try this:

$ yes | dd of=/dev/null count=1000000
1000000+0 records in
1000000+0 records out
512000000 bytes (512 MB, 488 MiB) copied, 2.18338 s, 234 MB/s

vs

$ yes | dd of=/dev/null count=100000 bs=50KB iflag=fullblock
100000+0 records in
100000+0 records out
5000000000 bytes (5.0 GB, 4.7 GiB) copied, 2.63986 s, 1.9 GB/s

The same applies to your code. Some tests I ran over the last few days suggest that good buffer sizes are probably somewhere around 1 << 12 (= 4096) to 1 << 16 (= 65536) bytes.

-1

You can simply:

std::cout.write(reinterpret_cast<const char*>(temp), 9);

printf is more C-style.

Yet, IO operations are costly, so use them wisely.
