4

I have to save some graph data (an array of structs) into a text file. I made a working program using fprintf, but for extra points it needs to be faster. I have spent a couple of hours googling whether there is anything faster and tried to use fwrite (but I wasn't able to make fwrite produce text), and I cannot really find any other functions.

This is my write function using fprintf:

void save_txt(const graph_t * const graph, const char *fname)
{
    int count = graph->num_edges, i = 0;
    FILE *f = fopen(fname, "w");
    if (!f) {          /* bail out if the file could not be opened */
        return;
    }
    while (count > 0) {
        /* "%d %d %d\n" writes at least 6 characters, so r < 6 means an error */
        int r = fprintf(f, "%d %d %d\n", graph->edges[i].from, graph->edges[i].to, graph->edges[i].cost);
        i++;
        if (r >= 6) {
            count -= 1;
        } else {
            break;
        }
    }
    fclose(f);
}
7
  • 1
    Are you sure the writing to a file part is the reason for your slowdown?
    – Jongware
    Commented Dec 23, 2017 at 12:24
  • 2
Faster than fprintf? What is it you are doing? Maybe there are cache misses when iterating through the graph? What load is your CPU under? How do you define "faster than fprintf"? Show us what you have tried and your expectations. Googling is of no use if you have not actually implemented an alternative. fwrite is for binary writing.
    – t0mm13b
    Commented Dec 23, 2017 at 12:34
  • As mentioned by the above posters, please provide performance profiling results if you're asking for performance help. The issue may not be so straightforward as to guess with what was given. It seems that there could be three memory accesses in the graph->edges[i], but those would most probably stay in the L1 cache. fprintf is obviously overkill since your format is trivial. But those are just possibilities that may be completely wrong. Commented Dec 23, 2017 at 12:44
  • 2
    Did you profile the code to see where you're actually spending your time? If you don't do that, you're just guessing as to what you need to make faster. Commented Dec 23, 2017 at 14:18
  • 1
    Ways to speed up the I/O processing are: 1) format the data in memory rather than in fprintf(); 2) accumulate a block/array of the data, then write the block/array in a single call to write(). Suggest that the block/array be 4096 bytes in size. Commented Dec 23, 2017 at 16:53

3 Answers 3

5

I would try setting a write buffer on the stream and experimenting with different buffer sizes (e.g. 1K, 2K, 4K, 8K and so on). Notice that by default your file already uses a buffer of BUFSIZ bytes, and that might already be enough.

#define BUFFERSIZE 0x1000

void save_txt(const graph_t * const graph, const char *fname)
{
    int count = graph->num_edges, i = 0;
    char buf[BUFFERSIZE];

    FILE *f = fopen(fname, "w");
    /* setvbuf must be called before any other I/O on the stream,
       and buf must stay valid until the stream is closed. */
    setvbuf(f, buf, _IOFBF, BUFFERSIZE);

    ...

The output file f is born with the default BUFSIZ cache, so it might benefit from a larger, fully buffered write cache.

Of course this assumes that you're writing to a relatively slow medium and that the time spent saving is relevant; otherwise, whatever is slowing you down is not here, and therefore increasing save performances won't help you appreciably.

There are tools like prof and gprof that can help you determine where your program spends most of its time.

One, much more awkward, possibility is merging Kiwi's answer with a buffered write call, to avoid the code in printf that works out which conversion to apply (you already know it) and to use as few I/O calls as possible (even just one, if BUFFERSIZE is larger than your destination file's length).

// These variables must now be global, declared outside save_txt.
char kiwiBuf[BUFFERSIZE];
size_t kiwiPtr = 0;
FILE *f;

void flushBuffer(void);   // forward declaration, needed by my_putchar

void my_putchar(char c) {
    kiwiBuf[kiwiPtr++] = c;
    // Is the buffer full?
    if (kiwiPtr == BUFFERSIZE) {
        // Yes, empty the buffer into the file.
        flushBuffer();
    }
}

void flushBuffer(void) {
    if (kiwiPtr) {
        fwrite(kiwiBuf, kiwiPtr, 1, f);
        kiwiPtr = 0;
    }
}

You need now to flush the buffer before close:

void save_txt(const graph_t * const graph, const char *fname)
{
    int i, count = graph->num_edges;
    f = fopen(fname, "w");
    if (NULL == f) {
        fprintf(stderr, "Error opening %s\n", fname);
        exit(-1);
    }
    for (i = 0; i < count; i++) {
        my_put_nbr(graph->edges[i].from);
        my_putchar(' ');
        my_put_nbr(graph->edges[i].to);
        my_putchar(' ');
        my_put_nbr(graph->edges[i].cost);
        my_putchar('\n');
    }
    flushBuffer();
    fclose(f);
}

UPDATE

By declaring the my_putchar function as inline and using a 4K buffer, the above code (modified with a mock that reads the graph from an array of random integers) is around 6x faster than fprintf on

Linux mintaka 4.12.8-1-default #1 SMP PREEMPT Thu Aug 17 05:30:12 UTC 2017 (4d7933a) x86_64 x86_64 x86_64 GNU/Linux
gcc version 7.1.1 20170629 [gcc-7-branch revision 249772] (SUSE Linux)

About 2x of that seems to come from buffering. Andrew Henle made me notice an error in my code: I was comparing results against a baseline of unbuffered output, but fopen by default uses a BUFSIZ-byte buffer, and on my system BUFSIZ is 8192. So basically I have "discovered" just that:

  • there is no advantage to an 8K buffer; 4K is enough
  • my original suggestion of using _IOFBF is utterly worthless, as the system already does it for you. This in turn means that Kiwi's answer is the most correct since, as Andrew pointed out, it avoids printf's checks and conversions.

Also, the overall increase (google Amdahl's Law) depends on what fraction of processing time goes into saving. Clearly, if one hour of processing requires one second of saving, doubling saving speed saves you half a second; while increasing processing speed by 1% saves you 36 seconds, or 72 times more.

My own sample code was designed to be completely save-oriented with very large graphs; in this situation, any small improvement in writing speed reaps potentially huge rewards, which might be unrealistic in the real-world case.

Also (in answer to a comment), while using a small enough buffer will slow saving, it is not at all certain that a larger buffer will help. Say the whole graph generates 1.2 KB of output in its entirety; then of course any buffer larger than 1.2 KB will yield no improvement. Actually, allocating more memory might even hurt performance.

5
  • I think that it could work, but I don't really understand how to code it. Thanks, I will try to use it.
    – wutwut
    Commented Dec 23, 2017 at 13:34
  • You just add the setvbuf line, and place the BUFFERSIZE definition at the top of the file (and the buf allocation inside the writing function).
    – LSerni
    Commented Dec 23, 2017 at 13:37
  • buffering helps speed things up by reducing the number of 'actual' I/O operations, so shrinking the buffer is not a good idea. Commented Dec 23, 2017 at 17:06
  • 1
    About 2x of that seems to come from buffering (I would have expected more) I'm surprised you got any speedup from a 4K fprintf() buffer. I would expect the default buffer to be either 4K or 8K. In my experience, the default buffer size for the stdio functions usually provides about 80-90% of the maximum possible performance on most disk systems, and the only way to significantly improve the actual IO performance is to tune code for a specific filesystem and hardware configuration. I've found the performance gains are in avoiding printf() conversions. On slow disks, that won't matter. Commented Dec 24, 2017 at 15:03
YES. @AndrewHenle There was a slip in my comparison code, and I was using the wrong baseline. All in all I probably should retract this answer, as it does not add anything relevant and significant to Kiwi's, except confirming that it works.
    – LSerni
    Commented Dec 24, 2017 at 15:16
2

I would write a small function, say print_graph(int, int, int), and call write directly in it,

or something like this, with my_putchar being a write call:

int my_put_nbr(int nb)
{
    if (nb < 0)
    {
        my_putchar('-');
        nb = -nb;
    }
    if (nb <= 9)
        my_putchar(nb + '0');
    else
    {
        my_put_nbr(nb / 10);
        my_put_nbr(nb % 10);
    }
    return (0);
}
1
  • 1
    Upvoting a clever little function; I'm not too sure it's really faster than printf, but it's worth a shot.
    – LSerni
    Commented Dec 23, 2017 at 13:52
1

I had to be 1.3x faster than fprintf; here is the code that worked for me. I have to say that I had to submit it multiple times; sometimes I passed only 1 out of 5 tests with the same code. In conclusion, it is faster than fprintf, but not reliably 1.3 times faster.

void save_txt(const graph_t * const graph, const char *fname)
{
    int count = graph->num_edges, i = 0;
    char c = '\n';
    char d = ' ';
    char buffer[15];
    FILE *f = fopen(fname, "w");
    if (!f) {          /* avoid a segfault if the open fails */
        return;
    }
    while (count > 0) {
        /* itoa is non-standard; use sprintf(buffer, "%d", ...) where it is unavailable */
        itoa(graph->edges[i].from, buffer, 10);
        fputs(buffer, f);
        putc(d, f);
        itoa(graph->edges[i].to, buffer, 10);
        fputs(buffer, f);
        putc(d, f);
        itoa(graph->edges[i].cost, buffer, 10);
        fputs(buffer, f);
        putc(c, f);
        i++;
        count -= 1;
    }
    fclose(f);
}
2
  • if this statement: FILE *f = fopen(fname, "w"); fails, then the following code will seg fault. Suggest following that statement with: if( !f ) { perror( "fopen failed" ); exit( EXIT_FAILURE ); } Commented Dec 23, 2017 at 17:02
  • suggest using sprintf() to put all of a line of output into a single buffer, then call fputs() just once and not call putc() at all. Commented Dec 23, 2017 at 17:04
