
I'm trying to generate large files (4-8 GB) in C. I open the file in binary mode with fopen() and the "wb" mode string, then call fwrite() in a for loop, writing one byte per iteration. Everything works until the requested size reaches 4294967296 bytes (4096 MB). It looks like some memory limit of a 32-bit OS, because data written to the open file is still held in RAM. Am I right? The symptom is that the created file is smaller than requested, and the difference is exactly 4096 MB: when I ask for a 6000 MB file, I get a 6000 MB - 4096 MB = 1904 MB file.

Could you suggest another way to do this?

Regards :)

Part of code:

unsigned long long int number_of_data = (unsigned int)atoi(argv[1])*1024*1024; //MB
char x[1]={atoi(argv[2])};

fp=fopen(strcat(argv[3],".bin"),"wb");

    for(i=0;i<number_of_data;i++) {
        fwrite(x, sizeof(x[0]), sizeof(x[0]), fp);
    }

fclose(fp);
  • 3
    strcat(argv[3],".bin") is wrong
    – BLUEPIXY
    Commented May 13, 2013 at 10:29
  • 3
    Why the downvotes? This is an excellent question. Commented May 13, 2013 at 10:31
  • 2
    There is no guarantee that argv[3] has room to append ".bin". Writing past the end of it may corrupt the program.
    – BLUEPIXY
    Commented May 13, 2013 at 10:33
  • 1
    which os and which file system do you use? Commented May 13, 2013 at 10:34
  • 1
    @RaphaelAhrens I use Windows 7 32bit and NTFS partition.
    – bLAZ
    Commented May 13, 2013 at 10:35

3 Answers

2

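fwrite is not the problem here. The problem is the value you are calculating for number_of_data: the multiplication is done in 32-bit unsigned int arithmetic and wraps around before the result is stored in the 64-bit variable.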

You need to be careful of any unintentional 32-bit casting when dealing with 64-bit integers. When I define them, I normally do it in a number of discrete steps, being careful at each step:

unsigned long long int number_of_data = atoi(argv[1]); // Size in MB; atoi() returns an int, so good for up to 2,147,483,647 MB (about 2 PB)
number_of_data *= 1024*1024; // Convert MB to bytes

The compound assignment operator (*=) acts on the l-value (the unsigned long long int), so the multiplication is carried out in 64 bits.

This may look unoptimised, but a decent compiler will remove any unnecessary steps.
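To make the wraparound concrete, here is a minimal sketch of the two calculations side by side. The helper names mb_to_bytes_32 and mb_to_bytes_64 are made up for illustration, and the fixed-width types from <stdint.h> are only there so the result does not depend on the platform's int size.

```c
#include <stdint.h>

/* Hypothetical helper: same arithmetic as the question, done entirely in 32 bits. */
uint32_t mb_to_bytes_32(uint32_t megabytes)
{
    return megabytes * 1024u * 1024u;   /* result wraps modulo 2^32 */
}

/* Hypothetical helper: widen to 64 bits first, then multiply. */
uint64_t mb_to_bytes_64(uint32_t megabytes)
{
    uint64_t bytes = megabytes;         /* 64-bit from here on */
    bytes *= 1024u * 1024u;             /* multiplication happens in 64 bits */
    return bytes;
}
```

For 6000 MB the 32-bit version yields 1996488704 bytes, which is exactly the 1904 MB file from the question, while the 64-bit version yields the intended 6291456000 bytes.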

1
  • This solution helped me :) The problem was indeed the value of that variable. Many thanks to everyone who answered this question. It was very useful and informative.
    – bLAZ
    Commented May 13, 2013 at 18:50
2

You should not have any problem creating large files on Windows, but I have noticed that if you use a 32-bit version of seek on the file, it then seems to treat it as a 32-bit file that cannot be larger than 4GB. I have had success using _open, _lseeki64 and _write when working with >4GB files on Windows. For instance:

static void
create_file_simple(const TCHAR *filename, __int64 size)
{
    int omode = _O_WRONLY | _O_CREAT | _O_TRUNC;
    int fd = _topen(filename, omode, _S_IREAD | _S_IWRITE);
    _lseeki64(fd, size, SEEK_SET);
    _write(fd, "ABCD", 4);
    _close(fd);
}

The above will create a file over 4GB without issue when size is large enough. However, it can be slow, because when you call _write() the file system has to actually allocate the disk blocks for you. You may find it faster to create a sparse file if you have to fill it up randomly. If you will fill the file sequentially from the beginning then the above code will be fine. Note that if you really want to use the buffered I/O provided by fwrite, you can obtain a FILE* from a C library file descriptor using fdopen() (spelled _fdopen() in the Microsoft C runtime).

(In case anyone is wondering, the TCHAR, _topen and underscore prefixes are all MSVC++ quirks).

UPDATE

The original question is using sequential output for N bytes of value V. So a simple program that should actually produce the file desired is:

#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <fcntl.h>
#include <io.h>
#include <tchar.h>
int
_tmain(int argc, TCHAR *argv[])
{
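    __int64 n = 0, r = 0, size = 0x100000000LL; /* 4GB */
    char v = 'A';
    int fd = _topen(argv[1], _O_WRONLY | _O_CREAT | _O_TRUNC, _S_IREAD | _S_IWRITE);
    if (fd == -1) return 1; /* could not create the file */
    /* write one byte per call until size bytes are written or _write() fails */
    while (r != -1 && n < size) {
        r = _write(fd, &v, sizeof(v));
        if (r >= 0) n += r;
    }
    _close(fd);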
    return 0;
}

However, this will be really slow as we are only writing one byte at a time. That is something that can be improved by using a larger buffer or using buffered I/O by calling fdopen on the descriptor (fd) and switching to fwrite.
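As a rough sketch of the larger-buffer idea, the function below fills a file through plain fopen/fwrite in 64 KiB chunks while keeping the byte count in a 64-bit variable. The name fill_file and the chunk size are my own choices for illustration, not anything from the code above, and I have only reasoned about it with standard C in mind rather than tested it against every Windows runtime.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes `size` bytes of `value` to `path` in 64 KiB chunks.
   Returns 0 on success, -1 on failure. */
int fill_file(const char *path, uint64_t size, unsigned char value)
{
    static unsigned char chunk[64 * 1024];   /* chunk size is an arbitrary choice */
    uint64_t remaining = size;               /* 64-bit counter, so > 4 GB is fine */
    FILE *fp = fopen(path, "wb");

    if (fp == NULL)
        return -1;
    memset(chunk, value, sizeof chunk);

    while (remaining > 0) {
        /* write a full chunk, or whatever is left for the final piece */
        size_t want = remaining < sizeof chunk ? (size_t)remaining : sizeof chunk;
        if (fwrite(chunk, 1, want, fp) != want) {
            fclose(fp);
            return -1;
        }
        remaining -= want;
    }
    return fclose(fp) == 0 ? 0 : -1;
}
```

A call such as fill_file("out.bin", 6000ULL * 1024 * 1024, 'A') would request a 6000 MB file; note the ULL suffix, which keeps the size calculation itself in 64 bits.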

6
  • Your MSVC++ quirks work fine under MinGW(-w64) GCC as well; they are quirks of the Windows C Runtime library, not the compiler.
    – rubenvb
    Commented May 13, 2013 at 11:37
  • Probably that is what I'm looking for, but I'm not far away from "Hello World" and now it is hard for me to use it in a way that I want :D Please give me a moment.
    – bLAZ
    Commented May 13, 2013 at 11:47
  • @patthoyts Could you tell me how to give filename to that function?
    – bLAZ
    Commented May 13, 2013 at 11:57
  • Something like this: TCHAR fname[] = "Name"; cannot be used... I have no idea how to run this function.
    – bLAZ
    Commented May 13, 2013 at 12:17
  • 1
    TCHAR is a MS defined type that compiles to either char or wchar_t depending on UNICODE being defined. It is endemic in Windows C/C++ code. You would use TCHAR fname[] = _T("Name"); to create an array of chars appropriate to the current compilation environment. Basically on Windows98 and older, TCHAR was char but on NT, XP and newer TCHAR is wchar_t (unicode capable).
    – patthoyts
    Commented May 13, 2013 at 13:48
1

You have no problem with fwrite(). The problem seems to be your

unsigned long long int number_of_data = (unsigned int)atoi(argv[1])*1024*1024; //MB

which should instead be something like

uint64_t number_of_data = atoll(argv[1])*1024ULL*1024ULL;

unsigned long long would still be ok, but unsigned int * int * int will give you an unsigned int no matter how large your target variable is.
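If you also want to catch bad input, a sketch along these lines might help. It uses strtoull from C99 instead of atoll, and the function name parse_megabytes is invented for this example. I am assuming your compiler's C library provides strtoull; older Microsoft runtimes may not, in which case a vendor-specific equivalent would be needed.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: parses a decimal megabyte count and converts it to
   bytes using 64-bit arithmetic. Returns 0 on success, -1 on bad input
   or if the byte count would not fit in 64 bits. */
int parse_megabytes(const char *text, uint64_t *bytes_out)
{
    char *end = NULL;
    unsigned long long mb;

    errno = 0;
    mb = strtoull(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0')
        return -1;                            /* not a clean decimal number */
    if (mb > UINT64_MAX / (1024ULL * 1024ULL))
        return -1;                            /* MB -> bytes would overflow */
    *bytes_out = (uint64_t)mb * 1024ULL * 1024ULL;
    return 0;
}
```

With this, parse_megabytes(argv[1], &number_of_data) gives 6291456000 for the argument "6000" and reports an error for something like "abc".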

3
  • So this is why I get that overflow warning. But it tells me now that atoll is undefined :/ I have #include < stdlib.h >.
    – bLAZ
    Commented May 13, 2013 at 12:38
  • 1
    try _strtoui64 from <stdlib.h> (or _tcstoui64 from <tchar.h> if using TCHAR types).
    – patthoyts
    Commented May 13, 2013 at 13:50
  • I will try this solution tomorrow, but it will probably help just like @Lee Netherton's solution (it addresses the same problem). Thanks.
    – bLAZ
    Commented May 13, 2013 at 18:56
