
(This isn't my program, but I'll try to provide all the relevant information to the best of my knowledge.)

There is a program that reads binary files roughly 300 MB in size, processes them, and outputs some information. The program uses ifstream for file input, and the streams are correctly initialized and closed for each read.
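For reference, here is a minimal sketch of the kind of per-pass read described above (an assumption on my part; the actual code isn't available):

    #include <fstream>
    #include <string>
    #include <vector>

    // Read a whole binary file into memory; the stream is opened and
    // closed on every call, as described above.
    std::vector<char> read_whole_file(const std::string &path)
    {
        std::ifstream in(path, std::ios::binary | std::ios::ate); // open at end
        if (!in)
            return {};
        std::vector<char> buf(static_cast<std::size_t>(in.tellg()));
        in.seekg(0);                                              // rewind
        in.read(buf.data(), static_cast<std::streamsize>(buf.size()));
        return buf;  // the ifstream is closed when it goes out of scope
    }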

The program has to read each file multiple times. Reading a file for the first time takes about 3 seconds, and each subsequent read takes about 0.1 seconds. If several files are processed, going back to the first file still yields fast read speeds, but after enough other files have been read, re-reading an earlier file becomes slow again.

Additionally, if a file is copied to another location, the first read of the new copy takes roughly 0.1 seconds.

If you do the math, the speed of the initial read is roughly the advertised read speed of the hard drive (300 MB in 3 seconds is about 100 MB/s), while the 0.1-second re-reads are far faster than any hard drive can deliver.

All this looks like file locations are being cached by either the OS or the hard drive, so that consecutive reads don't have to seek out the file's location.

Does anyone know what exactly is causing the slowdown on the initial read, and whether it can be prevented? Three seconds may not seem like a lot, but they add about 5 hours to the total time needed to process every file.

Also, the program runs on Fedora 14 and Scientific Linux, with both OSes using their default file systems.

Any ideas would be appreciated.

  • 3 seconds to read a 300 MB file is about right for hitting the disk: that's 100 MB/s, which is at the high end of what you can expect from a modern, fast hard disk. 0.1 seconds to read a 300 MB file is not coming off a disk; that's coming out of a cache.
    – caf
    Commented Nov 20, 2011 at 10:42

4 Answers


Linux will try to copy the file into RAM to make the next read faster; I am guessing this is what is happening. The initial read actually comes off the disk, and subsequent reads come out of the file cache because the entire file has been copied into RAM.

  • I monitored the RAM while the program was iterating through the files. Considering the original amount of free RAM, the size of the files, and the number of files it iterated through before the old ones were "forgotten", this seems to be the correct answer.
    – Morglor
    Commented Nov 22, 2011 at 5:21
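
One way to verify this in code (a hypothetical helper, not from the thread): on Linux you can mmap the file and ask mincore(2) which of its pages are currently resident in the page cache.

    #include <fcntl.h>     // open
    #include <sys/mman.h>  // mmap, mincore, munmap
    #include <sys/stat.h>  // fstat
    #include <unistd.h>    // close, sysconf
    #include <vector>

    // Returns the fraction of the file's pages that are in the page cache,
    // or -1.0 on error.
    double resident_fraction(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return -1.0;

        struct stat st;
        if (fstat(fd, &st) != 0 || st.st_size == 0) {
            close(fd);
            return -1.0;
        }

        void *map = mmap(nullptr, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);  // the mapping stays valid after the fd is closed
        if (map == MAP_FAILED)
            return -1.0;

        long page = sysconf(_SC_PAGESIZE);
        size_t pages = (st.st_size + page - 1) / page;
        std::vector<unsigned char> vec(pages);

        size_t resident = 0;
        if (mincore(map, st.st_size, vec.data()) == 0)
            for (unsigned char v : vec)
                resident += v & 1;  // bit 0 set => page is resident

        munmap(map, st.st_size);
        return static_cast<double>(resident) / pages;
    }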

The OS (Linux) has a disk cache. After you read the file once, it's in the cache.


My guess would be that the first read of the file takes longer because its data is being loaded into the cache.

After the first time, reads just use the data already in the cache.


Yes, the data becomes cached. You might force that caching with the readahead syscall (or simply by having another process read the file beforehand). If you are using mmap, you could also use madvise.
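
A minimal sketch of both hints (illustrative only; prefetch_file is a made-up name and error handling is trimmed). posix_fadvise with POSIX_FADV_WILLNEED is the portable way to request the same behavior as readahead(2):

    #include <fcntl.h>     // open, posix_fadvise
    #include <sys/mman.h>  // mmap, madvise, munmap
    #include <sys/stat.h>  // fstat
    #include <unistd.h>    // close

    // Hint the kernel to start pulling the whole file into the page cache
    // before the real read happens.
    void prefetch_file(const char *path)
    {
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return;

        struct stat st;
        if (fstat(fd, &st) == 0 && st.st_size > 0) {
            // Asynchronous readahead of the whole file into the page cache.
            posix_fadvise(fd, 0, st.st_size, POSIX_FADV_WILLNEED);

            // If the file is accessed through mmap, madvise gives the
            // equivalent hint on the mapping itself.
            void *map = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
            if (map != MAP_FAILED) {
                madvise(map, st.st_size, MADV_WILLNEED);
                munmap(map, st.st_size);  // unmapping doesn't evict cached pages
            }
        }
        close(fd);
    }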
