
I ran the following bash command to merge several big files:

cat file1.txt file2.txt file3.txt file4.txt > merged.txt

The process takes a long time, as the files are about 12 GB each. In Activity Monitor (Mac OS X 10.11.3), under the Disk tab, I see the following entry for the process of interest:

[Screenshot: Activity Monitor Disk tab entry for the cat process]

How is it possible in this cat process that more bytes are being written than read?

  • I don't know OSX, but maybe the monitor total includes disc access while swapping. If an OS swaps a program to disc, but it becomes active again before anything else uses its RAM, then the memory copy will be used to resume it, without reading in again.
    – AFH
    Commented Feb 23, 2017 at 23:59
  • Not knowing OSX either, my speculation is that the "Written" data includes the inodes updated as the file grows to occupy them. That wouldn't be included in the "Read" data because the metadata of all but the first inode is immaterial to cat. Either it can access the file or not, and the timestamps, etc., are ignored by cat on the input files. The output file that the shell creates, however, has to have its metadata reproduced in each inode that is used for the new file.
    – Chindraba
    Commented Feb 24, 2017 at 1:32
  • @GypsySpellweaver - Good point. There may be an access time in whichever file system is installed, so strictly there are possibly time-stamps updated on reads, though only a couple of times per file! But you have made me think of something else: if the target is on a journalling file system, then there will be journal writes as well as the data. I'm surprised that all of this would add up to the 10% overhead in the question. Maybe it's just differences in the two partitions, cluster size and fragmentation in particular.
    – AFH
    Commented Feb 24, 2017 at 13:10
  • Neither do I know OSX. I've read that HFS+ doesn't support sparse files; but in case of another filesystem I would speculate at least one input file is (partially) sparse, the output file is not. When I cat a fully sparse file to another file in Linux (BTRFS filesystem) iotop indicates the process reads very little and writes a lot. A sparse text file is uncommon though, so this is just a very general remark of mine.
    Commented Dec 10, 2017 at 19:52
  • This question should be retitled to ask about Activity Monitor. I don't see any data suggesting that more was written than was read, only that Activity Monitor reported it that way. Count the bytes in the resulting file; it is likely the sum of the input files.
    – sage
    Commented Dec 27, 2017 at 18:06
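
The byte count check suggested in the last comment could be sketched roughly as follows. This is only a minimal sketch: it assumes small stand-in files created in a temporary directory (not the 12 GB files from the question), and the variable names `input_bytes` and `merged_bytes` are made up for illustration. It compares what `cat` actually produced and says nothing about how Activity Monitor counts disk writes.

```shell
#!/bin/sh
# Sketch: verify that the merged file holds exactly as many bytes as the
# input files combined. The files below are tiny stand-ins (an assumption
# for illustration), created in a throwaway temporary directory.
set -eu
tmpdir=$(mktemp -d)
trap 'rm -rf "$tmpdir"' EXIT
cd "$tmpdir"

printf 'alpha\n' > file1.txt
printf 'beta\n'  > file2.txt
printf 'gamma\n' > file3.txt
printf 'delta\n' > file4.txt

# Same form of command as in the question.
cat file1.txt file2.txt file3.txt file4.txt > merged.txt

# wc -c on standard input prints only a byte count; tr strips the
# leading spaces that some wc implementations (e.g. on macOS) add.
input_bytes=$(cat file1.txt file2.txt file3.txt file4.txt | wc -c | tr -d ' ')
merged_bytes=$(wc -c < merged.txt | tr -d ' ')

echo "inputs: $input_bytes bytes, merged: $merged_bytes bytes"
if [ "$input_bytes" -eq "$merged_bytes" ]; then
    echo "sizes match"
else
    echo "sizes differ"
fi
```

If the two counts match on the real files, the extra "written" bytes in the monitor would have to come from something other than file content, such as the metadata or journal writes speculated about in the comments above.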
