
I am trying to unzip a tar file on an external hard drive, which is taking way too long.

The file is 100GB compressed and will be 500GB uncompressed.

I am using the following commands on the Windows command line:

wsl
cd /mnt/f
tar -xzvf filename.tgz

It seems that data is being copied back and forth between the external drive and the main system hard drive.

This seems like a waste of time. Is there any way to avoid that?

phdstudent, I would have mentioned the speed issue in the title, as the current title may suggest, to someone who does not read your posting, that you are unable to use tar at all. I select the postings I read and answer by looking at the title...
    – r2d3
    Commented Oct 28, 2022 at 13:03

1 Answer


You do not say what "way too long" means... But anyway: writing 500 GB to an external HDD is bound to take some time. Assuming your disk is connected over USB 3 and can sustain 120 MB/s for sequential writes (which is good performance), it needs more than an hour to write 500 GB. Over USB 2, it would be about 5 hours.

But since the file you are decompressing is on the very same disk, the disk cannot continuously write the output files; it also has to read the input file. Not only can it not read and write at the same time, but the read/write heads also have to go back and forth continuously between the input file and the output files: this is partly random access, and HDDs are very bad at that. Performance can drop severely in this case.

Moreover, if the decompressed data consists of tons of small files, this is even worse, as there is a constant overhead cost for each file written.

Also, depending on the PC you are running on, the bottleneck could be the CPU if it is not powerful enough (this is a gzip'd archive, and the decompression doesn't come for free).
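One rough way to check whether the CPU is the limiting factor (the path below is an example; adjust it to your archive) is to time the decompression alone, discarding the output so that no writing to disk is involved:

```shell
# Decompress to /dev/null: this measures read + CPU decompression
# speed only, with nothing written to disk.
time gzip -dc /mnt/f/filename.tgz > /dev/null
```

If this already takes roughly as long as the full extraction, the destination disk is not your bottleneck.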

Lastly, you do not say which model your drive is, but many drives nowadays are SMR drives, that is, "Shingled Magnetic Recording" drives. The SMR technology allows higher recording density, and hence larger capacities, but at a price: write performance can be quite bad. To mitigate this, these drives contain a variable amount of write cache with good write performance. But when writing a huge volume at once, once the cache fills up, performance drops again.

So, you see, there are many potential reasons why the decompression of your archive is not as fast as you would like. For most of them there is not much you can do... But at the very least, you should put the archive file on a different drive (if possible an internal drive) from the destination drive: that will eliminate the random-access problem. The time spent copying the file to the internal drive will likely be regained during decompression.
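From within WSL, that suggestion could look like the following sketch. The paths are assumptions: /mnt/c/temp as a staging directory on the internal drive and /mnt/f as the external drive; adjust them to your setup.

```shell
# Hypothetical paths -- adjust to your machine.
ARCHIVE=/mnt/f/filename.tgz   # archive currently on the external drive
STAGING=/mnt/c/temp           # staging directory on the internal drive
DEST=/mnt/f                   # extraction target on the external drive

mkdir -p "$STAGING"

# 1. Copy the archive to the internal drive: one sequential read
#    from the external disk, with nothing else competing for its heads.
cp "$ARCHIVE" "$STAGING/"

# 2. Extract, reading from the internal drive and writing to the
#    external one, so the external HDD only does sequential writes.
tar -xzvf "$STAGING/$(basename "$ARCHIVE")" -C "$DEST"

# 3. Clean up the temporary copy.
rm "$STAGING/$(basename "$ARCHIVE")"
```

Make sure the internal drive has at least 100 GB free for the temporary copy of the archive.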

EDIT: As suggested in the comments, I am adding another potential reason: possible fragmentation of the filesystem on the drive.

At 35 MB/s I would rather estimate 4 hours when connected over USB 2.0. I would have mentioned fragmentation of that external drive as an additional factor for slowdown. Apart from that, I liked your answer stating "takes some time" as a response to "way too slow".
    – r2d3
    Commented Oct 28, 2022 at 13:01
