A colleague (anonymized to protect the innocent, the guilty and wherever in this range accidental demon summoners fall) followed a tutorial that included combining a handful of gzipped files with
zcat *gz | pigz --fast -c -p 16 > outfile.gz
The files in question were thus all in the same directory, which was on an NTFS-formatted network share (accessible from both Linux and Windows machines).
He started the process on an Ubuntu machine, went for lunch, and came back to an implausibly large monster of a file and the process still running. He killed the process, deleted the file in the file explorer on his Windows machine (or so he thought), and asked me to help troubleshoot. When we combined the files more sensibly (cat *gz > new_outfile.gz), we noticed that cat complained about outfile.gz not existing.
Well, we'd just deleted it, so it shouldn't exist, but ls on the Ubuntu machine and a refresh of the file explorer on the Windows one revealed it was back.
I got curious and tried to see what was going on.
file outfile.gz told me this was a "writeable, regular file; no read access". ls -l in the directory showed the file permissions as -rw-rw-rw-.
Trying to look at the start of the file with zcat outfile.gz | head gave me gzip: outfile.gz: no such file or directory.
After some more unsuccessful poking, I decided to just try and delete the file in the terminal (sudo rm outfile.gz, since deleting as a regular user on Windows didn't work and I was hoping this'd make it stick). And was met with rm: outfile.gz: no such file or directory.
I can exclude hidden characters that didn't get tab-completed (as suggested for another zombie file mystery) - ls -b shows the filename without any escape sequences. The Windows and Ubuntu machines mostly agree that the file is there, except for when I actually want to do something with it.
Looking through the directory, there's what appears to be the result of another attempt, and it behaves the same way.
What exactly happened here? Did we manage to summon Filethulhu with what just looked like a less efficient way to combine files? (Another colleague apparently managed to combine the files without a hitch, but had separate directories for input and output.) And how exactly do we get rid of this 70+ GB eldritch abomination sitting in our share?
If foo.test does not exist, then echo *.test 3>foo.test will not show foo.test, because *.test is expanded before 3>foo.test creates a new file. But zcat *gz | pigz --fast -c -p 16 > outfile.gz may read outfile.gz, because the two parts of the pipeline run in parallel and the redirection in the second part may happen before *gz is expanded. Try rm foo.test; echo *.test >/dev/tty | : >foo.test many times. Sometimes you get foo.test printed, sometimes not.
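To make that race visible, a small sketch of the suggested experiment might look like this - run in an otherwise empty scratch directory, with placeholder filenames:

cd "$(mktemp -d)"    # throwaway scratch directory
for i in $(seq 20); do
    rm -f foo.test
    # the left half of the pipeline expands *.test, the right half creates foo.test;
    # each half runs in its own child process, so the two operations race
    echo *.test >/dev/tty | : >foo.test
done

Some iterations print *.test (the glob ran before the file existed, so the pattern stayed unexpanded), others print foo.test (the redirection won the race) - the same nondeterminism that can let outfile.gz slip into zcat's argument list.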
outfile.gz gets created once, and then the zcat/pigz pipeline keeps reading that file and feeding it into itself?

outfile.gz grows before zcat starts reading it. Then zcat reads the file from the beginning, but pigz writes to the end. The reading process is always behind the writing one, so the file grows and grows.
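To see that "reader forever chasing the writer" effect in isolation, here is a minimal sketch of the same failure mode, with dd standing in for the whole zcat/pigz pair and a made-up scratch file (interrupt it with Ctrl-C and check the size afterwards):

cd "$(mktemp -d)"                 # throwaway scratch directory
printf 'seed data\n' > loop.txt
# dd reads loop.txt from the start while its stdout appends to the same file:
# every block it reads gets written back onto the end, so the end of the file
# keeps receding ahead of the read position and dd never sees EOF
dd if=loop.txt bs=512 >> loop.txt

Which is essentially what the explanation above describes for outfile.gz: pigz kept appending freshly compressed data, zcat read from the top and never caught up with the end, and the file grew until the process was killed.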