I have a series of gzip files which I wish to store more efficiently using xz, without losing traceability to a set of checksums of the gzip files.
I believe this amounts to being able to recreate the gzip files from the xz files, though I'm open to other suggestions.
To elaborate: if I have a gzip file named target.txt.gz, and I decompress it to target.txt and discard the compressed file, I want to exactly recreate the original compressed file target.txt.gz. By exactly, I mean a cryptographic checksum of the file should indicate that it is exactly the same as the original.
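For concreteness, this is roughly the check I would run on a recreated file. It is only a sketch: I am assuming SHA-256 as the checksum, and the helper names `sha256_of` and `is_exact_copy` are made up for illustration.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_exact_copy(recorded_digest, candidate_path):
    """True if the candidate file hashes to the digest recorded for the original."""
    return sha256_of(candidate_path) == recorded_digest.lower()
```

A recreated target.txt.gz would count as "exact" only if `is_exact_copy` returns True for the digest I already hold.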
I initially thought this must be impossible, because a gzip file contains metadata such as original file name and timestamp, which might not be preserved upon decompression, and metadata such as a comment, the source operating system, and compression flags, which are almost certainly not preserved upon decompression.
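To show which fields I mean, here is a minimal sketch that reads the header of the first gzip member, following the layout in RFC 1952. The function name `parse_gzip_header` and the dictionary keys are my own choices, and the sample file is generated in memory purely for illustration.

```python
import gzip
import io
import struct

# Flag bits of the FLG byte, as defined in RFC 1952.
FTEXT, FHCRC, FEXTRA, FNAME, FCOMMENT = 1, 2, 4, 8, 16


def parse_gzip_header(data: bytes) -> dict:
    """Parse the header of the first gzip member found at the start of data."""
    magic, method, flags, mtime, xfl, os_code = struct.unpack("<HBBIBB", data[:10])
    if magic != 0x8B1F:  # bytes 0x1f 0x8b read as a little-endian 16-bit value
        raise ValueError("not a gzip stream")
    info = {"method": method, "flags": flags, "mtime": mtime, "xfl": xfl,
            "os": os_code, "extra": None, "name": None, "comment": None}
    pos = 10
    if flags & FEXTRA:  # length-prefixed extra field
        (xlen,) = struct.unpack("<H", data[pos:pos + 2])
        info["extra"] = data[pos + 2:pos + 2 + xlen]
        pos += 2 + xlen
    if flags & FNAME:  # zero-terminated original file name
        end = data.index(b"\x00", pos)
        info["name"] = data[pos:end]
        pos = end + 1
    if flags & FCOMMENT:  # zero-terminated comment
        end = data.index(b"\x00", pos)
        info["comment"] = data[pos:end]
        pos = end + 1
    if flags & FHCRC:  # 16-bit header CRC
        pos += 2
    info["header_len"] = pos  # the deflate stream starts at this offset
    return info


# Illustration only: build a small gzip file in memory and inspect its header.
buf = io.BytesIO()
with gzip.GzipFile(filename="target.txt", mode="wb", fileobj=buf, mtime=123) as f:
    f.write(b"hello\n")
info = parse_gzip_header(buf.getvalue())
print(info["name"], info["mtime"], info["xfl"], info["os"])
```

None of these fields survive in target.txt itself, so they would have to be stored alongside the xz file.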
But then I thought to modify my question: is there a minimal amount of header information that I could extract from the gzip file that, in combination with the uncompressed data, would allow me to recreate the original gzip file?
And then I thought that the answer might still be no, due to the existence of tools such as Zopfli and 7-zip, which can create gzip-compatible streams that are better than (and therefore different from) the output of the standard gzip program. As far as I am aware, the gzip file format does not record which of these compressors created it.
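The "header plus uncompressed data" idea could be sketched like this: recompress the data with a handful of candidate settings, and if one of them reproduces the original deflate stream byte for byte, keep only the header bytes and the settings that worked. This is a sketch under several assumptions: the file is a single gzip member, Python's zlib stands in for the compressor (I do not know whether it matches what the gzip program or other tools emit), and the names `find_recipe` and `rebuild` are hypothetical.

```python
import gzip
import hashlib
import io
import struct
import zlib


def _deflate(raw: bytes, level: int, mem_level: int) -> bytes:
    """Raw deflate stream (no zlib/gzip wrapper) for the given settings."""
    comp = zlib.compressobj(level, zlib.DEFLATED, -15, mem_level)
    return comp.compress(raw) + comp.flush()


def find_recipe(gz: bytes):
    """Return (header, level, mem_level) if zlib reproduces the body, else None.

    Assumes a single gzip member whose last 8 bytes are the CRC32 + ISIZE trailer.
    """
    raw = gzip.decompress(gz)
    payload = gz[:-8]  # header + deflate body, trailer removed
    for level in range(1, 10):
        for mem_level in (8, 9):
            body = _deflate(raw, level, mem_level)
            if payload.endswith(body):
                # Everything before the matching body is kept verbatim as the header.
                return payload[:len(payload) - len(body)], level, mem_level
    return None  # some other compressor or setting produced this file


def rebuild(raw: bytes, recipe) -> bytes:
    """Recreate the gzip file from the uncompressed data and a stored recipe."""
    header, level, mem_level = recipe
    trailer = struct.pack("<II", zlib.crc32(raw), len(raw) & 0xFFFFFFFF)
    return header + _deflate(raw, level, mem_level) + trailer


# Illustration only: a file written by Python's own gzip module.
buf = io.BytesIO()
with gzip.GzipFile(filename="target.txt", mode="wb", fileobj=buf,
                   compresslevel=6, mtime=0) as f:
    f.write(b"some example text\n" * 100)
original = buf.getvalue()
recipe = find_recipe(original)
if recipe is not None:
    rebuilt = rebuild(gzip.decompress(original), recipe)
    print(hashlib.sha256(rebuilt).digest() == hashlib.sha256(original).digest())
```

When `find_recipe` returns None, which is exactly the Zopfli or 7-zip case I am worried about, this approach gives me nothing and I would have to keep the original gzip file.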
So my question becomes: are there other options I haven't thought of that might mean I can achieve my goal as set out in the first paragraph after all?