I have a file structure that looks something like this:
.
└── base/
    ├── archives/
    │   ├── foo.zip
    │   ├── bar.rar
    │   └── baz.7z
    ├── unzipped-1/
    │   └── complex subdir tree/
    │       ├── selected files from archives
    │       └── files not from archives
    ├── unzipped-2/
    │   └── same deal as unzipped-1
    └── etc.
Over time, the contents of archives/ will change, and which files I choose to extract from those archives will also change. In the unzipped/ directories, about 95% of the files come from an archive; the rest were added by other means.
I want to create compressed backups of the whole directory tree, starting from base/. I want the file size of the compressed backup to be as small as possible. Time and memory requirements are not an issue for me.
Logically, it makes sense to use an archiving tool which is smart enough to detect that some files came from archives that are also being included in the current backup. There's no need to compress both the source archive and the extracted file: instead, this tool just needs to record where a file from the source archive should be extracted.
Clarification: during extraction it will be necessary not only to restore the intact archives into archives/, but also to extract the selected files from each archive back into the unzipped/ directories.
How can I do this?
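To make the requirement concrete, here is a rough sketch (not a finished tool) of the detection step in Python. It only handles zip files (rar/7z would need third-party libraries), and the function name `build_manifest` is my own invention: it hashes every archive member, then splits the remaining files into those that duplicate an archive member (only their provenance needs to be recorded) and those that must be stored in full.

```python
import hashlib
import zipfile
from pathlib import Path

def sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def build_manifest(base: Path):
    """Split files under base/ into (manifest, unique):
    manifest maps a file path to the (archive, member) it duplicates,
    so the backup only needs an extraction instruction for it;
    unique lists files that must be backed up normally."""
    archives_dir = base / "archives"

    # Hash every member of every zip in archives/.
    members = {}
    for zip_path in archives_dir.glob("*.zip"):
        with zipfile.ZipFile(zip_path) as zf:
            for info in zf.infolist():
                if not info.is_dir():
                    members[sha256(zf.read(info))] = (zip_path.name, info.filename)

    # Classify every file outside archives/ by content hash.
    manifest, unique = {}, []
    for f in sorted(base.rglob("*")):
        if f.is_file() and archives_dir not in f.parents:
            digest = sha256(f.read_bytes())
            if digest in members:
                manifest[str(f.relative_to(base))] = members[digest]
            else:
                unique.append(str(f.relative_to(base)))
    return manifest, unique
```

A backup would then contain archives/ intact, the files in `unique`, and the manifest; restoring replays the manifest by re-extracting each listed member to its recorded path. This is only the logic I'm describing, not an answer to the question of which existing tool does it.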
base/ is 20 GB; archives/ is 6 GB. Naively compressing base/ with both 7z and FreeArc on "Ultra" settings produced archives of around 13 GB, but given what I know of the makeup of unzipped/, with a tool that works as I described I'd expect the resulting size to be around 8 GB.