I have a file structure that looks something like this:

.
└── base/
    ├── archives/
    │   ├── foo.zip
    │   ├── bar.rar
    │   └── baz.7z
    ├── unzipped-1/
    │   └── complex subdir tree/
    │       ├── selected files from archives
    │       └── files not from archives
    ├── unzipped-2/
    │   └── same deal as unzipped-1
    └── etc.

Over time, the contents of archives/ will change, and so will the set of files I choose to extract from those archives. In the unzipped-N/ directories (unzipped/ for short), about 95% of the files come from an archive. The rest were added by other means.

I want to create compressed backups of the whole directory tree, starting from base/. I want the file size of the compressed backup to be as small as possible. Time and memory requirements are not an issue for me.

Logically, it makes sense to use an archiving tool which is smart enough to detect that some files came from archives that are also being included in the current backup. There's no need to compress both the source archive and the extracted file: instead, this tool just needs to record where a file from the source archive should be extracted.

Clarification: on extraction, the tool must not only restore the intact archives to archives/, but also re-extract the selected files from each archive into their original locations under unzipped/.
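To make the restore requirement concrete, here is a minimal Python sketch. It assumes a hypothetical manifest, saved alongside the backup, whose rows are `(relative path, archive name, member path)`; that format and the function name `restore_from_manifest` are illustrative and not part of any existing tool. It uses only the standard library, so it handles .zip only; .rar and .7z would need other tools.

```python
import os
import shutil
import zipfile


def restore_from_manifest(base_dir, rows, archives_subdir="archives"):
    """Re-create extracted files from (relative path, archive, member) rows.

    Assumes the archives themselves were already restored to
    base_dir/archives_subdir and that the manifest is trusted
    (relative paths are not checked for escaping base_dir).
    """
    restored = []
    for rel_path, archive_name, member in rows:
        archive_path = os.path.join(base_dir, archives_subdir, archive_name)
        target = os.path.join(base_dir, *rel_path.split("/"))
        os.makedirs(os.path.dirname(target), exist_ok=True)
        # Stream the member straight to its recorded location.
        with zipfile.ZipFile(archive_path) as zf:
            with zf.open(member) as src, open(target, "wb") as dst:
                shutil.copyfileobj(src, dst)
        restored.append(target)
    return restored
```

This sketch restores file contents only; timestamps and permissions are not preserved.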

How can I do this?

  • Anecdotally, an example: total size of base/ is 20 GB, size of archives/ is 6 GB. Attempting to naively compress base/ with both 7z and FreeArc on "Ultra" settings resulted in archives around 13 GB, but given what I know of the makeup of unzipped/, with a tool that works as I described, I'd expect the resulting size to be around 8 GB. Commented Dec 2, 2023 at 18:43
  • Another typical day at StackExchange. Don't know why I even bothered to try. Commented Dec 11, 2023 at 3:10

1 Answer

You would need to create a list of the files from the zipped archives, and use that as an exclusion list when zipping the base folder.

Assuming 7-Zip as the tool, and assuming that no file name repeats across folders or archives with different contents:
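A minimal Python sketch of that idea, offered as an illustration rather than the original commands. It indexes the members of the .zip files in archives/, then walks the rest of base/ and records every file that matches a member. Two things here are assumptions: the function names are hypothetical, and the match uses size and the CRC-32 stored in the zip alongside the base name, which is a stricter check than the name-only assumption above. Only .zip is covered by the standard library; .rar and .7z members would have to be listed some other way.

```python
import os
import zipfile
import zlib


def crc32_of(path, chunk_size=1 << 20):
    """Return the CRC-32 of a file, reading it in chunks."""
    crc = 0
    with open(path, "rb") as fh:
        while chunk := fh.read(chunk_size):
            crc = zlib.crc32(chunk, crc)
    return crc & 0xFFFFFFFF


def index_zip_members(archives_dir):
    """Map (base name, size, CRC-32) -> (archive file name, member path)."""
    index = {}
    for name in sorted(os.listdir(archives_dir)):
        path = os.path.join(archives_dir, name)
        if not (os.path.isfile(path) and zipfile.is_zipfile(path)):
            continue  # .rar and .7z are skipped in this sketch
        with zipfile.ZipFile(path) as zf:
            for info in zf.infolist():
                if info.is_dir():
                    continue
                key = (os.path.basename(info.filename), info.file_size, info.CRC)
                index.setdefault(key, (name, info.filename))
    return index


def build_manifest(base_dir, archives_subdir="archives"):
    """Return (relative path, archive, member) rows for files found in an archive."""
    index = index_zip_members(os.path.join(base_dir, archives_subdir))
    rows = []
    for root, dirs, files in os.walk(base_dir):
        if root == base_dir:
            # Do not compare the archives against themselves.
            dirs[:] = [d for d in dirs if d != archives_subdir]
        dirs.sort()
        for fname in sorted(files):
            full = os.path.join(root, fname)
            key = (fname, os.path.getsize(full), crc32_of(full))
            if key in index:
                rel = os.path.relpath(full, base_dir).replace(os.sep, "/")
                rows.append((rel, *index[key]))
    return rows


def write_exclude_list(rows, exclude_path):
    """Write one relative path per line, for use as an exclusion list file."""
    with open(exclude_path, "w", encoding="utf-8") as fh:
        for rel, _archive, _member in rows:
            fh.write(rel + "\n")
```

The resulting list file could then be passed to 7-Zip's `-x@listfile` switch when archiving base/. The paths in the list must match the paths as 7-Zip stores them, so the prefix may need adjusting depending on the directory the command is run from. The rows themselves would have to be saved with the backup if the extracted files are to be re-created later.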

  • Thank you, this answers a part of the question, but not all of it. For example, during compression, we'll lose information about where each file from each archive in archives/ should be extracted into the unzipped/ tree. Additionally, this answer does not address the extraction mechanism, i.e. selectively extracting subarchives. I updated my question to clarify that requirement. I'm looking for an existing solution to the full question. I recognize that it may not exist. Commented Dec 2, 2023 at 20:39
  • "I'm looking for an existing solution...": questions seeking recommendations for software products are off topic on this site. This site is about helping people solve specific problems that are NOT requests for software solutions. I'm guessing that an on-topic version of this question would consist of you attempting to script or otherwise construct a solution yourself, coming upon a portion of this you are unable to solve, and presenting that. Commented Dec 5, 2023 at 17:44
