
I have several hundred zip files and want to find particular files inside them. It is fairly easy to grep for filenames:

 find . -name "*.zip" -exec unzip -Z -1 {} \; | grep png

which lists all matching filenames inside the zip files. E.g.

icons/full/obj16/folder.png
icons/full/obj16/folderType_filter.png
icons/full/wizban/newfolder_wiz.png

But how can I prepend each line with the name of the zip file so I can actually find it? Something like so:

dir1/a.zip:icons/full/obj16/folder.png
dir2/icons.zip:icons/full/obj16/folderType_filter.png
myicons.zip:icons/full/wizban/newfolder_wiz.png

1 Answer


Run this:

gexp='\.png$' aexp='{print f":"$0}' find . -type f -name "*.zip" -exec sh -c '
   for f do
      unzip -Z -1 "$f" | grep -i "$gexp" | awk -F "" -v "f=$f" "$aexp"
   done
' find-sh {} +

Explanation:

  • I use -type f in case some directory (or other file that is not a regular file) accidentally matches -name "*.zip". I don't want to try to unzip a directory.
  • I use a shell invoked from find -exec to run a custom pipeline for each file preselected by find.
  • I use -exec … {} + to spawn far fewer shells than one per file. A single shell can now get multiple paths as command line arguments, so I loop over them with for.
  • I pass static expressions I want to use with grep and awk in the environment. If I embedded them in the shell code, then I would need to obfuscate the command with additional escaping and/or quoting. Passing expressions separately is much cleaner. I could pass them as command line arguments to sh, but then I would need to save them in variables of the inner shell and shift properly before I could loop over actual paths (still better than quoting frenzy IMO).
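  • find-sh is an arbitrary name that becomes $0 of the inner shell (it shows up in the shell's error messages); it is explained here: What is the second sh in sh -c 'some shell code' sh?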
  • I use case-insensitive grep -i.
  • Your pattern for grep was just png, it could match e.g. stopngo. I anchored it at line end and added a leading dot. The dot needs to be escaped because unescaped . matches any character in regex.
  • awk prepends the path of the currently processed file (and :) to each line piped from grep. One might think sed "s|^|$f:|", where $f is expanded by the shell, would work; but then the expansion of $f could break the expression (imagine it contained |) and code injection is possible. Letting the shell expand $f inside the awk code would be similarly flawed. With awk -v "f=$f" I store the path in the awk variable f (note that awk variables and shell variables are independent concepts). Now f cannot break the awk code because awk knows it is not code. Additionally, the entire expression designed for awk is static, which is why I can pass it in the environment in the first place.
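
To illustrate the last point, here is a minimal sketch comparing the two approaches on a path that contains |. The path dir|1/a.zip and the sample line are hypothetical and no real archive is needed; the variable names sed_out and awk_out are mine, not part of the command above.

```shell
# Hypothetical archive path containing "|", the delimiter of the sed expression.
f='dir|1/a.zip'
line='icons/full/obj16/folder.png'

# sed: the shell expands $f into the expression itself, so the "|" inside the
# path terminates the replacement early and the rest is parsed as sed flags.
# The fallback text is stored only if sed exits with an error.
sed_out=$(printf '%s\n' "$line" | sed "s|^|$f:|" 2>/dev/null) || sed_out='(sed failed)'

# awk: the path is passed as data via -v, the program text stays static.
awk_out=$(printf '%s\n' "$line" | awk -v "f=$f" '{print f":"$0}')

printf 'sed: %s\n' "$sed_out"
printf 'awk: %s\n' "$awk_out"
```

The awk line comes out as dir|1/a.zip:icons/full/obj16/folder.png, while the sed variant cannot produce that result because part of the path is treated as sed syntax.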
  • Worked as advertised!
    – thoni56
    Commented Nov 12, 2020 at 20:24
