I have a lot of recovered files, many of which are invalid even though their names and extensions look fine. This is expected.
Now I need to filter out the ones that are probably OK. I see two options:
The first is to check the leading bytes. For example, PowerPoint files (*.pptx) are actually zip containers, which start with PK in their first two bytes. So the command
head --bytes=2 filename
outputs PK for most of the good files, whereas the bad files don't start with PK.
Question 1: How can I combine head with find to list the files that match?
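One way to do this, sketched under the assumption of GNU or BSD find (for -iname) and a POSIX od, is to run a per-file test through find's -exec. With the `\;` form, -exec acts as a true/false test, so the following -print only fires for files where the test succeeds. The function name list_pk_files is made up for illustration. It reads the bytes with od rather than $(head -c 2 ...) because recovered files often begin with NUL bytes, which bash drops (with a warning) inside command substitution.

```shell
# list_pk_files (hypothetical name): print every *.pptx file under the given
# directory (default: .) whose first two bytes are "PK" (hex 50 4b).
# od prints the bytes as hex; tr removes the spaces and newline so the
# result can be compared as a plain string such as "504b".
list_pk_files() {
    find "${1:-.}" -type f -iname '*.pptx' \
        -exec sh -c '[ "$(od -An -tx1 -N2 -- "$1" | tr -d " \n")" = 504b ]' sh {} \; \
        -print
}

# Example: list_pk_files /path/to/recovered > probably_ok.txt
```

Only the first two bytes are compared, mirroring the head check; other extensions can be handled by changing the -iname pattern and the expected hex string.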
Another approach is the file command. For good PowerPoint files it prints
Zip archive data, at least v2.0 to extract
whereas for bad files it prints just
data
Question 2: How can I combine file with find to list the valid files?
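Along the same lines, here is a sketch that pipes file -b (brief output, without the "filename:" prefix) into grep -q inside -exec. The pattern is an assumption: the exact description depends on the version of file and its magic database, and some newer versions describe .pptx files as "Microsoft PowerPoint 2007+" rather than "Zip archive data", so both are accepted. list_file_ok is a hypothetical name.

```shell
# list_file_ok (hypothetical name): print every *.pptx file under the given
# directory (default: .) that file(1) identifies as a zip or OOXML document.
# -b (brief) omits the "filename:" prefix, so the pattern can be anchored at ^.
# Files reported as plain "data" do not match and are not printed.
list_file_ok() {
    find "${1:-.}" -type f -iname '*.pptx' \
        -exec sh -c 'file -b -- "$1" | grep -Eq "^(Zip archive data|Microsoft PowerPoint 2007)"' sh {} \; \
        -print
}
```

Running file once per candidate is slower than the byte check, but it reuses file's full magic database instead of a hand-written signature.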
There are other file types as well, but I can adapt the technique once I get the idea :)
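To extend the byte check to other types, one hedged approach is a case statement that maps each extension to its expected leading bytes. The signatures used here are the commonly documented ones (zip-based Office files PK\3\4, PDF %PDF, PNG \x89PNG, JPEG FF D8 FF); the helper name list_by_signature is hypothetical, and further types would need their own entries.

```shell
# list_by_signature (hypothetical name): print files under the given directory
# (default: .) whose leading bytes match the usual signature for their extension.
# Files with extensions not listed in the case statement are skipped.
list_by_signature() {
    find "${1:-.}" -type f -exec sh -c '
        for f in "$@"; do
            # Lower-case the extension so FOO.PDF and foo.pdf are treated alike.
            ext=$(printf "%s" "${f##*.}" | tr "[:upper:]" "[:lower:]")
            case $ext in
                pptx|docx|xlsx) sig=504b0304 ;;  # zip container: "PK\3\4"
                pdf)            sig=25504446 ;;  # "%PDF"
                png)            sig=89504e47 ;;  # "\211PNG"
                jpg|jpeg)       sig=ffd8ff ;;    # JPEG SOI marker + next marker byte
                *)              continue ;;
            esac
            # Read as many bytes as the signature has (two hex digits per byte).
            got=$(od -An -tx1 -N"$(( ${#sig} / 2 ))" -- "$f" | tr -d " \n")
            if [ "$got" = "$sig" ]; then
                printf "%s\n" "$f"
            fi
        done
    ' sh {} +
}
```

The `{} +` form hands many file names to one sh invocation, which matters when there are thousands of recovered files.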
Question 3: Are there more obvious ways to do this?
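For the zip-based formats, a more thorough (and slower) check is to let Info-ZIP's unzip test the archive: unzip -t reads every member and verifies its CRC, and its exit status is 0 only when no errors or warnings occurred. The sketch below assumes unzip is installed; list_unzip_ok is a hypothetical name. Passing this test means the container is intact, though PowerPoint could still reject damaged contents.

```shell
# list_unzip_ok (hypothetical name): print every *.pptx file under the given
# directory (default: .) that Info-ZIP unzip can test without errors.
# -t tests every member (CRC check); -qq suppresses most of the normal report,
# and any remaining output is discarded so only the exit status matters.
# Note: unzip treats wildcard characters (*, ?, [ ]) in the archive name as a
# pattern, so unusual file names may need escaping.
list_unzip_ok() {
    find "${1:-.}" -type f -iname '*.pptx' \
        -exec sh -c 'unzip -tqq "$1" >/dev/null 2>&1' sh {} \; \
        -print
}
```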