
I need a utility to find RAR files that contain duplicate data (i.e., files within the RAR that hash the same but could have different names).

I can open the RARs and see the CRCs are the same, but I was hoping for a more automated process that would work in bulk (hundreds of files).

Hashing the overall RAR won't help, because the files contained within could have different names, or the archives could be compressed at different levels.

If needed, a utility that would extract the contents of the RARs and then compare would work, but is not preferred.
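If it came to that, I imagine something along these lines would do the job (a rough sketch, assuming GNU coreutils and unrar are available; sha256sum is just one possible hash, and archive names containing sed-special characters like & or | would need extra care):

 for rar in *.rar; do
     tmp=$(mktemp -d)
     unrar x -inul "$rar" "$tmp"/          # -inul silences unrar's messages
     # sha256sum prints "<hash>  <path>"; rewrite the temp path to "<rar>: <name>"
     find "$tmp" -type f -exec sha256sum {} + | sed "s|  $tmp/|  $rar: |"
     rm -rf "$tmp"
 done | sort | uniq -w64 -D                # -w64: compare only the 64-char hash prefix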

I would prefer a free utility for Windows, but a paid utility or a utility for Linux would be acceptable.

1 Answer


You could probably get all this to work in one step, but it might be easier to do something like this (Linux):

for i in *.rar ; do  unrar l "$i" | tail -n+8 | head -n-3 | awk -v val="$i" '{ printf("\"%s\" \"%s\" \"%s\"\n",val,$1,$8)}' >> rarfiles; done

(Note the quoting around "$i", so archive names containing spaces survive the shell.)

This will go through all the RARs in the current directory and append a line for every file they contain to a file called rarfiles, looking like this:

"rar name" "filename" "crc" 

The "head" and "tail" commmands just strip the header and footer off the unrar output. Then awk extracts the first and eigth field, $i (the rar filename) is passed through as a parameter via -v so we can print it with the output.

Then

 sort -k3,3 rarfiles | uniq -D -f2

This will display the dupes. The sort parameters say to sort on the third field (the CRC), and uniq -D says to display only duplicated lines, with -f2 making it ignore the first two fields when comparing (the rar name and the filename), so only entries with the same CRC are shown.
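On a toy rarfiles (hypothetical names and CRCs) the step looks like this:

 cat rarfiles
 "a.rar" "photo.jpg" "1A2B3C4D"
 "b.rar" "image.jpg" "1A2B3C4D"
 "c.rar" "notes.txt" "99FF0011"

 sort -k3,3 rarfiles | uniq -D -f2
 "a.rar" "photo.jpg" "1A2B3C4D"
 "b.rar" "image.jpg" "1A2B3C4D"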

  • The command line is great, except for one thing: if the RAR file or the files within the RAR have spaces in their names, the command has a problem. Do you know how to work around this? Thanks again. Commented Nov 22, 2011 at 6:09
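A possible workaround, assuming a newer (5.x-era) unrar whose vt listing prints each entry as labelled Name: and CRC32: lines: parse those labelled lines instead of fixed columns, so spaces in names can't shift the fields, and put the CRC first so the later sort/uniq step stays space-safe:

 for i in *.rar ; do
     unrar vt "$i" | awk -v rar="$i" '
         /^ *Name: /  { sub(/^ *Name: /, ""); name = $0 }        # rest of line, spaces intact
         /^ *CRC32: / { printf("%s \"%s\" \"%s\"\n", $2, rar, name) }
     ' >> rarcrcs
 done
 sort rarcrcs | uniq -w8 -D    # CRC32 is 8 hex chars; compare only that prefix

(rarcrcs is just an illustrative output filename, analogous to rarfiles above.)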
