I am trying to compare two binary files in order to identify one of them.

The first file contains the data I am interested in, and I can use it to identify the second. The second file comes from a third party and may contain the same (or very similar) data as the first.

The two files can be different sizes (e.g. the first file might be 500KB while the second is 4MB). Therefore I have been trying to score how much of the first file is in the second, so that I can say with some certainty that it is related to, or derived from, the same source (e.g. 99% of file1 exists inside file2).

I have tried using cmp -l file1.bin file2.bin | wc -l but the problem with this is that the areas I am interested in are not aligned.

I have also tried using diff, however it will always say they are different. If I could find the total number of differing bytes, I could subtract it from the file size to see if the remainder matches my file.

Any help is much appreciated.

  • Have a look at the various diff tools for binaries, e.g. xdelta or bsdiff. Diffing binaries is hard, and the results may or may not be usable for you.
    – dirkt
    Commented Jun 14, 2019 at 14:16
  • cf. "binary plagiarism detection"
    – jhnc
    Commented Jun 14, 2019 at 22:28

1 Answer

For diffing binaries with the intent of counting the differences, you might use radiff2. It is part of radare2, so look for the radare2 package in your distribution's repository.

radiff2 has the parameter -c to count binary differences. It is also able to compute the Levenshtein distance and the percentage of similarity between two files with the -s option:

$ radiff2 -s /bin/true /bin/false
similarity: 0.97
distance: 743

For more information on using radiff2 see the article Binary diffing.
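
If radiff2 is not available, or you want a score that says "how much of file1 appears somewhere in file2" regardless of alignment, a rough containment check can be sketched in Python. This is only an illustration, not what radiff2 does internally: the function name containment and the 64-byte block size are my own assumptions. It cuts file1 into fixed-size blocks and looks for each block at every offset of file2, so it tolerates shifted data but not small edits inside a block.

```python
def containment(needle: bytes, haystack: bytes, block: int = 64) -> float:
    """Return the fraction of needle's fixed-size blocks that occur
    at any byte offset in haystack (hypothetical helper, not radiff2)."""
    # Cut the smaller file into non-overlapping blocks; a trailing
    # partial block is ignored.
    blocks = [needle[i:i + block]
              for i in range(0, len(needle) - block + 1, block)]
    if not blocks:
        return 0.0

    wanted = set(blocks)
    found = set()
    # Slide a window over the larger file one byte at a time, so the
    # matching regions do not need to be aligned.
    for i in range(len(haystack) - block + 1):
        window = haystack[i:i + block]
        if window in wanted:
            found.add(window)

    # Count every block of needle whose content was seen in haystack.
    return sum(b in found for b in blocks) / len(blocks)
```

You would call it as containment(open("file1.bin", "rb").read(), open("file2.bin", "rb").read()); a result close to 1.0 suggests most of file1 is present in file2. A smaller block size tolerates more scattered edits but makes accidental matches more likely, so the value is a trade-off you would need to tune for your data.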
