37

I would love some help with a Bash script loop that will show all the differences between two binary files, using just

cmp file1 file2 

It only shows the first change. I would like to use cmp because it gives an offset and a line number for where each change is, but if you think there's a better command I'm open to it :) Thanks

2
  • The offset is valid, but the line number will not be valid when comparing binary files, as they have no concept of lines (only text files have lines). Commented Dec 5, 2011 at 13:01
  • Yeah, I understand. In this case I use the line number as a reference into a hexdump of the binary, so I can read what's around the differing offset :) Commented Dec 5, 2011 at 13:11

3 Answers

46

I think cmp -l file1 file2 might do what you want. From the manpage:

-l  --verbose
      Output byte numbers and values of all differing bytes.

The output is a table of the offset, the byte value in file1 and the value in file2 for all differing bytes. It looks like this:

4531  66  63
4532  63  65
4533  64  67
4580  72  40
4581  40  55
[...]

So the first difference is at offset 4531 (decimal, counting from 1), where file1's byte value is 66 in octal and file2's is 63 in octal.
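Because the offsets are decimal and 1-based while the byte values are octal, it can help to convert them before cross-referencing a hexdump, whose addresses usually start at 0. A minimal sketch, assuming a POSIX shell; `cmp_hex` is a hypothetical helper name, not part of cmp:

```shell
# cmp_hex (hypothetical helper): reprint `cmp -l` output with a 0-based
# hexadecimal offset and hexadecimal byte values.
cmp_hex() {
  cmp -l "$1" "$2" | while read -r offset a b; do
    # cmp counts bytes from 1; subtract 1 to match hexdump addresses.
    # A leading 0 makes printf parse the byte values as octal.
    printf '0x%08x  %02x  %02x\n' "$((offset - 1))" "0$a" "0$b"
  done
}

# Usage: cmp_hex file1 file2
```

For the first row of the table above (4531 66 63), this would print 0x000011b2  36  33.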

3
  • 4
    +1: this is 'the way to do it', but the problem is that cmp does not look for inserted or deleted material; it just checks 'is the byte at offset N in file1 the same as the byte at offset N in file2? If yes, print nothing; otherwise print the difference'. So the files have to be very similar (e.g., differing only in some bytes of the Unix timestamp that is built into some object files at compile time), with everything else the same. Add 3 bytes to a constant string and everything after that is different. Commented Dec 5, 2011 at 15:39
  • Thanks heaps, this is just what I wanted. I tried that in the past but I didn't know that the numbers on the side were the offsets :) Thanks heaps! Commented Dec 5, 2011 at 20:14
  • 2
    I've edited the answer by adding a correction about the format of the bytes that differ. This is a not-so-well-documented feature of cmp. I hope that the edit is appropriate.
    – fdermishin
    Commented Feb 7, 2021 at 21:52
6

Method that works for single byte addition/deletion

diff <(od -An -tx1 -w1 -v file1) \
     <(od -An -tx1 -w1 -v file2)

Generate a test case with a single removal of byte 64:

for i in $(seq 128); do printf "%02x" "$i"; done | xxd -r -p > file1
for i in $(seq 128); do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2

Output:

64d63
<  40

If you also want to see the ASCII version of the character:

bdiff() (
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)

bdiff file1 file2

Output:

64d63
<   40   @

Tested on Ubuntu 16.04.

I prefer od over xxd because:

  • it is POSIX, while xxd is not (it comes with Vim)
  • it has the -An option to remove the address column without awk.

Command explanation:

  • -An removes the address column. This is important; otherwise all lines would differ after a byte addition / removal.
  • -w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, this is not POSIX, but present in GNU.
  • -tx1 is the representation you want, change to any possible value, as long as you keep 1 byte per line.
  • -v prevents the asterisk (*) abbreviation of repeated lines, which might interfere with the diff
  • paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: Concatenating every other line with the next
  • we use parentheses () to define bdiff instead of {} to limit the scope of the inner function f, see also: How to define a function inside another function in Bash?
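Since -w1 is a GNU extension, one possible workaround on systems without it is to let od print its default multi-byte rows and split them afterwards. This is a sketch, assuming od separates the bytes with spaces; `hexlines` is a hypothetical helper name:

```shell
# hexlines (hypothetical helper): print one hex byte per line without -w1.
hexlines() {
  # -An drops the address column, -v disables the * abbreviation.
  # tr turns each run of spaces into a single newline (-s squeezes repeats),
  # and sed drops the empty line left by od's leading space.
  od -An -tx1 -v "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# Usage in bash, mirroring the diff command above:
#   diff <(hexlines file1) <(hexlines file2)
```

The ASCII column from -tx1c is not handled here; this only covers the plain hex case.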


4
  • This has the inherent flaw that it does not stream the data but loads everything into RAM, so you will need at least 2-3 times the size of the files in memory, as with most binary diff tools. The only one I found that doesn't behave like this is xdelta3...
    – Izzy
    Commented Nov 9, 2017 at 11:58
  • @Izzy add it to an answer showing to use it and why and get upvotes :-) Commented Nov 9, 2017 at 12:03
  • Sadly, to my knowledge, it can't. At least not the kind you'd expect. It produces VCDIFF output, which is a highly compressed binary delta. So you can just diff, patch and view the command structure. My comment was more of a "be aware that this answer will blow your main memory with a 5GB file"
    – Izzy
    Commented Nov 14, 2017 at 8:48
  • @Izzy OK! Good to know nevertheless. Commented Nov 14, 2017 at 8:50
3

The most efficient workaround I've found is to translate the binary files into some form of text using od.

Then any flavour of diff works fine.
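As a rough illustration of that idea, here is a sketch that dumps both files with od and diffs the dumps. It keeps the address column and od's default row width, so it suits files of the same length with bytes changed in place rather than inserted or deleted. `hexdiff` is a hypothetical name, and mktemp is assumed to be available:

```shell
# hexdiff (hypothetical helper): diff the od hex dumps of two files.
# The body runs in a subshell so its variables do not leak into the caller.
hexdiff() (
  dir=$(mktemp -d) || exit 1
  # -Ax prints hex addresses, -tx1 one hex byte per column, -v disables *.
  od -Ax -tx1 -v "$1" > "$dir/a.hex"
  od -Ax -tx1 -v "$2" > "$dir/b.hex"
  status=0
  diff "$dir/a.hex" "$dir/b.hex" || status=$?
  rm -rf "$dir"
  exit "$status"   # like diff: 0 if identical, 1 if different
)

# Usage: hexdiff file1 file2
```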

1
  • Yep, it really depends on what the OP wants to do with the diff. A diff of a hexdump is probably of more value for humans, while a cmp may be easier for programs to parse/use.
    – rwos
    Commented Dec 5, 2011 at 16:03
