For regular text files, I can use the comm command to find common lines.

For example, we have two files:

$ cat f1
line1
line2
line3
line4
line5

$ cat f2
line1
line20
line30
line4
line5

They are compared like this:

$ comm -12 f1 f2
line1
line4
line5

How can I find the offsets of the matching lines? And how can I do the same comparison on two binary files and print the offsets where they match?

I've been trying diff, cmp, and comm for the past hour and haven't been able to figure this out.

EDIT 1: Not an exact solution, but I found vbindiff, which helps a bit.

  • If it's a binary file, it doesn't have lines.
    – ganbustein
    Commented Jan 7, 2015 at 10:04
  • Okay, right, but how do I figure out the offset of the first common 80 characters within these files? Commented Jan 7, 2015 at 10:25
  • I understand your question but not your problem. What do you really want to achieve? Which problem do you want to solve?
    – Klaus
    Commented Jan 7, 2015 at 10:33
  • I have two files which are binary dumps (unknown format). Say file1 has the content "abcde" and file2 has "defgh". I need a way to merge these two files, removing the duplicated common pattern, which in this case is "de". The output will be "abcdefgh". Commented Jan 7, 2015 at 10:41
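
The merge described in the last comment could be sketched in shell as below. This is only a brute-force sketch, not something proposed in the thread: `merge_overlap` is a hypothetical helper name, it assumes the common pattern is a suffix of the first file that equals a prefix of the second, and it relies on `head -c`, which is widely available but not required by POSIX.

```shell
# merge_overlap FILE1 FILE2
# Writes FILE1 followed by the part of FILE2 that does not repeat the end of FILE1.
# Hypothetical helper: tries every overlap length, longest first.
merge_overlap() {
    a=$1
    b=$2
    # $(( ... )) strips the padding some wc implementations add
    size_a=$(($(wc -c < "$a")))
    size_b=$(($(wc -c < "$b")))
    # The overlap cannot be longer than the shorter file
    n=$size_a
    if [ "$size_b" -lt "$n" ]; then n=$size_b; fi

    tmp=$(mktemp) || return 1
    while [ "$n" -gt 0 ]; do
        # Compare the last n bytes of FILE1 with the first n bytes of FILE2
        tail -c "$n" "$a" > "$tmp"
        if head -c "$n" "$b" | cmp -s "$tmp" -; then
            break
        fi
        n=$((n - 1))
    done
    rm -f "$tmp"

    # n is now the overlap length (0 if none): skip that many bytes of FILE2
    cat "$a"
    tail -c "+$((n + 1))" "$b"
}
```

With file1 containing "abcde" and file2 containing "defgh", `merge_overlap file1 file2 > merged` should leave "abcdefgh" in merged. It runs one comparison per candidate length, so it will be slow on large dumps.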

1 Answer

You are probably looking for cmp:

cmp - compare two files byte by byte

$ cmp f1 f2
f1 f2 differ: byte 12, line 2

$ cmp -b f1 f2
f1 f2 differ: byte 12, line 2 is  12 ^J  60 0

$ cmp -bl f1 f2
12  12 ^J    60 0
13 154 l     12 ^J
14 151 i    154 l
15 156 n    151 i
16 145 e    156 n
17  63 3    145 e
18  12 ^J    63 3
19 154 l     60 0
20 151 i     12 ^J
21 156 n    154 l
22 145 e    151 i
23  64 4    156 n
24  12 ^J   145 e
25 154 l     64 4
26 151 i     12 ^J
27 156 n    154 l
28 145 e    151 i
29  65 5    156 n
30  12 ^J   145 e
cmp: EOF on f1

From man cmp:

-b, --print-bytes

print differing bytes

-l, --verbose

output byte numbers and differing byte values
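
Since `cmp -l` lists only the differing byte numbers, the matching offsets are whatever it leaves out. A rough sketch of turning that into ranges follows. `match_ranges` is a hypothetical helper name, and it only detects bytes that match at the same position in both files, not a common block that sits at different offsets in each file.

```shell
# match_ranges FILE1 FILE2
# Prints the 1-based byte ranges (START-END) where both files hold the same byte.
# Hypothetical helper built on the byte numbers that cmp -l reports.
match_ranges() {
    a=$1
    b=$2
    # $(( ... )) strips the padding some wc implementations add
    size_a=$(($(wc -c < "$a")))
    size_b=$(($(wc -c < "$b")))
    # Only the common length can match
    n=$size_a
    if [ "$size_b" -lt "$n" ]; then n=$size_b; fi

    # cmp -l prints one line per differing byte; the gaps between those
    # byte numbers are the matching ranges. The EOF notice goes to stderr.
    cmp -l "$a" "$b" 2>/dev/null | awk -v n="$n" '
        BEGIN { pos = 1 }
        {
            if ($1 > pos) print pos "-" ($1 - 1)
            pos = $1 + 1
        }
        END { if (pos <= n) print pos "-" n }'
}
```

For the f1 and f2 above, `match_ranges f1 f2` prints `1-11`, since every byte from 12 up to the end of f1 is reported as differing.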

  • Thanks. Does cmp have an option that prints the matching line/byte offsets instead of reporting the differing bytes? Commented Jan 7, 2015 at 10:26
  • As they are treated as binary, there is no such concept as lines. I am not sure what you expect the output to look like; could you clarify?
    – fedorqui
    Commented Jan 7, 2015 at 10:30
  • I would like to figure out at which offset they have a common pattern. For example, something like "cmp f1 f2" printing "f1 f2 same at byte 120, line 12". Thanks for the help; for my case, I found "vbindiff". Commented Jan 7, 2015 at 10:37
  • I insist: there is no such thing as a line in binary files. As you can see, cmp shows a message like "f1 f2 differ: byte 12, line 2 is 12 ^J 60 0"
    – fedorqui
    Commented Jan 7, 2015 at 11:07
