Fixed formatting; expanded option information.


cmp does things byte-by-byte, so it probably won't run out of memory (just tested it on two 7 GB files) -- but you might be looking for more detail than a list of "files X and Y differ at byte x, line y". If the similarities of your files are offset (e.g., file Y has an identical block of text, but not at the same location), you can pass offsets to cmp; you could probably turn it into a resynchronizing compare with a small script.
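As a rough sketch of the offset idea: GNU and BSD `cmp` accept optional skip values after the two file names (`cmp FILE1 FILE2 SKIP1 SKIP2`). The demo files and the 4-byte shift below are made up for illustration, and the skip arguments are not part of POSIX `cmp`, so check your local man page.

```shell
# Hypothetical demo files: shifted.bin is base.bin with 4 extra leading bytes.
demo=$(mktemp -d)
printf 'hello world\n' > "$demo/base.bin"
printf 'XXXXhello world\n' > "$demo/shifted.bin"

# Plain compare: reports the first differing byte and exits non-zero.
cmp "$demo/base.bin" "$demo/shifted.bin" || echo "differ without offsets"

# Skip 0 bytes of the first file and 4 bytes of the second, then compare.
if cmp "$demo/base.bin" "$demo/shifted.bin" 0 4; then
    echo "identical once the second file is shifted by 4 bytes"
fi
```

A resynchronizing script would have to search for the right skip values itself after each mismatch; here they are simply known in advance.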

Aside: In case anyone else lands here when looking for a way to confirm that two directory structures (containing very large files) are identical: diff --recursive --brief (or diff -r -q for short, or maybe even diff -rq) will work and not run out of memory.
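A small illustration of the directory check, assuming GNU diffutils (the long options are GNU spellings; the directory names are invented for the demo). `diff` exits 0 when the trees match and 1 when anything differs, so it can be used directly in a condition.

```shell
# Hypothetical demo trees with one nested file each.
trees=$(mktemp -d)
mkdir -p "$trees/left/sub" "$trees/right/sub"
printf 'same\n' > "$trees/left/sub/f"
printf 'same\n' > "$trees/right/sub/f"

# Identical trees: no output, exit status 0.
if diff --recursive --brief "$trees/left" "$trees/right"; then
    echo "trees identical"
fi

# Change one file: diff names the differing pair and exits with status 1.
printf 'changed\n' > "$trees/right/sub/f"
diff -rq "$trees/left" "$trees/right" || echo "trees differ"
```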

Felix
