
Let's say I have two files. The first one has the contents:

line 1
foo
line 2

line 1
bar
line 2

And the second one has a new section inserted in the middle, so it looks like this:

line 1
foo
line 2

line 1
new text
line 2

line 1
bar
line 2

Now, when I do a "diff -u", I get output like this:

--- file1   2013-06-25 16:27:43.170231844 -0500
+++ file2   2013-06-25 16:27:59.218757056 -0500
@@ -1,7 +1,11 @@
 line 1
 foo
 line 2
 
 line 1
+new text
+line 2
+
+line 1
 bar
 line 2

This doesn't reflect that the middle stanza was inserted. Instead, it makes it look like the second stanza was changed and a new one was added at the end (this happens because diff starts the hunk at the first line that differs).
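The same placement is easy to reproduce without GNU diff. Below is a minimal sketch using Python's standard difflib module, with the two files hard-coded as lists of lines. Note the assumption: difflib uses its own matching algorithm rather than GNU diff's, but for these two files it chooses the same four added lines.

```python
import difflib

# The two files from the question, one list element per line.
file1 = ["line 1", "foo", "line 2", "",
         "line 1", "bar", "line 2"]
file2 = ["line 1", "foo", "line 2", "",
         "line 1", "new text", "line 2", "",
         "line 1", "bar", "line 2"]

diff = list(difflib.unified_diff(file1, file2,
                                 fromfile="file1", tofile="file2",
                                 lineterm=""))
print("\n".join(diff))

# Keep only the added lines, skipping the "+++ file2" header.
added = [line for line in diff
         if line.startswith("+") and not line.startswith("+++")]
print(added)  # ['+new text', '+line 2', '+', '+line 1']
```

difflib trims the leading context to three lines, so its hunk header reads @@ -3,5 +3,9 @@. The "+" lines still start at "new text", just like the diff -u output above.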

Is there any way to get diff (either by itself, or using git diff) to show this output instead?

--- file1   2013-06-25 16:27:43.170231844 -0500
+++ file2   2013-06-25 16:27:59.218757056 -0500
@@ -1,7 +1,11 @@
 line 1
 foo
 line 2
+
+line 1
+new text
+line 2
 
 line 1
 bar
 line 2

This is mostly an issue when generating a patch for someone to review, where a new function gets inserted into a group of similar functions. The default behavior doesn't reflect what really changed.

  • Try sdiff file1 file2; maybe this is what you are looking for.
    – g4ur4v
    Commented Jun 25, 2013 at 21:58
  • @g4ur4v, not quite: that still makes it look like part of section 2 was modified and part of section 3 added, when in reality a new section was inserted between the other two. Commented Jun 25, 2013 at 22:14
  • "new function gets inserted into a group of similar functions" is a bit of a code smell itself, except too, too common in some languages. Have you tried --unified=5 or larger values?
    – msw
    Commented Jun 26, 2013 at 0:58
  • @msw, I agree about the code smell in general; I can't recall what the original case was. However, my most recent case was inserting records into an XML database export, where the new records are often similar to the surrounding records (almost identical to the example above). As for passing a large number to the --unified flag, that just gives more context; it doesn't change where the "+" signs appear. Commented Jun 26, 2013 at 18:32
  • XML is grossly repetitive. I've not chased down any of the links, but perhaps stackoverflow.com/questions/1871076/… might be useful. I was then thinking about the longest common subsequence algorithm and realized that it would, of necessity, generate source-ignorant diffs. This turned up msdn.microsoft.com/en-us/library/aa302294.aspx, which appears to operate at a semantic level.
    – msw
    Commented Jun 26, 2013 at 20:24
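msw's point about the longest common subsequence can be checked directly. The sketch below uses a textbook dynamic-programming LCS in plain Python, not the algorithm diff actually runs, and its helper names are made up for illustration. It shows that the hunk diff reports and the hunk the question asks for both leave all seven lines of file1 unchanged. An LCS-based diff therefore has no length-based reason to prefer one over the other.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two line lists."""
    # dp[i][j] holds the LCS length of a[:i] and b[:j].
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]


def unchanged_lines(new, start, end):
    """Lines of `new` that remain when new[start:end] is treated as inserted
    (hypothetical helper for this illustration)."""
    return new[:start] + new[end:]


file1 = ["line 1", "foo", "line 2", "",
         "line 1", "bar", "line 2"]
file2 = ["line 1", "foo", "line 2", "",
         "line 1", "new text", "line 2", "",
         "line 1", "bar", "line 2"]

print(lcs_length(file1, file2))               # 7: every line of file1 is kept
print(unchanged_lines(file2, 5, 9) == file1)  # True: the hunk diff reports
print(unchanged_lines(file2, 3, 7) == file1)  # True: the hunk the question wants
```

A four-line insertion starting at 0-based index 2, 3, 4 or 5 of file2 ties in exactly the same way. Picking among these placements is therefore a matter of tie-breaking heuristics rather than the LCS itself.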

3 Answers


Git 2.9, released earlier this year, included the experimental flag --compaction-heuristic for the git diff command:

In 2.9, Git's diff engine learned a new heuristic: it tries to keep hunk boundaries at blank lines, shifting the hunk "up" whenever the bottom of the hunk matches the bottom of the preceding context, until we hit a blank line.

I don't think GitHub has it enabled for diffs on the web UI for Pull Requests and comparisons, but you can do it locally. I'd recommend using it in conjunction with --word-diff if you need that level of granularity.

More details are available on the GitHub blog: https://github.com/blog/2188-git-2-9-has-been-released
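The sliding rule quoted above can be sketched roughly in Python. This version rests on several assumptions. It only handles a single pure insertion, and it takes the initial hunk from difflib rather than from Git's xdiff. compact_insertion is a hypothetical helper, not Git's code, and the real implementation handles more cases.

```python
import difflib

file1 = ["line 1", "foo", "line 2", "",
         "line 1", "bar", "line 2"]
file2 = ["line 1", "foo", "line 2", "",
         "line 1", "new text", "line 2", "",
         "line 1", "bar", "line 2"]


def compact_insertion(new, start, end):
    """Slide the inserted block new[start:end] up while the line just above
    it equals its last line, stopping once the block ends on a blank line
    (a simplified reading of the heuristic as described, not Git's code)."""
    while (start > 0
           and new[end - 1].strip() != ""
           and new[start - 1] == new[end - 1]):
        # Shifting is safe: the line leaving the block at the bottom equals
        # the line joining it at the top, so the unchanged text is the same.
        start -= 1
        end -= 1
    return start, end


# The hunk difflib picks on its own: a pure insertion of file2[5:9].
matcher = difflib.SequenceMatcher(a=file1, b=file2, autojunk=False)
inserts = [(j1, j2) for tag, i1, i2, j1, j2 in matcher.get_opcodes()
           if tag == "insert"]
start, end = compact_insertion(file2, *inserts[0])

# Print file2 with the (shifted) inserted block marked by "+".
for k, line in enumerate(file2):
    print(("+" if start <= k < end else " ") + line)
```

In this sketch the hunk moves up by one line, from file2[5:9] to file2[4:8]. The blank separator ends up at the bottom of the added block rather than at the top (as in the output the question asks for), but the added lines now form one complete "line 1 / new text / line 2" stanza.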

  • Doesn't look like that flag exists anymore, at least on git 2.20
    – user114651
    Commented Sep 19, 2019 at 17:51

The patience diff algorithm (git diff --patience) may give you more natural results, though not in all cases.
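The core idea of patience diff is to anchor on lines that occur exactly once in each file, then keep the longest set of those lines that appear in the same order in both files. The minimal Python sketch below covers only that anchoring step. patience_anchors is a hypothetical helper, and the recursion and fallback of a full implementation are omitted. It suggests why patience diff does not help with this example: the only anchors are "foo" and "bar", and the region between them contains no line that is unique in both files.

```python
from bisect import bisect_left
from collections import Counter


def patience_anchors(a, b):
    """Return (i, j) index pairs of lines occurring exactly once in a and
    exactly once in b, keeping the longest set whose order agrees in both."""
    count_a, count_b = Counter(a), Counter(b)
    pos_b = {line: j for j, line in enumerate(b) if count_b[line] == 1}
    # Candidate pairs, in file-a order.
    pairs = [(i, pos_b[line]) for i, line in enumerate(a)
             if count_a[line] == 1 and line in pos_b]

    # Longest increasing subsequence of the b-indices via patience sorting.
    pile_tops = []              # smallest b-index on top of each pile
    pile_top_ids = []           # index into `pairs` of that top card
    back = [None] * len(pairs)  # back-pointer to the previous pile's top
    for n, (_, j) in enumerate(pairs):
        k = bisect_left(pile_tops, j)
        if k == len(pile_tops):
            pile_tops.append(j)
            pile_top_ids.append(n)
        else:
            pile_tops[k] = j
            pile_top_ids[k] = n
        back[n] = pile_top_ids[k - 1] if k > 0 else None

    # Follow back-pointers from the top of the last pile.
    anchors = []
    n = pile_top_ids[-1] if pile_top_ids else None
    while n is not None:
        anchors.append(pairs[n])
        n = back[n]
    return anchors[::-1]


file1 = ["line 1", "foo", "line 2", "",
         "line 1", "bar", "line 2"]
file2 = ["line 1", "foo", "line 2", "",
         "line 1", "new text", "line 2", "",
         "line 1", "bar", "line 2"]

print(patience_anchors(file1, file2))            # [(1, 1), (5, 9)]: foo and bar
print(patience_anchors(file1[2:5], file2[2:9]))  # []: nothing unique in between
```

Since every line between the two anchors repeats in file2, the anchors alone cannot decide where the inserted stanza goes. That choice is left to whatever the implementation does in regions without unique lines.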

  • This still produced the same results in my example above. I know there is a solution somewhere, as I remember reading about it a while ago; I just can't remember where. Commented Jun 26, 2013 at 18:34

In certain cases, git diff --word-diff (or --color-words) may give you better-looking results.
