I'm writing a script that relies on git diff output and the line numbers displayed there. However, I ran into a case where git shows the output in what looks to me like the wrong format.

The problematic case (GitHub diff on web)

The issue in question is in this chunk of the GitHub diff.

You can also see the code block at the parent commit and at the actual commit.

The problematic case (git diff)

However, running git diff --word-diff <full list of options below> gives the following output (only the chunk in question is shown):

@@ -77,2 +80,4 @@ public class CofVGenRoeMetz {
                        phi[i] = [-Gaussian.Phi((u[0]-]{+gauss.cumulativeProbability((u[0]+} + x[i])
                                        / Math.sqrt(scale20))
                                        * [-Gaussian.Phi((u[1]-]{+gauss.cumulativeProbability((u[1]+} + x[i])
                                                        / Math.sqrt(scale21));

Problem

Based on this git diff output, one would conclude that / Math.sqrt(scale20)) belongs to line 78 (before) / line 81 (after), which is only true for the "after" side. Why do I get the wrong output?

In other words, following the git diff syntax, the output says that / Math.sqrt(scale20)) was on line 78 before this commit. In reality, if you look at the file in the parent commit, / Math.sqrt(scale20)) is on line 77.
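For a script that depends on these numbers, the hunk header is the part of the output that still describes both sides. Below is a minimal Python sketch that reads the header from the chunk above. The helper name parse_hunk_header is hypothetical and not part of Git; the sketch assumes the usual unified-diff convention that an omitted count means 1.

```python
import re

# Matches "@@ -old_start[,old_count] +new_start[,new_count] @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")


def parse_hunk_header(line):
    """Return (old_start, old_count, new_start, new_count) for a hunk header."""
    match = HUNK_RE.match(line)
    if match is None:
        raise ValueError("not a hunk header: %r" % line)
    old_start = int(match.group(1))
    # In unified diffs an omitted count means exactly one line.
    old_count = int(match.group(2)) if match.group(2) is not None else 1
    new_start = int(match.group(3))
    new_count = int(match.group(4)) if match.group(4) is not None else 1
    return old_start, old_count, new_start, new_count


header = "@@ -77,2 +80,4 @@ public class CofVGenRoeMetz {"
print(parse_hunk_header(header))  # (77, 2, 80, 4)
```

The header reports 2 lines on the old side (77-78) and 4 lines on the new side (80-83), while the word-diff body prints four lines. Counting body lines from the old start is what assigns / Math.sqrt(scale20)) to line 78.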



PS: I'm using macOS Monterey.

The full git diff command I used: git diff --no-color --unified=0 --word-diff --ignore-submodules --ignore-all-space 3424dae cfdd0dfd1b7724efbc786d0cfc070dc0696435b4 -- imrmc/mrmc_source/src/simroemetz/core/CofVGenRoeMetz.java

  • I believe it is because you're using word-diff; in my experience it doesn't always make sense when lines are broken apart. It would have to show that part twice, once as being removed on the first line and once as being added as a separate line, but since it wasn't actually changed (a linefeed was just inserted), it wasn't highlighted as different. I've seen similar occurrences myself.
    – Lasse V. Karlsen
    Commented Jan 3, 2022 at 12:25
  • To expand on @LasseV.Karlsen's comment a bit: git diff --word-diff first makes a standard (line by line) diff. Then Git applies the word diff regex to each word in the various changed lines and rewrites the output-to-be-displayed with the word-change markers. Depending on both the regexp and the options like --ignore-all-space, this can sometimes remove the entire diff, so that you get a diff hunk with no change in it, or otherwise damage the diff hunk in ways that will make some sort of sense to a human but be useless for machine replication. Don't trust the word diff.
    – torek
    Commented Jan 3, 2022 at 20:31
  • I think that whitespace-ignoring options are applied to the line-by-line standard diff by replacing sequences of whitespace with a single whitespace character and/or deleting trailing whitespace (but newlines are retained because they break up the lines!). The word-diff algorithm then operates on the original unedited lines, which makes for even more confusion.
    – torek
    Commented Jan 3, 2022 at 20:35

1 Answer


Try setting --word-diff-regex, for example: --word-diff-regex="[^ \t]*"

The default regex ([^[:space:]]*, if I'm correct) treats \n as whitespace, so newlines are completely ignored in the diff computation.


Quoting git help diff (emphasis mine):

--word-diff-regex=<regex>

[...]

Every non-overlapping match of the <regex> is considered a word. Anything between these matches is considered whitespace and ignored(!) for the purposes of finding differences.

You may want to append |[^[:space:]] to your regular expression to make sure that it matches all non-whitespace characters. A match that contains a newline is silently truncated(!) at the newline.
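The effect described in the quote can be reproduced outside Git. The Python sketch below is only an illustration: it uses difflib as a stand-in for Git's own diff, \S+ as an approximation of the default whitespace-delimited words, and OLD_TEXT / NEW_TEXT are assumed reconstructions of the two versions, not copies from the repository.

```python
import difflib
import re

# Assumption: approximates "words are runs of non-whitespace characters".
WORD_RE = re.compile(r"\S+")

# Assumed reconstruction of the parent-commit lines (2 lines).
OLD_TEXT = (
    "phi[i] = Gaussian.Phi((u[0] + x[i]) / Math.sqrt(scale20))\n"
    "        * Gaussian.Phi((u[1] + x[i]) / Math.sqrt(scale21));\n"
)
# Assumed reconstruction of the new lines (4 lines).
NEW_TEXT = (
    "phi[i] = gauss.cumulativeProbability((u[0] + x[i])\n"
    "        / Math.sqrt(scale20))\n"
    "        * gauss.cumulativeProbability((u[1] + x[i])\n"
    "        / Math.sqrt(scale21));\n"
)


def changed_words(old_text, new_text, word_re=WORD_RE):
    """Return (old_words, new_words) pairs for every non-equal region."""
    # Newlines fall between matches, so they never become part of a word.
    old_words = word_re.findall(old_text)
    new_words = word_re.findall(new_text)
    matcher = difflib.SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            changes.append((old_words[i1:i2], new_words[j1:j2]))
    return changes


for removed, added in changed_words(OLD_TEXT, NEW_TEXT):
    print("[-%s-]{+%s+}" % (" ".join(removed), " ".join(added)))
```

Under these assumptions only the two method-call words are reported as changed. The inserted line breaks sit between words, so / Math.sqrt(scale20)) compares as unchanged even though it moved to a different line.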
