diff within a line

Question

I am comparing some SQL dumps. diff can obviously show me which lines differ, but I'm driving myself nuts trying to find which values in the long lists of comma-separated values are actually causing the lines to differ.

What tool can I use to point out the exact character differences between two lines in certain files?

superuser.com/questions/496415/… | stackoverflow.com/questions/1342256/… — Ciro Santilli OurBigBook.com, Commented May 22, 2017 at 15:44
git diff --word-diff --word-diff-regex=. f1 f2 works like a charm — Jean Monet, Commented Feb 19, 2021 at 13:46

alex · Accepted Answer · 2011-04-12 05:46:46Z

130

There's wdiff, the word-diff for that.

On desktop, meld can highlight the differences within a line for you.

answered Apr 12, 2011 at 5:46

alex


14

Colored wdiff: wdiff -w "$(tput bold;tput setaf 1)" -x "$(tput sgr0)" -y "$(tput bold;tput setaf 2)" -z "$(tput sgr0)" file1 file2
– l0b0
Commented Apr 12, 2011 at 11:21
65

For color, install colordiff, then do: wdiff a b | colordiff
– philfreo
Commented Sep 28, 2013 at 3:36
Meld is actually extremely slow (minutes) at showing the intra-line differences between two line-based files.
– Dan Dascalescu
Commented Jun 28, 2017 at 22:02
3

There is also the dwdiff tool, which is mostly compatible with wdiff but also supports colored output and probably some other features. It is also more readily available in some Linux distributions, such as Arch.
– MarSoft
Commented Aug 31, 2017 at 0:13
5

wdiff -n a b | colordiff, advises man colordiff.
– Camille Goudeseune
Commented May 4, 2018 at 17:50


Deepak · Accepted Answer · 2016-12-16 10:24:24Z

74

Just another method using git-diff:

git diff -U0 --word-diff --no-index -- foo bar | grep -v ^@@

The grep -v ^@@ part strips the hunk headers; leave it out if you want to see the positions of the differences.
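For comma-separated dumps like those in the question, the same command can be told to treat each value as a word. The following is a minimal sketch, assuming git is installed; the function name csv_word_diff, the temporary files, and the sample rows are made up for illustration.

```shell
#!/bin/sh
# Hypothetical sample data: two one-line "dumps" that differ in one value.
tmp=$(mktemp -d)
printf '1,foo,3,2011-04-12\n' > "$tmp/old.csv"
printf '1,bar,3,2011-04-12\n' > "$tmp/new.csv"

# csv_word_diff A B: show per-value differences between files A and B.
# The regex '[^,]+' makes every run of non-comma characters one "word".
# git diff exits with status 1 when the files differ, hence '|| true'.
csv_word_diff() {
    git diff -U0 --no-index --word-diff --word-diff-regex='[^,]+' -- "$1" "$2" \
        | grep -v '^@@' || true
}

csv_word_diff "$tmp/old.csv" "$tmp/new.csv"
```

Changed values should then show up inline in git's default word-diff markup, [-removed-]{+added+}, while unchanged values are printed as they are.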

answered Dec 16, 2016 at 10:24

Deepak


5

This is exactly the behaviour I was trying to mimic -- didn't realize I could use git-diff without one of the files being indexed.
– SpinUp __ A Davis
Commented Oct 10, 2017 at 0:27
4

--word-diff is the key option here. Thanks!
– user2707671
Commented Aug 17, 2018 at 13:04
2

--no-index is only required if you're in a git working directory and both foo and bar are as well.
– xn.
Commented Feb 12, 2019 at 21:00
4

The nice thing is that git diff is much more powerful than wdiff. You can use --word-diff-regex=. for character-wise diff, for example.
– Tgr
Commented Aug 6, 2021 at 3:50
4

One drawback is that git diff doesn't work with process substitution. wdiff <(cmd1) <(cmd2) works but git will just diff the pipe IDs.
– Tgr
Commented Aug 6, 2021 at 3:52


kyb · Accepted Answer · 2021-08-09 00:26:51Z

33

I've used vimdiff for this.

Here's a screenshot (not mine) showing minor one- or two-character differences that stand out pretty well. A quick tutorial too.

edited Aug 9, 2021 at 0:26

kyb


answered Apr 12, 2011 at 3:04

Mark McKinstry


1

In my case I couldn't spot the difference, so I opened the files with gvim -d f1 f2. The particular long lines were both highlighted as being different, but the actual difference was additionally highlighted in red.
– zzapper
Commented Dec 9, 2015 at 16:54
1

I've been using vim forever, but had no idea about vimdiff!
– mitchus
Commented Apr 3, 2017 at 16:02
And there is diffchar.vim for character-level diffs.
– user37050
Commented Nov 16, 2017 at 23:52
3

As much as I love vim and vimdiff, vimdiff's algorithm for highlighting differences in a line is pretty basic. It seems to just strip out the common prefix and suffix, and highlight everything between as different. This works if all of the characters that changed are grouped together, but if they are spread out it doesn't work well. It's also terrible for word-wrapped text.
– Laurence Gonsalves
Commented Dec 21, 2017 at 20:49
1

For long lines as in the OP, vimdiff -c 'set wrap' -c 'wincmd w' -c 'set wrap' a b, suggests stackoverflow.com/a/45333535/2097284.
– Camille Goudeseune
Commented May 4, 2018 at 17:54


StackzOfZtuff · Accepted Answer · 2024-07-09 13:56:29Z

Here is a "..hair of the dog that bit you" method...
diff got you to this point; use it to take you further...

Here is the output from using the sample line pairs... ☻ indicates a TAB

Paris in the     spring 
Paris in the the spring 
             vvvv      ^

A ca t on a hot tin roof.
a cant on a hot  in roof 
║   v           ^       ^

the quikc brown box jupps ober the laze dogs 
The☻qui ckbrown fox jumps over the lazy dogs 
║  ║   ^ ║      ║     ║    ║          ║     ^

Here is the script.. You just need to ferret out the line pairs somehow.. (I've used diff only once (twice?) before today, so I don't know its many options, and sorting out the options for this script was enough for me, for one day :) .. I think it must be simple enough, but I'm due for a coffee break ....

#!/bin/bash
#
# Name: hair-of-the-diff
# Note: This script hasn't been extensively tested, so beware the alpha bug :) 
#   
# Brief: Uses 'diff' to identify the differences between two lines of text
#        $1 is a filename of a file which contains line pairs to be processed
#
#        If $1 is null "", then the sample pairs are processed (see below: Paris in the spring)
#
# ║ = changed character
# ^ = exists in first line, but not in second
# v = exists in second line, but not in first
 
bname="$(basename "$0")"
workd="/tmp/$USER/$bname"; [[ ! -d "$workd" ]] && mkdir -p "$workd"

# Use $1 as the input file-name, else use this Test-data
# Note: this test loop expands \t \n etc ...(my editor auto converts \t to spaces) 
if [[ "$1" == '' ]] ;then
  ifile="$workd/ifile"
{ while IFS= read -r line ;do echo -e "$line" ;done <<EOF
Paris in the spring 
Paris in the the spring
A cat on a hot tin roof.
a cant on a hot in roof
the quikc brown box jupps ober the laze dogs 
The\tquickbrown fox jumps over the lazy dogs
EOF
} >"$ifile"
else
  ifile="$1"
fi
#
[[ -f "$ifile" ]] || { echo "ERROR: Input file NOT found:" ;echo "$ifile" ;exit 1 ; }
#  
# Check for balanced pairs of lines
ilct=$(<"$ifile" wc -l)
((ilct%2==0)) || { echo "ERROR: Uneven number of lines ($ilct) in the input." ;exit 2 ; }
#
ifs="$IFS" ;IFS=$'\n' ;set -f
ix=0 ;left=0 ;right=1
while IFS= read -r line ;do
  pair[ix]="$line" ;((ix++))
  if ((ix%2==0)) ;then
    # Change \x20 to \x02 to simplify parsing diff's output,
    #+   then change \x02 back to \x20 for the final output. 
    # Change \x09 to \x01 to simplify parsing diff's output, 
    #+   then change \x01 into ☻ U+263B (BLACK SMILING FACE) 
    #+   to keep the final display columns in line.
    #+   '☻' is hopefully unique and obvious enough (otherwise change it) 
    diff --text -yt -W 19  \
         <(echo "${pair[0]}" |sed -e "s/\x09/\x01/g" -e "s/\x20/\x02/g" -e "s/\(.\)/\1\n/g") \
         <(echo "${pair[1]}" |sed -e "s/\x09/\x01/g" -e "s/\x20/\x02/g" -e "s/\(.\)/\1\n/g") \
     |sed -e "s/\x01/☻/g" -e "s/\x02/ /g" \
     |sed -e "s/^\(.\) *\x3C$/\1 \x3C  /g" \
     |sed -n "s/\(.\) *\(.\) \(.\)$/\1\2\3/p" \
     >"$workd/out"
     # (gedit "$workd/out" &)
     <"$workd/out" sed -e "s/^\(.\)..$/\1/" |tr -d '\n' ;echo
     <"$workd/out" sed -e "s/^..\(.\)$/\1/" |tr -d '\n' ;echo
     <"$workd/out" sed -e "s/^.\(.\).$/\1/" -e "s/|/║/" -e "s/</^/" -e "s/>/v/" |tr -d '\n' ;echo
    echo
    ((ix=0))
  fi
done <"$ifile"
IFS="$ifs" ;set +f
exit
#
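The core trick in the script above is to put every character on its own line so that diff compares characters instead of lines. Here is a stripped-down sketch of just that step, assuming plain single-byte text (fold -b splits by byte, so multi-byte UTF-8 characters would be broken apart). char_diff is a hypothetical helper name, not part of the original script.

```shell
#!/bin/sh
# char_diff LINE1 LINE2: diff two strings character by character.
char_diff() {
    dir=$(mktemp -d)
    # fold -b -w 1 emits one byte per output line.
    printf '%s\n' "$1" | fold -b -w 1 > "$dir/a"
    printf '%s\n' "$2" | fold -b -w 1 > "$dir/b"
    # diff exits 1 when the inputs differ; that status is deliberately ignored.
    diff "$dir/a" "$dir/b"
    rm -rf "$dir"
}

char_diff "cat" "cot"
```

The line numbers in diff's output are then character positions: for the call above, diff reports a change at position 2 (a became o).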

Hashbrown · Accepted Answer · 2017-11-20 05:49:44Z

Using @Peter.O's solution as a basis I rewrote it to make a number of changes.

It only prints every line once, using colour to show you the differences.
It doesn't write any temp files, piping everything instead.
You can provide two filenames and it'll compare the corresponding lines in each file. ./hairOfTheDiff.sh file1.txt file2.txt
Otherwise, if you use the original format (a single file with every second line needing to be compared to the one before) you may now simply pipe it in, no file needs to exist to be read. Take a look at demo in the source; this may open the door to fancy piping in order to not need files for two separate inputs too, using paste and multiple file-descriptors.

No highlight means the character was in both lines, highlight means it was in the first, and red means it was in the second.

The colours are changeable through variables at the top of the script and you can even forego colours entirely by using normal characters to express differences.

#!/bin/bash

same='-' #unchanged
up='△' #exists in first line, but not in second 
down='▽' #exists in second line, but not in first
reset=''

reset=$'\e[0m'
same=$reset
up=$reset$'\e[1m\e[7m'
down=$reset$'\e[1m\e[7m\e[31m'

timeout=1


if [[ "$1" != '' ]]
then
    paste -d'\n' "$1" "$2" | "$0"
    exit
fi

function demo {
    "$0" <<EOF
Paris in the spring 
Paris in the the spring
A cat on a hot tin roof.
a cant on a hot in roof
the quikc brown box jupps ober the laze dogs 
The quickbrown fox jumps over the lazy dogs
EOF
}

# Change \x20 to \x02 to simplify parsing diff's output,
#+   then change \x02 back to \x20 for the final output. 
# Change \x09 to \x01 to simplify parsing diff's output, 
#+   then change \x01 into → U+2192 (RIGHTWARDS ARROW)
function input {
    sed \
        -e "s/\x09/\x01/g" \
        -e "s/\x20/\x02/g" \
        -e "s/\(.\)/\1\n/g"
}
function output {
    sed -n \
        -e "s/\x01/→/g" \
        -e "s/\x02/ /g" \
        -e "s/^\(.\) *\x3C$/\1 \x3C  /g" \
        -e "s/\(.\) *\(.\) \(.\)$/\1\2\3/p"
}

ifs="$IFS"
IFS=$'\n'
demo=true

while IFS= read -t "$timeout" -r a
do
    demo=false
    IFS= read -t "$timeout" -r b
    if [[ $? -ne 0 ]]
    then
        echo 'No corresponding line to compare with' > /dev/stderr
        exit 1
    fi

    diff --text -yt -W 19  \
        <(echo "$a" | input) \
        <(echo "$b" | input) \
    | \
    output | \
    {
        type=''
        buf=''
        while read -r line
        do
            if [[ "${line:1:1}" != "$type" ]]
            then
                if [[ "$type" = '|' ]]
                then
                    type='>'
                    echo -n "$down$buf"
                    buf=''
                fi

                if [[ "${line:1:1}" != "$type" ]]
                then
                    type="${line:1:1}"

                    echo -n "$type" \
                        | sed \
                            -e "s/[<|]/$up/" \
                            -e "s/>/$down/" \
                            -e "s/ /$same/"
                fi
            fi

            case "$type" in
            '|')
                buf="$buf${line:2:1}"
                echo -n "${line:0:1}"
                ;;
            '>')
                echo -n "${line:2:1}"
                ;;
            *)
                echo -n "${line:0:1}"
                ;;
            esac
        done

        if [[ "$type" = '|' ]]
        then
            echo -n "$down$buf"
        fi
    }

    echo -e "$reset"
done

IFS="$ifs"

if $demo
then
    demo
fi

anthony · Accepted Answer · 2024-03-26 23:20:48Z

wdiff is actually a very old method of comparing files word by word. It worked by reformatting the files, using diff to find the differences, and then mapping the result back again. I myself suggested adding context, so that rather than comparing word by word, it compares each word surrounded by other 'context' words. That allows diff to synchronise itself on common passages much better, especially when the files are mostly different with only a few blocks of common words, for example when comparing texts for plagiarism or re-use.

dwdiff was later created from wdiff. But dwdiff uses that text reformatting function to good effect in dwfilter. This is a great development – it means you can reformat one text to match another, and then compare them using any line-by-line graphical diff displayer. For example, using it with "diffuse" graphical diff....

dwfilter file1 file2 diffuse -w

This reformats file1 to the format of file2 and gives that to diffuse for a visual comparison. file2 is unmodified, so you can edit and merge word differences into it directly in diffuse. If you want to edit file1, you can add -r to reverse which file is reformatted. Try it and you will find it is extremely powerful!

My preference for the graphical diff (shown above) is diffuse as it feels far cleaner and more useful. Also it is a standalone python program, which means it is easy to install and distribute to other UNIX systems.

(ASIDE: newer versions of diffuse have refactored the code into a huge array of files. I kept a copy of the old single-file version of diffuse to use on systems without diffuse installed.)

Other graphical diffs seem to have a lot of dependencies, but can also be used (your choice). These include meld, kdiff3 or xxdiff.

Another great GUI tool is meld. It is not suitable for huge files because of performance limitations, but for anything smaller than 100 KB it works fine, the UI is nice, and it allows comparing full directory hierarchies, too. It also allows ignoring selected changes using regex rules, but sadly the UI for that is hidden inside the application Preferences dialog.
– Mikko Rantalainen
Commented Mar 25 at 16:12
Noted. I have used meld in the past. I don't really like how it slides the two columns past each other, rather than leaving gaps for the deleted lines.
– anthony
Commented Mar 26 at 23:46
Yes, meld even gets its name from melding one file into another visually. I think the visual style is good for smallish changes, and it is usually easier to understand for people who haven't previously seen the format where the output has gaps without actual data lines.
– Mikko Rantalainen
Commented Mar 27 at 15:08

score 5 · Accepted Answer · 2018-02-02 18:03:00Z

5

Here's a simple one-liner:

diff -y <(cat a.txt | sed -e 's/,/\n/g') <(cat b.txt | sed -e 's/,/\n/g')

The idea is to replace commas (or whichever delimiter you wish to use) with newlines using sed. diff then takes care of the rest.
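When both lines have the same number of fields, a variation on this idea is to skip diff and compare the fields positionally, which also tells you the column number. This is a sketch under that assumption, not the answer's method; diff_fields and its output format are invented for illustration.

```shell
#!/bin/sh
# diff_fields LINE1 LINE2 [DELIM]: print "N: old -> new" for every field
# position N whose value differs. DELIM is a single character (default: comma).
diff_fields() {
    printf '%s\n%s\n' "$1" "$2" | awk -F"${3:-,}" '
        NR == 1 { n = NF; for (i = 1; i <= NF; i++) a[i] = $i; next }
        NR == 2 {
            max = (NF > n) ? NF : n
            for (i = 1; i <= max; i++) {
                b = (i <= NF) ? $i : ""
                # Appending "" forces a string comparison, so 1.0 and 1 differ.
                if ((a[i] "") != (b ""))
                    printf "%d: %s -> %s\n", i, a[i], b
            }
        }'
}

diff_fields "1,foo,3,bar" "1,foo,4,bar"
```

For the sample call, only the third field differs, so the output is the single line "3: 3 -> 4". Unlike diff, this does not realign the fields after an inserted or deleted value; every later position would be reported as changed.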

edited Feb 2, 2018 at 18:03

answered Feb 2, 2018 at 16:14

user82160

Useless use of cat! Add the file names to the sed command!
– anthony
Commented Jun 26 at 3:44
cat is used for better readability, as most people tend to read from left to right. furthermore input redirection would look confusing combined with process substitution - user82160
– Grant Zvolsky
Commented Jun 30 at 22:23


user1521620 · Accepted Answer · 2020-11-12 22:57:45Z

4

GNU Emacs has the very good "ediff" mode. I use it and press a or b frequently.

And power users get even more fancy, e.g. : https://emacs.stackexchange.com/questions/16469/how-to-merge-git-conflicts-in-emacs

And it has lots of powerful features. Expanded help menu:

I highly recommend committing the capabilities shown in the menu to memory, so you use them when they'd be useful. (Wow, I haven't looked at the menu in so long it has features I forgot it had.)

Emacs is available for every *nix platform. And it's as open source as it gets.

I use ediff so often that typing M-x ediff took too long, so I bound it to C-c C-e.

edited Nov 12, 2020 at 22:57

answered Oct 7, 2020 at 18:46

user1521620


Surprised emacs got no mention in 10 years, but I see this question gets plenty of attention, so added this answer.
– user1521620
Commented Oct 7, 2020 at 18:47


Faheem Mitha · Accepted Answer · 2014-01-20 19:07:23Z

2

xxdiff: Another tool is xxdiff (GUI), which has to be installed, first.
spreadsheet: For database data, a spreadsheet from .csv is easily made, and a formula (A7==K7) ? "" : "diff" or similar inserted, and copy-pasted.

edited Jan 20, 2014 at 19:07

Faheem Mitha


answered Apr 12, 2011 at 14:01

user unknown


1

xxdiff looks like the 80's. Meld looks much better but it's extremely slow for CSV-like files. I've found Diffuse to be the fastest Linux diff tool.
– Dan Dascalescu
Commented Jun 28, 2017 at 22:01
1

@DanDascalescu: A tool that gets the job done always looks fine, no matter how old it looks. Another one I have used occasionally is tkdiff, but it isn't installed here, so I can't test it with long column data.
– user unknown
Commented Jun 29, 2017 at 3:08
Does xxdiff display moved lines? Or does it just show a missing line in one file and an added one in the other? (I tried building xxdiff but qmake failed and I see they don't bother to publish a Debian package).
– Dan Dascalescu
Commented Jun 29, 2017 at 3:25
1

@DanDascalescu: Today, I only have tkdiff installed.
– user unknown
Commented Jun 30, 2017 at 0:37


BaseZen · Accepted Answer · 2021-05-23 01:24:16Z

git diff --no-index --word-diff-regex=. --word-diff=porcelain "${site}" "${site_prev}" \
    | grep '^[-+]' \
    > "${site_diff}"
if [[ $(wc -c < "${site_diff}") -gt 2000 ]]; then
    echo "Warning: ${url} has changed a lot:"
    # deal with it
fi

An overcrowded answer set already, but I wanted to share my solution from the automation perspective. To detect if a website is malfunctioning or has been hacked, I wanted an accurate, minimal diff between successive end-user snapshots. If there are huge changes, it may be reporting an error message, or may be blank, or may have been changed maliciously or drastically by accident.

On the other hand, a CMS generates lots of trivial ID changes (such as cache timestamps) that do not indicate a real change, and the above is built to boil the issue down. If a few hundred IDs change in HTML tags, the net bytecount of difference will still be small, whereas if someone accidentally deletes several paragraphs, the byte count will be much larger. There's always a fuzzy middleground so I hand-wave at 2000 but probably even 500 is good.

It may also be a good idea to ignore whitespace changes depending on the application.

The essential options for me are:

--word-diff-regex=. because words are arbitrarily long in code (SQL / HTML / CSS / whatever), so this is actually a character based diff
--no-index because these files are not in a repository, but YAY GIT!
--word-diff=porcelain to easily extract the actual differences, which are easily detected by the -/+ line prefixes, with no surrounding context.

The output of the top line without the further grep/wc processing is still human friendly and quite clear, possibly useful for the OP.

asoundmove · Accepted Answer · 2011-04-12 03:02:57Z

1

On the command line, I would make sure I add judicious new-lines before comparing files. You can use sed, awk, perl or anything really to add line breaks in some sort of systematic way - make sure not to add too many though.

But I find the best is to use vim as it highlights word differences. vim is good if there are not too many differences and the differences are simple.
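As one way to add such line breaks to an SQL dump: if the dump uses multi-row INSERT statements (an assumption; this is the extended-insert style produced by tools such as mysqldump), putting each row tuple on its own line lets plain diff narrow a change down to a single row. The helper name split_rows is made up, and the pattern would also split a string value that happens to contain ),( itself.

```shell
#!/bin/sh
# split_rows: read SQL on stdin and start a new line after every "),"
# that is directly followed by "(", i.e. between row tuples.
split_rows() {
    awk '{ gsub(/\),\(/, "),\n("); print }'
}

printf 'INSERT INTO t VALUES (1,2),(3,4);\n' | split_rows
```

Running both dumps through the filter first (for example split_rows < old.sql > old.split.sql, and likewise for the new dump) and then diffing the results limits each reported difference to one row rather than one enormous line.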

answered Apr 12, 2011 at 3:02

asoundmove


Although not really an answer to the question this technique is rather efficient to learn about small differences in long lines.
– Sir Cornflakes
Commented Feb 4, 2015 at 19:05


pillravi · Accepted Answer · 2016-03-23 21:11:45Z

1

I had the same problem and solved it with PHP Fine Diff, an online tool that allows you to specify granularity. I know it's not technically a *nix tool, but I didn't really want to download a program just to do a one-time, character level diff.

answered Mar 23, 2016 at 21:11

pillravi


2

Some users can't upload sensitive or large files to a random online tool. There are plenty of tools that show line-level differences without compromising your privacy.
– Dan Dascalescu
Commented Jun 28, 2017 at 22:01
Yes, there are. But for diffs that don't contain sensitive information, online tools can be a good solution.
– pillravi
Commented Jul 3, 2017 at 14:01
Online diff tools also don't support command line integration. You can't use them from your version control flow. They're also far more cumbersome to use (select file 1, select file 2, upload) and can't do merging.
– Dan Dascalescu
Commented Jul 4, 2017 at 18:41


Faheem Mitha · Accepted Answer · 2011-04-12 20:42:25Z

0

kdiff3 is becoming the standard GUI diff viewer on Linux. It is similar to xxdiff, but I think kdiff3 is better. It does many things well, including your request to show "exact character differences between two lines in certain files".

answered Apr 12, 2011 at 20:42

Faheem Mitha


KDiff3 is extremely slow to highlight the inline differences in CSV files. I wouldn't recommend it.
– Dan Dascalescu
Commented Jun 28, 2017 at 22:00


Anthon · Accepted Answer · 2013-05-22 07:07:48Z

0

If I'm reading your question correctly, I use diff -y for this kind of thing.

The side-by-side view makes it much easier to find which lines are causing the differences.

edited May 22, 2013 at 7:07

Anthon


answered Apr 12, 2011 at 2:29

rfelsburg


3

This does not highlight the difference within the line. If you have a long line, it's painful to see the difference. wdiff, git diff --word-diff, vimdiff, meld, kdiff3, tkdiff all do this.
– user2707671
Commented Aug 17, 2018 at 13:07


rubo77 · Accepted Answer · 2022-08-02 10:16:48Z

0

As noted in some comments above, you can install:

apt install wdiff colordiff

and then use

wdiff -n a b | colordiff

-n, --avoid-wraps do not extend fields through newlines

edited Aug 2, 2022 at 10:16

answered Aug 2, 2022 at 8:52

rubo77


There are already answers suggesting wdiff and comments to those suggesting adding colordiff, so this hardly adds anything.
– Henrik supports the community
Commented Aug 2, 2022 at 9:18
I added this as an answer rather than a comment because no existing answer covered it. Comments alone are not an answer (upvoting a good comment doesn't upvote the corresponding answer), and colordiff had so far only been mentioned in comments.
– rubo77
Commented Aug 2, 2022 at 10:16
