Diff to actually totally ignore whitespace

Question

I am looking for an application that will diff two files, and actually ignore all whitespace, so for example:

class foo { 
  bar
  spaz 
}

is equivalent to

class foo{bar spaz}

and also to

classfoo { 
  barspaz}

but NOT

classfoo { 
  spazspaz
}

That is, it would show me that spaz in the last example has taken the place of bar from the other examples. It only needs to compare 2 files.

It can be a Windows or Linux/Unix/POSIX-compatible utility.
I've tried the Linux/Unix diff -w command, but it only ignores whitespace differences within a line; line breaks still count. I don't see an option to totally ignore whitespace.
I also tried UECompare or UltraCompare, a non-free comparison utility for Windows.

By design, diff compares files line by line. The newlines serve as delimiters for the changes. Without them it would be hard to do a meaningful comparison. shurane's answer would work, but it is less than useful for the reason set out in the comments. — Roland Smith, Commented Oct 5, 2013 at 15:55
How the heck in any programming language is "barspaz" the same as "bar spaz"? Very strange. — Peon, Commented Oct 5, 2013 at 16:41
You could replace all whitespace with linebreaks, then diff the results. That would generate lots of lines of output for large chunks of differences, though. I think you'd need a new program (or a diff flag) to compare files without treating linebreaks specially. — Blacklight Shining, Commented Oct 6, 2013 at 16:41
Try wdiff too. It's not line-based like diff, so it manages your first equivalence (though not your second). — ShreevatsaR, Commented Dec 1, 2014 at 19:57

Ehtesh Choudhury · Accepted Answer · 2013-10-05 14:29:37Z

6

Are you looking for something like the tr command? See its man page. As far as I can tell, it's included with msysgit, Cygwin, and the GnuWin32 tools.

So you can remove all the whitespace prior to diffing by doing something like:

tr --delete '[:space:]' <filename.txt

You can then feed the output of that command to diff and have it work without having any whitespace.

For example, I have a file named HelloWorldApp.java. Let me show you how tr processes it:

C:\temp>cat HelloWorldApp.java
class HelloWorldApp {
    public static void main(String[] args) {
        System.out.println("Hello World!"); // Display the string.
    }
}
C:\temp>tr -d '[:space:]' <HelloWorldApp.java
classHelloWorldApp{publicstaticvoidmain(String[]args){System.out.println("HelloWorld!");//Displaythestring.}}
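
As a rough sketch of that last step (not part of the original answer), both files can be stripped with tr and the results compared as plain strings. The function name same_ignoring_whitespace is made up for illustration, and this only says whether the two files match, not where they differ.

```shell
# same_ignoring_whitespace is a hypothetical helper, not a standard command.
# It deletes every whitespace character (spaces, tabs, newlines, ...) from
# both files and succeeds only if what is left is identical.
same_ignoring_whitespace() {
    left=$(tr -d '[:space:]' < "$1")
    right=$(tr -d '[:space:]' < "$2")
    [ "$left" = "$right" ]
}
```

It could be used as: same_ignoring_whitespace old.css new.css && echo "same apart from whitespace" (the file names here are placeholders).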

edited Oct 5, 2013 at 14:29

answered Oct 5, 2013 at 14:24


2

The problem with this is it gives you a single line of everything mashed together. If you tried to feed that into diff, diff would tell you that each file has one line, and the lines are different. Not very helpful.
– Blacklight Shining
Commented Oct 5, 2013 at 15:39
2

@BlacklightShining Well, yes, but that is what the question asked for...
– evilsoup
Commented Oct 5, 2013 at 15:45
1

@BlacklightShining Presumably the OP will be working with very short files where there is a chance of them matching up after the whitespace is removed. I can't think of a specific task where this would be useful... but then that's the point of having these modular *nix commands: so that they can be used together for niche tasks that most people would never think of.
– evilsoup
Commented Oct 5, 2013 at 18:00
Essentially I am trying to compare two pieces of CSS: one has been partially minified (e.g. some rules are on one line), the other has not. They contain about 90% of the same rules. I think I'm just going to write a custom parser.
– A.B. Carroll
Commented Oct 9, 2013 at 17:58
To get a bit more meaningful output, instead of deleting all whitespace, try replacing whitespace with a newline first: tr '[:space:]' '\n'
– Seppo Enarvi
Commented Sep 30, 2014 at 11:46
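
To make the suggestions in these comments concrete, here is a minimal sketch of two ways to give diff something line-shaped to work with. The function names diff_tokens and diff_chars are hypothetical. The first follows the "replace whitespace with a newline" idea (with -s added so runs of whitespace collapse into one line break). The second is an assumption of mine rather than anything proposed in the thread: it deletes all whitespace and then uses fold to put one character per line, so that "class foo" and "classfoo" compare equal. fold counts bytes on some systems, so treat it as ASCII-only.

```shell
# Hypothetical helpers; neither is a standard command.

# One whitespace-separated token per line, then an ordinary diff.
diff_tokens() {
    t1=$(mktemp) && t2=$(mktemp) || return 2
    tr -s '[:space:]' '\n' < "$1" > "$t1"
    tr -s '[:space:]' '\n' < "$2" > "$t2"
    status=0
    diff "$t1" "$t2" || status=$?
    rm -f "$t1" "$t2"
    return "$status"
}

# One non-whitespace character per line, so missing spaces do not matter.
diff_chars() {
    t1=$(mktemp) && t2=$(mktemp) || return 2
    tr -d '[:space:]' < "$1" | fold -w 1 > "$t1"
    tr -d '[:space:]' < "$2" | fold -w 1 > "$t2"
    status=0
    diff "$t1" "$t2" || status=$?
    rm -f "$t1" "$t2"
    return "$status"
}
```

Both return diff's exit status (0 for no differences, 1 for differences). The per-character output gets long quickly, so it is probably only practical for small files.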


Peter Hackett · Accepted Answer · 2014-04-25 21:24:39Z

Despite being line-oriented, diff could be made to do what's being requested. Roughly, you could run diff -w and then post-process its output: a script could look at the diff output and join adjacent lines to see whether they now match a line in the other file.

This seems like it might be O(n^2) or something similarly nasty, but it would still be very helpful if it limited itself to a few combinations over a moving "window" of the diff output: say, (no join, join 2 lines, join 3 lines) on file 1 X (no join, join 2 lines, join 3 lines) on file 2, where X is roughly "cross product".

In fact, this seems like a job that a Perl script post-processing the diff output could do with a few hours of work (depending on your Perl programming skill).

Maybe when I have some task that I don't really want to be doing at work ... :-)
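
For what it's worth, a much simplified version of this post-processing idea can be sketched with awk instead of Perl. This is not the windowed join described above: it simply joins all the lines on each side of every hunk that diff -w reports, strips the remaining whitespace, and hides the hunk if the two sides then match. The function name diff_ignoring_line_breaks is hypothetical, and the sketch assumes diff's default ("normal") output format. It will miss matches that only appear when lines from neighbouring hunks are joined.

```shell
# Hypothetical helper: filter diff -w output, dropping hunks whose two
# sides are identical once their lines are joined and whitespace removed.
diff_ignoring_line_breaks() {
    # diff exits 1 when the files differ, which is not an error here.
    { diff -w "$1" "$2" || true; } | awk '
        # Print the buffered hunk only if its two sides still differ.
        function emit_hunk() {
            if (hunk != "" && lhs != rhs) printf "%s", hunk
            hunk = ""; lhs = ""; rhs = ""
        }
        /^[0-9]/ { emit_hunk() }          # hunk header such as 1,4c1
        { hunk = hunk $0 "\n" }           # buffer every line of the hunk
        /^< / { s = substr($0, 3); gsub(/[ \t\r\f]/, "", s); lhs = lhs s }
        /^> / { s = substr($0, 3); gsub(/[ \t\r\f]/, "", s); rhs = rhs s }
        END { emit_hunk() }
    '
}
```

On the question's examples this prints nothing for the first two variants and still shows the hunk where spazspaz replaced bar and spaz.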
