I'm trying to diff two files. One of these files contains extra full-width spaces (U+3000):
Text 1
Text 2
Text 3
Text 1
Text 2
Text 3
diff -w A.txt B.txt
reports
2c2
< Text 2
---
> Text 2
I want to know whether there are any options or workarounds that make diff ignore all whitespace characters (U+3000 Ideographic Space, for example).
Files processed are UTF-8 (with BOM) with CRLF line breaks.
It is fine to use other tools / workarounds if it is not possible with diff.
<()
diff <(cat A.txt | sed 's/\s//g' | sed 's/　//g') <(cat B.txt | sed 's/\s//g' | sed 's/　//g')
works in my case, though it is quite ugly...
-i. Besides,
sed 's/\(\s\|　\)//g'
should probably work (not sure about portability whatsoever).
The locale command says LANG=C.UTF-8, LC_CTYPE="C.UTF-8", ... I hadn't touched that setting before. Should I set the locale to something else?
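For what it's worth, the sed pipeline above could probably be tidied up along these lines. This is only a sketch: it assumes bash (for <() and $'...' quoting), and strip_ws and wsdiff are made-up helper names, not part of diff or sed. U+3000 is written as its UTF-8 bytes (E3 80 80) and sed is run under LC_ALL=C, so the result should not depend on whether the current locale treats U+3000 as whitespace.

```shell
#!/usr/bin/env bash
# Sketch only: strip_ws and wsdiff are hypothetical helper names.

# Delete U+3000 (UTF-8 bytes E3 80 80) plus ASCII space, tab and CR
# from every line of the file given as $1.
# LC_ALL=C makes sed operate on raw bytes, so the locale's notion of
# whitespace does not come into play.
strip_ws() {
    LC_ALL=C sed -e $'s/\xe3\x80\x80//g' -e $'s/[ \t\r]//g' "$1"
}

# Compare two files after stripping; the exit status is diff's own
# (0 if the stripped contents are identical, 1 if they differ).
wsdiff() {
    diff <(strip_ws "$1") <(strip_ws "$2")
}
```

Usage would be wsdiff A.txt B.txt. Like the original pipeline, this removes all whitespace inside each line, so any lines diff does print are the stripped versions, not the originals; line numbers still match because line breaks are kept.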