
I'm trying to diff two files. One of these files contains extra full-width spaces (U+3000):

A.txt:

Text 1
Text 2
Text 3

B.txt:

Text 1
　　　Text 2
Text 3

diff -w A.txt B.txt reports

2c2
< Text 2
---
>    Text 2

Are there any options or workarounds that make diff ignore all whitespace characters, including Unicode ones such as U+3000 (Ideographic Space)?

Files processed are UTF-8 (with BOM) with CRLF line breaks.

It is fine to use other tools / workarounds if it is not possible with diff.

  • diff against the output of sed, using process substitution <()?
    – Tom Yan
    Commented Aug 18, 2021 at 3:44
  • @TomYan It seems that diff <(cat A.txt | sed 's/\s//g' | sed 's/　//g') <(cat B.txt | sed 's/\s//g' | sed 's/　//g') works in my case (the second sed expression contains a literal U+3000). It's quite ugly, though...
    – tsh
    Commented Aug 18, 2021 at 4:04
  • First of all, you don't need cat: sed can take a file as input, just don't use -i. Besides, sed 's/\(\s\|　\)//g' should probably work, where the second alternative is a literal U+3000 (not sure about portability).
    – Tom Yan
    Commented Aug 18, 2021 at 4:37
  • What are the locale settings?
    Commented Aug 18, 2021 at 5:33
  • @RomeoNinov locale command says LANG=C.UTF-8, LC_CTYPE="C.UTF-8", ... I hadn't touched that setting before. Should I set locale to something else?
    – tsh
    Commented Aug 18, 2021 at 9:54
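
The sed workaround discussed in the comments above can be tidied into a small script. This is only a minimal sketch: strip_wide_space is a hypothetical helper name, the demo files are generated to mimic the sample in the question (CRLF line endings, no BOM), and only U+3000 is removed, so any other Unicode space characters would need their own patterns. The character is built from its UTF-8 bytes (E3 80 80) so that it does not have to appear invisibly in the script.

```shell
#!/bin/sh
# U+3000 (Ideographic Space) as UTF-8 bytes E3 80 80, written in octal
# so the invisible character never has to be typed literally.
wide_space=$(printf '\343\200\200')

# Hypothetical helper: delete every U+3000 from the given files (or stdin).
strip_wide_space() {
    sed "s/${wide_space}//g" "$@"
}

# Demo input mimicking the question: B.txt has three U+3000 before "Text 2".
tmp=$(mktemp -d)
printf 'Text 1\r\nText 2\r\nText 3\r\n' > "$tmp/A.txt"
printf 'Text 1\r\n%sText 2\r\nText 3\r\n' \
    "$wide_space$wide_space$wide_space" > "$tmp/B.txt"

# Normalize both sides, then let diff -w handle ordinary spaces and tabs.
strip_wide_space "$tmp/A.txt" > "$tmp/A.clean"
strip_wide_space "$tmp/B.txt" > "$tmp/B.clean"

if diff -w "$tmp/A.clean" "$tmp/B.clean"; then
    result="same"
else
    result="different"
fi
echo "$result"
rm -r "$tmp"
```

In bash, the temporary copies can be avoided with process substitution, as in the comments: diff -w <(strip_wide_space A.txt) <(strip_wide_space B.txt). Because sed only deletes characters within lines, the line numbers diff reports still correspond to the original files.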
