
Is there a simple way to print all non-ASCII characters, and the line numbers on which they occur, in a file using a command-line utility such as grep, awk, or perl?

I want to change the encoding of a text file from UTF-8 to ASCII, but before doing so I wish to manually replace all instances of non-ASCII characters, to avoid unexpected character changes made by the file conversion routine.

2 Answers

$ perl -ne 'print "$. $_" if m/[\x80-\xFF]/'  utf8.txt
2 Pour être ou ne pas être
4 Byť či nebyť
5 是或不

or

$ grep -n -P '[\x80-\xFF]' utf8.txt
2:Pour être ou ne pas être
4:Byť či nebyť
5:是或不

where utf8.txt is

$ cat utf8.txt
To be or not to be.
Pour être ou ne pas être
Om of niet zijn
Byť či nebyť
是或不
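An awk variant in the same spirit is sketched below (assuming GNU awk). Forcing the C locale makes every byte its own character, so anything outside the printable-ASCII range matches the bracket expression [^ -~]; note this also flags control bytes such as tabs, which the sample file doesn't contain.

```shell
# Recreate the sample file shown above
printf '%s\n' 'To be or not to be.' 'Pour être ou ne pas être' \
    'Om of niet zijn' 'Byť či nebyť' '是或不' > utf8.txt

# In the C locale, [^ -~] matches any byte outside printable ASCII
LC_ALL=C awk '/[^ -~]/ { print NR ": " $0 }' utf8.txt
```

This should print the same three numbered lines as the perl version.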
  • Thanks. The perl snippet works directly, but the grep version doesn't work with GNU grep 2.16. I was able to make it work via: LC_ALL=C grep -n -P [$'\x80'-$'\xFF'], where the first bit turns off collation. Commented Sep 18, 2014 at 12:23
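The workaround quoted in the comment above can be written out in full as follows (a sketch assuming GNU grep built with PCRE support, i.e. the -P option works). Setting LC_ALL=C makes grep match raw bytes, so \x80-\xFF is treated as a plain byte range rather than a locale-dependent character range:

```shell
# Recreate the sample file from the answer above
printf '%s\n' 'To be or not to be.' 'Pour être ou ne pas être' \
    'Om of niet zijn' 'Byť či nebyť' '是或不' > utf8.txt

# LC_ALL=C forces byte-wise matching; every byte of a multi-byte
# UTF-8 character falls in 0x80-0xFF, so such lines are reported
LC_ALL=C grep -n -P '[\x80-\xFF]' utf8.txt
```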

I want to change the encoding of a text file from UTF-8 to ASCII ...

... replace all instances of non-ASCII characters ...

Then tell your conversion tool to do so.

$ iconv -c -f UTF-8 -t ASCII <<< 'Look at 私.'
Look at .

$ iconv -c -f UTF-8 -t ASCII//translit <<< 'áēìöų'
aeiou
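If you still want to inspect the changes before committing to them, one way to preview exactly which lines the conversion would alter is to convert into a second file and diff the two. A sketch (sample.txt and sample.ascii.txt are hypothetical file names for illustration):

```shell
printf '%s\n' 'To be or not to be.' 'Look at 私.' > sample.txt

# -c drops any character that has no ASCII representation
iconv -c -f UTF-8 -t ASCII sample.txt > sample.ascii.txt

# Lines reported by diff are the ones the conversion would change
# (diff exits non-zero when the files differ, hence the || true)
diff sample.txt sample.ascii.txt || true
```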
  • He said he wanted to do that replacement manually. Perhaps the most appropriate replacement is context-dependent.
    – mark4o
    Commented Apr 27, 2012 at 0:39
