
I have a directory containing files; some of them are UTF-8 and some are CP-1251. I want to convert the CP-1251 ones to UTF-8 without corrupting the files that are already UTF-8.

I tried using iconv -f cp1251 -t utf8 <...>. It works for the CP-1251 files, but if a file is already UTF-8, it gets converted anyway and becomes garbled.

2 Answers


You could get a list of files that are neither UTF-8 nor US-ASCII using:

file -0 -i *.txt | awk -F '\0' '$2 !~ /charset=(us-ascii|utf-8)$/ {print $1}'
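That list can then be fed to iconv so only the non-UTF-8 files are touched. A minimal sketch, assuming every file that is neither UTF-8 nor US-ASCII really is CP-1251 (iconv cannot write in place, so it goes through a temporary file):

file -0 -i *.txt \
  | awk -F '\0' '$2 !~ /charset=(us-ascii|utf-8)$/ {print $1}' \
  | while IFS= read -r f; do
      # convert to UTF-8 into a temporary file, then replace the original
      iconv -f cp1251 -t utf8 "$f" > "$f.utf8" && mv "$f.utf8" "$f"
    done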
  • Just a minor correction: when I tried it, it showed the files that are UTF-8, rather than those that "are neither UTF-8 nor US-ASCII".
    – sashoalm
    Commented Jan 4, 2014 at 11:59

I found a way to do it using enconv:

enconv -L bulgarian -x utf8 file.txt

It works for both UTF-8 and CP-1251 files, since enconv detects each file's current encoding before converting.
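If all the files live in one directory, the same command can be run over them in one pass. This is a sketch, assuming enconv accepts multiple file arguments (as enca does) and that the non-UTF-8 files are Bulgarian CP-1251; files already in UTF-8 should be left unchanged, per the behaviour described above:

# convert every .txt file in place to UTF-8, detecting CP-1251 via the Bulgarian language model
enconv -L bulgarian -x utf8 *.txt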

