
I have a directory containing files; some of them are UTF-8 and some are CP-1251. I want to convert the CP-1251 ones to UTF-8 without corrupting the files that are already UTF-8.

I tried using iconv -f cp1251 -t utf8 <...>. It works for the CP-1251 files, but if a file is already UTF-8, it gets converted anyway and becomes garbled.

2 Answers


You could get a list of files that are neither UTF-8 nor US-ASCII using:

file -0 -i *.txt | awk -F '\0' '$2 !~ /charset=(us-ascii|utf-8)$/ {print $1}'
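That list can then be fed to iconv so only the non-UTF-8 files are touched. A minimal sketch, assuming every file that is neither UTF-8 nor US-ASCII really is CP-1251 (iconv cannot write in place, so it goes through a temporary file):

file -0 -i *.txt \
  | awk -F '\0' '$2 !~ /charset=(us-ascii|utf-8)$/ {print $1}' \
  | while IFS= read -r f; do
      # convert to UTF-8 into a temporary file, then replace the original
      iconv -f cp1251 -t utf8 "$f" > "$f.utf8" && mv "$f.utf8" "$f"
    done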
  • Just a minor correction: when I tried it, it showed the files that are UTF-8, rather than those that "are neither UTF-8 nor US-ASCII".
    – sashoalm
    Commented Jan 4, 2014 at 11:59

I found a way to do it using enconv:

enconv -L bulgarian -x utf8 file.txt

It works for both UTF-8 and CP-1251 files, since enconv detects each file's current encoding before converting.
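If all the files live in one directory, the same command can be run over them in one pass. This is a sketch, assuming enconv accepts multiple file arguments (as enca does) and that the non-UTF-8 files are Bulgarian CP-1251; files already in UTF-8 should be left unchanged, per the behaviour described above:

# convert every .txt file in place to UTF-8, detecting CP-1251 via the Bulgarian language model
enconv -L bulgarian -x utf8 *.txt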

