2

I am opening a CSV file that was created with the Unicode 1200 code page, but when I bring up the import wizard in Excel to open it, this code page is not listed. I only see UTF-7 and UTF-8.


I do need to open an existing CSV file that uses the Unicode 1200 code page, but out of curiosity, if I just create a new Excel sheet and try to save it with this code page, the option is still not there.

In this case, interestingly, the options are fewer and more high-level. For example, there is a `CSV UTF-8 (Comma delimited) (*.csv)` option and another, `Unicode Text (*.txt)`, but the latter does not say which Unicode encoding it actually uses, so how do we know exactly what it is? I know UTF-16 is code page 1200, but that is not listed.


3
  • Perhaps you need a newer version of Excel. I am running O365 and have the 1200 code page. It comes right after the Ukrainian (Mac) entry. (I also don't see it in the Save As dialog, though.)
    Commented Oct 1, 2019 at 0:49
  • Code page 1200 is UTF-16LE, which is essentially what is called "Unicode" in Microsoft terms
    – phuclv
    Commented Oct 1, 2019 at 1:27
  • Based on the official article Code Page Identifiers and my research, I agree with @phuclv. The .NET name of identifier 1200 is utf-16: Unicode UTF-16, little endian byte order (BMP of ISO 10646). It is called Unicode in Office applications. When I import a text file into Excel for Office 365, I can find it under the File Origin option. You can get more information from Unicode Text Document.
    Commented Oct 4, 2019 at 8:18
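
To make the comments above concrete, here is a minimal Python sketch of what a code page 1200 file looks like on disk. The sample text is made up for illustration; the only assumption is that the file starts with a byte order mark, which is the usual case for files written by Windows tools.

```python
import codecs

# Hypothetical sample content with one non-ASCII character.
text = "id,name\n1,Zoë\n"

# Code page 1200 is UTF-16 little endian. Files in this encoding
# normally begin with the byte order mark FF FE.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The first two bytes are the BOM.
print(data[:2].hex())

# Python's generic "utf-16" codec reads the BOM, picks the byte order
# from it, and strips it from the decoded text.
print(data.decode("utf-16") == text)
```

So if a hex viewer shows `FF FE` at the start of the file, "Unicode", "UTF-16LE" and "code page 1200" are most likely all describing the same thing.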

2 Answers

1

If anyone is in the same situation, here are some observations based on my empirical testing.

We start with a source CSV file encoded with code page 1200. As the comments have pointed out, this is also called UTF-16, and Notepad++ calls it UCS-2 LE BOM.

  1. If I open the file in Excel as ANSI (the default option in the Excel import wizard) and save it back as CSV (Comma delimited) (*.csv), Excel changes the file encoding to ANSI and all foreign-language data is lost. So this is definitely a no-go.

  2. If I open the file in Excel as ANSI like before but this time save it as Unicode Text (*.txt), Excel saves it in UTF-16 and retains all foreign-language data. The file extension is now .txt, but it can be renamed in Explorer.

So the good news is that Excel still reads the file fine, and saving it as a Unicode Text file keeps the desired encoding. Beware of similar-looking Save As options in Excel, such as CSV (MS-DOS) (*.csv) and a few others, which will likely change the encoding to ASCII or something else.
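
One thing to watch when renaming: as far as I know, Excel's Unicode Text format is tab-delimited, so a renamed .txt file is not really comma-separated. If the consumer needs actual commas, a small script can rewrite the delimiter while keeping UTF-16. This is only a sketch under that assumption; the function name `unicode_text_to_csv` is my own and not part of any tool.

```python
import csv


def unicode_text_to_csv(src_path, dst_path):
    """Rewrite a tab-delimited UTF-16 file as a comma-delimited UTF-16 file.

    Assumes the source is tab-delimited and starts with a BOM.
    """
    # newline="" lets the csv module handle line endings itself.
    # The "utf-16" codec consumes the BOM on read and writes one on output
    # (in the machine's native byte order, little endian on typical PCs).
    with open(src_path, newline="", encoding="utf-16") as src, \
         open(dst_path, "w", newline="", encoding="utf-16") as dst:
        reader = csv.reader(src, delimiter="\t")
        writer = csv.writer(dst)  # default delimiter is a comma
        writer.writerows(reader)
```

Fields that contain commas are quoted by the writer, so they survive the change of delimiter.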

0

Sadly, Office 2013 and earlier offer no UTF-16 option, and if you attempt to import such a file, even one without international characters, there are painful bugs.

One example: a CSV file with commas embedded in quoted strings will not import correctly.

Convert the file to UTF-8 with BOM in Notepad++, and it imports just fine.

I recommend using UTF-8 with BOM for compatibility and, for mostly ASCII data, smaller files. It can encode every Unicode code point.
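
If you would rather script the conversion than do it by hand in Notepad++, something like the following Python sketch should work. It assumes the source file is UTF-16 with a BOM; the function name `utf16_to_utf8_bom` is just for illustration.

```python
def utf16_to_utf8_bom(src_path, dst_path, chunk_size=64 * 1024):
    """Re-encode a UTF-16 text file (with BOM) as UTF-8 with BOM."""
    # "utf-16" detects the byte order from the BOM and strips it.
    # "utf-8-sig" writes the UTF-8 BOM (EF BB BF) before the first text.
    # newline="" keeps the original line endings untouched.
    with open(src_path, encoding="utf-16", newline="") as src, \
         open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        # Copy in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: src.read(chunk_size), ""):
            dst.write(chunk)
```

The content is copied character for character, so quoted fields with embedded commas are left exactly as they were; only the byte encoding changes.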
