
In Excel, I tried to import (using Data > Import) a CSV file with Chinese characters. The characters are represented as Unicode numeric character references (NCRs), for example &#39321;&#36771;&#29482;. Although I have set "File Origin" to "65001: Unicode (UTF-8)", it doesn't seem to have any effect.

Please note that:

&#39321;&#36771;&#29482; is supposed to display as 香辣猪

The following is a screenshot of the import screen. You can see that the column "Product Title" has Chinese characters in Unicode, but the characters are not displayed properly. I have also tried almost all the other Unicode and Chinese-related "File Origin" options, all without success.

[Screenshot of the Excel import screen]

Please help, how can I import the CSV file with Chinese characters in Unicode, successfully in Excel?

  • Are you opening it directly (double-click from Explorer) or using "Data > From Text" to import into a blank workbook? Does it make a difference? Commented Mar 24, 2022 at 6:24
  • I don't think it makes much of a difference. If I open it directly, it just opens as is, without giving me any opportunity to adjust any options. If I use Data > Import, I can adjust the options, as you can see in my screenshot, but that doesn't work either: Excel doesn't display the characters properly (meaning it doesn't decode the Unicode characters).
    – J K
    Commented Mar 24, 2022 at 7:04
  • Could you include the exact contents of the file, maybe upload an example somewhere, or post a hexdump? In particular: does the file really contain characters encoded in UTF-8, or does it contain HTML numeric character references? In your question you posted numeric character references (like &#39321;). I don't think Excel parses &#39321;.
    – sleske
    Commented Mar 24, 2022 at 10:17
  • Oh yes, my CSV file actually contains numeric character references (like &#39321;)
    – J K
    Commented Mar 24, 2022 at 10:27

2 Answers


Your file is probably encoded with standard ANSI/ASCII character codes. Instead of encoding the Unicode characters at the byte level, the CSV file represents each Unicode character as a series of ASCII characters (a digit string spelling out the character's code point number). This is called a numeric character reference (NCR) and is commonly used in markup languages like HTML for backwards compatibility with browsers or systems without Unicode support. The "&#" prefix signals the start of an NCR.
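
To make the difference concrete, here is a small Python sketch (Python is not part of the original answer; it is used here only for illustration). It compares the NCR text from the question with the real characters and shows that the NCR form is plain ASCII, which is why changing the "File Origin" encoding has nothing to decode.

```python
# The string as it appears in the CSV file, and the text the asker expects.
ncr_text = "&#39321;&#36771;&#29482;"
real_text = "香辣猪"

# The code point numbers of the real characters match the digits in the NCRs.
code_points = [ord(ch) for ch in real_text]
print(code_points)  # [39321, 36771, 29482]

# The NCR form is pure ASCII: 24 bytes, none of them above 0x7F.
ncr_bytes = ncr_text.encode("utf-8")
print(len(ncr_bytes), ncr_text.isascii())  # 24 True

# Genuine UTF-8 text stores each of these characters as 3 non-ASCII bytes.
utf8_bytes = real_text.encode("utf-8")
print(utf8_bytes)  # b'\xe9\xa6\x99\xe8\xbe\xa3\xe7\x8c\xaa'
```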

As far as I can tell, Excel has no native support for converting NCR-containing strings to Unicode, but you can convert the individual numbers to Unicode characters using the UNICHAR function, e.g.:

=UNICHAR(39321)&UNICHAR(36771)&UNICHAR(29482)

How to convert a numeric character reference string to Unicode in Excel

If you have Excel 365 (you need SEQUENCE and TEXTJOIN), you can convert an all-NCR string like &#39321;&#36771;&#29482; in A1 to a Unicode string by entering the following in A2:

=TEXTJOIN("",,UNICHAR(MID(A1,SEQUENCE(INT(LEN(A1)/8),,3,8),5)))

This assumes each reference is exactly 8 characters long ("&#" + 5 digits + ";").

For older versions of Excel, you can hack it using

=SUBSTITUTE(SUBSTITUTE(REPLACE(A1,1,1,"="),"#","UNICHAR("), ";",")")

This generates the required formula as a text string. Copy the result and paste it "as value" only. Then edit the cell and press Enter to evaluate the formula and generate the final Unicode text.
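
If fixing the text inside Excel is inconvenient, another option is to decode the NCRs before importing. The following is only a sketch, not part of the original answer: it assumes Python is available, the function name `decode_ncr_csv` and the file names in the usage note are hypothetical, and it relies on the standard library's `html.unescape`, which handles references of any length (and also converts named entities such as `&amp;`, which may or may not be wanted). The output is written as UTF-8 with a byte order mark, which Excel generally uses to recognise UTF-8 when opening a CSV.

```python
import csv
import html


def decode_ncr_csv(src_path: str, dst_path: str) -> None:
    """Copy a CSV file, turning references like &#39321; into real characters."""
    # Assumption: the source is ASCII or UTF-8 (a pure-NCR file is both).
    with open(src_path, newline="", encoding="utf-8") as src, \
         open(dst_path, "w", newline="", encoding="utf-8-sig") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            # html.unescape decodes decimal and hex NCRs as well as named entities.
            writer.writerow([html.unescape(cell) for cell in row])
```

A call such as `decode_ncr_csv("products.csv", "products_decoded.csv")` (both names are placeholders) would produce a file that can then be opened or imported in Excel.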

  • I tried =TEXTJOIN("",,UNICHAR(MID(A1,SEQUENCE(INT(LEN(A1)/8),,3,8),5))) in Google Sheets, against a cell with the value "&#39321;&#36771;&#29482;". But it gives me this message: "Function SEQUENCE parameter 2 value is 0. It should be greater than or equal to 1."
    – J K
    Commented Mar 25, 2022 at 7:19
  • When I amended SEQUENCE parameter 2 to the value 1, the function ran OK, but it only converted 1 character to Chinese, out of the 3 characters in "&#39321;&#36771;&#29482;"
    – J K
    Commented Mar 25, 2022 at 8:03
  • It works fine in Excel, either with SEQUENCE parameter 2 blank or as a 1. Please evaluate in Excel and confirm. Commented Mar 25, 2022 at 10:36
  • Ah yes, for Google Sheets enclose the formula in ARRAYFORMULA e.g. =ARRAYFORMULA(TEXTJOIN("",,UNICHAR(MID(A1,SEQUENCE(INT(LEN(A1)/8),1,3,8),5)))) Commented Mar 25, 2022 at 13:59

This answer is provided for "historic" reasons and serves only to help others who might struggle to import CSV files in a different character set into Excel. Here are a few things to try:

  • Try opening it in Notepad or a more advanced text editor. Even if the characters don't show up properly, use "Save As" to change the character encoding (e.g. UTF-8, UTF-16) and then see what Excel does with the result.

  • Try opening the file in Google Sheets or LibreOffice Calc first, then export it to the Excel .xlsx file format from there.

  • Excel uses your computer's language and regional settings to determine how to import a CSV. On Windows, search for the regional and language settings in the Control Panel (not the new "Settings"). Set your language to the language the file is in. Also check the advanced settings such as delimiter, decimal separator, and date format; these must match the formatting of your CSV file. (NB: It's probably a good idea to memorise the keyboard shortcuts for switching your system back to your first language. Better yet, enable the language bar and add the CSV file's language as a second language so you can easily switch between languages using left Shift+Alt or Windows key+Space.)
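
The re-encoding step from the first suggestion can also be scripted. This is a minimal Python sketch under stated assumptions: the source encoding must be known in advance (`gb18030` below is only a placeholder guess for a Chinese-language file), and the function name `reencode_with_bom` is hypothetical. It only helps when the file contains real encoded characters; it does nothing for a file that contains numeric character references, as in the question.

```python
def reencode_with_bom(src_path: str, dst_path: str,
                      src_encoding: str = "gb18030") -> None:
    """Rewrite a text file as UTF-8 with a byte order mark (BOM)."""
    # newline="" keeps the original line endings untouched in both directions.
    with open(src_path, encoding=src_encoding, newline="") as src:
        text = src.read()
    # "utf-8-sig" prepends the BOM bytes EF BB BF to the output.
    with open(dst_path, "w", encoding="utf-8-sig", newline="") as dst:
        dst.write(text)
```

If the guessed source encoding is wrong, Python will usually raise a `UnicodeDecodeError` or produce visibly wrong characters, which is a hint to try a different encoding name.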

  • I have tried that, but it doesn't work. Thanks.
    – J K
    Commented Mar 24, 2022 at 10:22
