
I am exporting an Excel file (Excel 2016) containing Japanese characters to CSV. (Note: I am not using the provided "CSV UTF-8" option.) In the process, all Japanese characters are replaced with '?'.

My Windows/Office locale is Japan/Japanese, and the Windows/Office language and format settings are all Japanese.

I understand that Excel uses a code page to save the CSV file in a particular encoding. My understanding was that this should be Shift-JIS (the default encoding for the Japanese locale). If so, why is information lost and replaced by '?'?

What encoding does Excel try to save the CSV in?

(FYI: If I open a CSV, Excel by default attempts to read it as Shift-JIS (code page 932), as expected.)

Note: I am aware of the workarounds that use UTF-8. I am more interested in understanding the behavior above than in a workaround.

Thanks

4
  • What's selected under File => Options => Language => Choose Editing Languages? Does changing that to Japanese (if it isn't already) help?
    – Bob
    Commented Jan 30, 2019 at 23:53
  • Is the default language in Excel set to Japanese?
    – Lee
    Commented Jan 31, 2019 at 8:57
  • @Bob - Yes, it was already Japanese. The editing language, display language, and help language are all Japanese. The issue occurs despite everything being Japanese. Commented Jan 31, 2019 at 10:36
  • @Lee - yes, language for Windows & Office installation on my desktop is Japanese. Commented Jan 31, 2019 at 10:36

2 Answers

1

Excel handles CSV encodings badly, and always has.

Exporting a document as comma-separated CSV does not use your locale's code page but saves the characters as ASCII. Characters that cannot be represented that way are exported as question marks. Only characters in the ASCII range of 0 to 127 are guaranteed to be exported correctly.

The reason may be that this code in Excel was written before Windows supported Unicode, but this is just a guess. Office is full of such patchwork, and one has to use what works.
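The question-mark symptom described above can be reproduced outside Excel. The following is a minimal Python sketch of the general mechanism (encoding text into a character set that lacks the characters, with replacement); it is an illustration of the symptom only and makes no claim about how Excel's export code is actually written. The sample string is made up.

```python
# Illustration only: mimics the *symptom*, not Excel's actual export code.
text = "日本語,テスト"  # made-up sample row

# Encoding into a character set that lacks these characters, with
# replacement enabled, turns each unrepresentable character into '?'.
lossy = text.encode("ascii", errors="replace")

# Code page 932 (Microsoft's Shift-JIS variant) can represent all of them,
# so the text survives a round trip.
sjis = text.encode("cp932")

print(lossy)                 # the comma survives, the Japanese does not
print(sjis.decode("cp932"))  # original text is recovered
```

Note that the comma, being plain ASCII, survives in both cases, which matches an export where the file structure is intact but every Japanese character has become '?'.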

0

Ah, the joy of locales.

There's an obscure setting buried in the Windows locale options that might be your culprit: Language for non-Unicode programs.

Note: changing this setting may require administrative permissions. If your machine is locked down, you may need to talk to your local admin.

The following is how to find this for Windows 10. The setting name hasn't changed in years and years, but Microsoft keeps moving it, so if you're running something earlier, you'll have to find it by some other means.

  • Open the Start menu and type Region.
  • Open the Region & language settings.
  • On the right, click the blue Additional date, time, & regional settings text.

Alternatively,

  • Open the Start menu and type Control panel.
  • Open Control Panel.
  • Double-click Region.

Once you're looking at the legacy Region settings:

  • Click the Administrative tab.
  • You should see two options, Welcome screen and new user accounts, and Language for non-Unicode programs. Click the Change system locale button in that second section.
  • Select Japanese (Japan) from the dropdown.

By default, Windows systems sold in the US have this set to English (United States), or internal Windows locale decimal number 1033. (Various online lists give the locale codes.) With that setting, the system ANSI code page is Windows-1252, so Excel's CSV export is limited to a Western character set (effectively ASCII as far as Japanese is concerned), which naturally doesn't work for multi-byte languages like Japanese.

If you change this to Japanese (Japan) or locale number 1041, Excel will export using Shift-JIS, and you'll be able to open your CSV exports in a text editor and see non-mojibaked text.
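To see which code page the "Language for non-Unicode programs" setting currently selects, one option is to ask Windows directly. The sketch below calls the Win32 `GetACP` function through Python's `ctypes`; the helper name `active_ansi_code_page` is made up for this example, and the assumption that Excel's CSV export follows this code page comes from this answer rather than from the code.

```python
import ctypes
import sys


def active_ansi_code_page():
    """Return the system ANSI code page on Windows, or None elsewhere.

    Hypothetical helper for illustration; GetACP itself is a real Win32 API.
    """
    if sys.platform != "win32":
        return None
    # GetACP returns the identifier of the active ANSI code page.
    return ctypes.windll.kernel32.GetACP()


code_page = active_ansi_code_page()
if code_page is None:
    print("Not running on Windows; no ANSI code page to report.")
else:
    print(f"Active ANSI code page: {code_page}")
```

With the system locale set to Japanese (Japan) this should report 932, and with English (United States) it should report 1252.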

FWIW, my Win10 locale is set to Japanese (Japan), and when I save an Excel file with Japanese content as CSV (MS-DOS) and open it in Notepad++, I see the encoding on the status bar in the lower right as Shift-JIS, and the Japanese is legible.
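The same check can be done without a text editor. This is a rough sketch, assuming the export is expected to be Shift-JIS (Python's `cp932` codec): it decodes the file strictly and counts literal question marks. The function name `check_csv_encoding` and the sample file are made up for the demo.

```python
import os
import tempfile


def check_csv_encoding(path, encoding="cp932"):
    """Return (decodes_ok, question_mark_count) for the file at path."""
    with open(path, "rb") as f:
        data = f.read()
    try:
        text = data.decode(encoding)  # strict: raises on invalid byte sequences
    except UnicodeDecodeError:
        return False, data.count(b"?")
    return True, text.count("?")


# Self-contained demo: write a small Shift-JIS CSV, then inspect it.
with tempfile.TemporaryDirectory() as tmp:
    sample = os.path.join(tmp, "export.csv")  # hypothetical file name
    with open(sample, "w", encoding="cp932", newline="") as f:
        f.write("名前,値\r\n山田,1\r\n")
    result = check_csv_encoding(sample)
    print(result)
```

A file made up only of ASCII and '?' also decodes cleanly as cp932, so a successful decode alone proves little; a high question-mark count where Japanese text was expected is the real sign of a lossy export.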
