
I'm sure this is an encoding issue, but I can't figure it out.

I exported a spreadsheet from Excel as a UTF-8 CSV, which produced a file in the UTF-8-BOM encoding. When I open this file in Notepad++, most characters render correctly, including non-ANSI characters like ø. However, a hyphen (‐) is not displayed correctly.

I believe the character is U+2010 ‐ HYPHEN.

If I open the file in Notepad, the hyphen displays correctly. It also displays correctly if I use Vim to read the file or cat to print it out to the terminal.

Finally, a byte dump of the file (with od) shows the hex bytes e2 80 90, which is the UTF-8 encoding of U+2010 HYPHEN.
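The same byte-level check can be scripted. The sketch below is an illustration, not part of the original post: `describe_non_ascii` is a hypothetical helper, and the sample bytes imitate the exported CSV rather than reproducing it. It reports whether the data starts with a UTF-8 BOM, then lists every distinct non-ASCII character with its code point, Unicode name and UTF-8 bytes. If U+2010 shows up as `e2 80 90` here, the file itself is fine and the problem lies in how it is rendered.

```python
from __future__ import annotations

import codecs
import unicodedata


def describe_non_ascii(data: bytes) -> list[tuple[str, str, str]]:
    """Return (code point, Unicode name, UTF-8 bytes) for each distinct non-ASCII character."""
    # "utf-8-sig" decodes UTF-8 and strips a leading BOM (EF BB BF) if present.
    text = data.decode("utf-8-sig")
    seen: list[str] = []
    for ch in text:
        # Keep first-appearance order and skip plain ASCII (including U+002D HYPHEN-MINUS).
        if ord(ch) > 0x7F and ch not in seen:
            seen.append(ch)
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"), ch.encode("utf-8").hex(" "))
        for ch in seen
    ]


if __name__ == "__main__":
    # Hypothetical sample mimicking the export: BOM, "ø" and U+2010 HYPHEN.
    sample = codecs.BOM_UTF8 + "Bjørn,well\u2010known\n".encode("utf-8")
    print("Has UTF-8 BOM:", sample.startswith(codecs.BOM_UTF8))
    for row in describe_non_ascii(sample):
        print(*row)
```

To inspect a real file, pass it the raw bytes, for example `describe_non_ascii(open("export.csv", "rb").read())` (the filename is a placeholder). `bytes.hex` with a separator argument requires Python 3.8 or later.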

So why does Notepad++ display this character incorrectly?

  • After opening the file, have you tried changing the encoding? Commented Mar 23, 2021 at 10:50
  • Yep. Nothing works. The file is definitely saved as UTF-8-BOM, as shown by it displaying correctly in regular Notepad (and in Python, Vim and cat). Commented Mar 23, 2021 at 11:07
  • Just to clarify, this is not the "common hyphen" on our keyboards, which is U+002D HYPHEN-MINUS.
    – MrWhite
    Commented Mar 24, 2021 at 10:07


If the other characters are decoded properly and the byte-level data looks correct, the problem may simply be the font. U+2010 lies outside the basic Latin and Latin-1 ranges that almost every font covers, so some fonts do not include a glyph for it at all.

This answer to another Super User question states that a number of common Windows 7 fonts lack a glyph for U+2010 HYPHEN.
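One way to test the font hypothesis directly is to look the code point up in the font's character map (cmap). The sketch below assumes the third-party fontTools package (`pip install fonttools`), whose `TTFont(path).getBestCmap()` returns a dict mapping code points to glyph names. `load_cmap` and `missing_characters` are hypothetical helpers written for this illustration. Note that this only checks the font file itself; it does not account for any font fallback the editor might perform.

```python
from __future__ import annotations


def load_cmap(font_path: str) -> dict[int, str]:
    """Return the font's best Unicode cmap (code point -> glyph name)."""
    # Assumes fontTools is installed; imported lazily so missing_characters()
    # can be used on its own. For .ttc collections, TTFont also needs fontNumber=...
    from fontTools.ttLib import TTFont

    font = TTFont(font_path)
    try:
        return dict(font.getBestCmap() or {})
    finally:
        font.close()


def missing_characters(text: str, cmap: dict[int, str]) -> list[str]:
    """Distinct characters of text, in first-appearance order, with no entry in cmap."""
    missing: list[str] = []
    for ch in text:
        # Whitespace and line breaks are skipped: many fonts don't map control characters.
        if ch.isspace():
            continue
        if ord(ch) not in cmap and ch not in missing:
            missing.append(ch)
    return missing
```

Hypothetical usage, with a placeholder path: `missing_characters("well\u2010known", load_cmap(r"C:\Windows\Fonts\consola.ttf"))` returns an empty list if that font has a glyph for U+2010, and `["\u2010"]` if it does not.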

  • It is the font! Switching to Consolas, the font used by regular Notepad (and, indeed, Stack Exchange), solved the issue. Thanks! Commented Mar 23, 2021 at 11:34
