34

Sometimes I edit English text that includes Unicode characters. For some reason, on my PC, Notepad++ converts those characters to question marks (???), corrupting the text and losing the data. I'm looking for a way to edit such text while preserving the Unicode characters. I'm using Consolas as my font. Even if the font doesn't have all those characters, why should I lose the data when I copy the text out of Notepad++ (via the Windows clipboard)?

4
  • Could it be you're using a plugin that doesn't support Unicode?
    – Ivo Flipse
    Commented Aug 11, 2009 at 13:34
  • If those are question marks in boxes, then it's in fact the font's glyph for missing glyphs and your data is not lost.
    – Joey
    Commented Aug 13, 2009 at 19:11
  • No, it's not in boxes; it's the plain '?' character. Confirmed. Commented Mar 5, 2010 at 23:15
  • 1
    You may need to change the font. See superuser.com/questions/16831/…
    – RamyenHead
    Commented Jul 22, 2010 at 19:37

7 Answers

21

If the file is actually encoded in Unicode, Notepad++ should detect it automatically. The Consolas font works well for me. You can try one of these two menu options:

  • Encoding -> Encode in UTF-8
  • Encoding -> Convert to UTF-8

I'm pretty sure the first one will do what you want.

2
  • I do not have the Format menu.
    – Val
    Commented Dec 12, 2013 at 21:15
  • 3
    For posterity, you need the Encoding menu, not Format. Commented Feb 12, 2016 at 20:15
22

The problem described in the question happens when an empty/new document is set to "ANSI", and Unicode characters are pasted into it.

There isn't any auto-detection for an empty/new document, at least not in the version of Notepad++ I tested (v5.4.5). "ANSI" is the default encoding for a new document in Notepad++, unless changed in menu Settings → Preferences → tab New Document/Open Save Directory.

Solution

The solution is to set the encoding to UTF-8 before pasting, via menu Format → Encode in UTF-8:

Menu command "menu Format/Encode in UTF-8" about to be executed

Example

I copied the text "Russian (русский язык, russkiy yazyk)" from the Wikipedia page "Russian language", displayed in Firefox, into a new Notepad++ document.

If the encoding is not changed from "ANSI", this is the result:

Result of pasting the Unicode string "Russian (русский язык, russkiy yazyk" into a new Notepad++ document without changing the encoding from the default "ANSI".

If the encoding is changed to UTF-8, this is the result:

Result of pasting the Unicode string "Russian (русский язык, russkiy yazyk" into a new Notepad++ document after changing the encoding from the default "ANSI" to "UTF-8".

As can be seen in the figure below (the Cyrillic part is highlighted), Notepad++ actually converts the Unicode characters into ASCII 63 (hexadecimal 3F), question marks. That is why the Unicode characters are lost (in "ANSI" mode) when copying the text out through the clipboard (it is not a font issue - information is lost).

Screenshot of a hex view of said document

Tested on: Notepad++ v5.4.5 (UNICODE).
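
The lossy step described in this answer can be reproduced outside Notepad++. The following is a minimal Python sketch, not Notepad++'s actual implementation. It assumes that "ANSI" corresponds to Windows code page 1252 (the real code page depends on the system locale) and that every unrepresentable character is replaced with '?', as observed in the hex view.

```python
# Assumption: "ANSI" means Windows-1252 here; the real code page depends on
# the system locale.
text = "Russian (русский язык, russkiy yazyk)"

# Encoding to a single-byte code page with errors="replace" substitutes "?"
# (byte 0x3F) for every character the code page cannot represent.
ansi_bytes = text.encode("cp1252", errors="replace")
print(ansi_bytes)

# Decoding the bytes again cannot bring the Cyrillic letters back.
round_trip = ansi_bytes.decode("cp1252")
print(round_trip)

# UTF-8 can represent every character, so its round trip is lossless.
utf8_bytes = text.encode("utf-8")
assert utf8_bytes.decode("utf-8") == text
```

Once the characters have been stored as byte 0x3F, nothing downstream (font, clipboard, or a later encoding change) can tell them apart from real question marks.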

5

There is good news and bad news.

Good news: Notepad++ supports Unicode (at least from what I can gather).

Bad news: Apparently Unicode support is only on Windows XP.

I actually do not have a Windows machine in front of me. From what I remember, there is an Encoding menu under the Format menu somewhere. The encoding for Unicode is actually most commonly UTF-8.

Here is a 'pretty' picture of Unicode support in Notepad++:

[Image]

3

Unicode works perfectly on Windows 7. The only issue is that you have to retype the characters that have been changed. It has happened to me: I write with Scandinavian letters, and ä turned into E4 and ö into F6. It's a pain in the butt to replace them all, but it's worth it.

If you encode a page from ANSI -> UTF-8 then there will be some character problems.

I would suggest that you first create a new page in UTF-8 and then copy/paste your information over. There won't/shouldn't be any trouble then.
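
The values E4 and F6 match the single bytes used for ä and ö in Western European single-byte code pages. As a hedged illustration, the Python sketch below assumes the "ANSI" text was Windows-1252 and contrasts reading those bytes as if they were UTF-8 with converting the decoded text to UTF-8. Both function names are hypothetical and only model the idea; they are not Notepad++ functions.

```python
# Assumption: the original "ANSI" text used Windows-1252.
ansi_bytes = "äö".encode("cp1252")
print(ansi_bytes.hex(" ").upper())  # one byte per letter

def reinterpret_as_utf8(data: bytes) -> str:
    """Hypothetical model: read the same bytes as if they were UTF-8."""
    return data.decode("utf-8", errors="replace")

def convert_to_utf8(data: bytes, source_encoding: str = "cp1252") -> bytes:
    """Hypothetical model: decode with the real encoding, then re-encode."""
    return data.decode(source_encoding).encode("utf-8")

broken = reinterpret_as_utf8(ansi_bytes)   # letters are not recovered
converted = convert_to_utf8(ansi_bytes)    # two bytes per letter in UTF-8
print(repr(broken))
print(converted.hex(" ").upper())
```

The single bytes E4 and F6 are not valid UTF-8 sequences on their own, which is one plausible explanation for the character problems when an ANSI document is simply treated as UTF-8.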

1

This worked for me:

I changed the font to Courier New in the Style Configurator on my PC (Windows 7 with the English/US character set, and Romanian for the non-Unicode set). It's working with the Courier New and Tahoma fonts plus UTF-8 encoding.

0

In the top menu, select Encoding, then choose Encode in UTF-8 or Encode in UTF-8 Without BOM. You can then edit text in a Unicode encoding.

0

This worked:

Settings → Style Configurator → Font Style → (Font name: Times New Roman, Enable global font) → Save & Close

This, in addition to:

Encoding → UTF-8

The original question asks how to edit Unicode text. If this means the Unicode characters should be rendered as well as editable, then one needs not only a proper encoding, such as UTF-8, but also a font that has glyphs for the characters in question.

Various fonts have been mentioned. The font that appears to render the greatest range of Unicode characters is Times New Roman. For example, try to render the service mark character ℠ in the other mentioned fonts.

4
  • What question are you answering?
    – Toto
    Commented Jan 27, 2021 at 16:52
  • I don't understand your question. Why isn't it obvious I'm answering the original question and that none of the other answers would render the ℠ unicode within Notepad++? Commented Jan 27, 2021 at 18:53
  • Where have you seen in the question that they want to treat Unicode characters?
    – Toto
    Commented Jan 27, 2021 at 19:04
  • I clarified the reason for mentioning ℠ Commented Jan 27, 2021 at 19:22
