Our users are experiencing a very discouraging issue with how MS Word (on Windows) handles non-Unicode characters. The issue is confirmed in both Word 2007 and the Word 2010 Beta on Windows XP SP3; I suspect Word 2003 behaves the same way.


  1. A user creates a document using a non-Unicode font, entering characters to represent scientific notation. For example, he enters a Mu (µ). Note: I pasted in a Unicode-compliant Mu here for reference.
  2. The user opens his document and attempts to copy and paste this non-Unicode character representing a Mu into a web browser for entry into our system. It pastes as an unrecognized character. This is expected.
  3. The user opens his document, selects the non-Unicode character, changes its font to "Arial Unicode MS," and saves the document. He closes and re-opens the document for good measure. Once it is re-opened, he copies what should now be a Unicode Mu and pastes it into the web browser. It still appears as an unrecognized character.
  4. The user creates a new document, sets the font to "Arial Unicode MS" and types a Mu. He copies this Mu into the web browser and it pastes as Unicode, as expected.


Word is not actually converting non-Unicode characters into Unicode characters when a Unicode font is selected, as it should. Instead, it makes a best guess for display purposes but performs no actual conversion.
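One way to check this is to look at the code points of the text that actually lands in the browser. The following is only a minimal sketch, assuming Python 3 is available and the pasted text has been captured into a string; `describe` is a hypothetical helper name, and the `U+F06D` sample is an assumed example of a private-use character, not something taken from our documents.

```python
import unicodedata

def describe(text):
    """Return a (code point, name) pair for each character in text."""
    rows = []
    for ch in text:
        cp = ord(ch)
        if 0xE000 <= cp <= 0xF8FF:
            # Private Use Area: no standard meaning, so no Unicode name
            name = "<private use>"
        else:
            name = unicodedata.name(ch, "<unnamed>")
        rows.append(("U+%04X" % cp, name))
    return rows

# A micro sign and a Greek mu, followed by an assumed private-use character
for row in describe("\u00b5\u03bc\uf06d"):
    print(row)
```

If the character copied from the re-fonted document reports as private use (or as some unrelated letter) while the freshly typed Mu reports as a micro sign or Greek mu, that would support the idea that only the display changed.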

How do I overcome this problem?

  • Can I change some setting in Word to force a conversion? This would be preferable.
  • Is there a "cleaner" app or Word macro that will do this?
  • Other solutions?

Additional Notes:

  • Re-typing the affected documents using Unicode is not an option.
  • This is not an issue on Mac OS X using the most recent version of Word. A sample case such as (3) results in a Unicode Mu being pasted into the browser.

Please help!

  • StackOverflow is a forum for programming-related questions. Try SuperUser.com instead.
    – Borealid
    Commented Jul 13, 2010 at 18:20
  • It should be noted that you don't need to create a new question over there. With enough votes this question will be moved sooner or later. Just have patience.
    – user26750
    Commented Jul 13, 2010 at 18:24
  • What are non-unicode characters?
    – Philipp
    Commented Jul 13, 2010 at 18:26
  • @Philipp: That's generally the layman's term for characters outside the ISO-8859-X range (or whatever default encoding the underlying platform is using, e.g. CP-1252 on Windows or Mac Roman on Mac OS). A very contradictory term indeed, since Unicode actually covers every character the human linguistic world is aware of ;)
    – user26750
    Commented Jul 13, 2010 at 18:27
  • Cut and Paste issues can be related to programming, even if this specific question isn't. Substitute "my program" for "MS Word" and the topic doesn't really change.
    Commented Jul 13, 2010 at 18:33

2 Answers


Try using Paste Special; there should be an option for Unicode text.

Note that if the source document was created with a Symbol font, this won't help. Windows doesn't really know that the character corresponds to a specific Unicode character; the symbol fonts were created before Unicode to meet a need, and the two aren't interchangeable.
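If the font does turn out to be Symbol, the conversion has to be done with an explicit lookup table. Here is a rough sketch in Python of what such a "cleaner" might look like, not a finished tool. It assumes the text has already been extracted from the document, and that the symbol-font characters arrive as private-use code points in the range U+F000 to U+F0FF (an assumption about how Word exposes them; check your own files). `SYMBOL_TO_UNICODE` and `symbol_to_unicode` are hypothetical names, and the table is deliberately partial.

```python
# Partial, illustrative table: Symbol-font character code -> Unicode character.
SYMBOL_TO_UNICODE = {
    0x57: "\u03a9",  # W -> capital omega
    0x61: "\u03b1",  # a -> alpha
    0x62: "\u03b2",  # b -> beta
    0x64: "\u03b4",  # d -> delta
    0x67: "\u03b3",  # g -> gamma
    0x6C: "\u03bb",  # l -> lambda
    0x6D: "\u03bc",  # m -> mu
    0x70: "\u03c0",  # p -> pi
}

def symbol_to_unicode(text):
    """Replace private-use symbol-font characters with real Unicode ones."""
    out = []
    for ch in text:
        cp = ord(ch)
        # Assumption: symbol-font characters sit in U+F000..U+F0FF
        if 0xF000 <= cp <= 0xF0FF:
            # Unmapped codes are left untouched rather than guessed at
            out.append(SYMBOL_TO_UNICODE.get(cp - 0xF000, ch))
        else:
            out.append(ch)
    return "".join(out)

# The private-use character becomes a Greek mu; the plain "m" is untouched
print(symbol_to_unicode("5 \uf06dm"))
```

Only characters in the private-use range are touched, so ordinary text typed in a normal font passes through unchanged. A different non-Unicode font would need its own table.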


It is a lengthy process, but I normally convert such files into images and then run those images through OCR software. That helps, but I am still looking for a better option myself.
