How can I paste Russian text from a web page correctly into certain UTF-8 text files in Notepad++? The letters turn into '?'

Question

After pasting, the text was all '?' instead of proper Russian characters. I had to convert the file containing the distorted text to UTF-8 BOM, copy the text in the browser again, and paste it into Notepad++ again to get the proper characters. I had to repeat this procedure after every later change to the file. It was even more problematic with plain UTF-8: I could get proper Russian characters pasted just as with UTF-8 BOM, but they were always distorted again after a program restart.

I fought with the issue for several months and could not understand the logic behind it. It looked as if a hardcoded code page were used instead of UTF-8 whenever I created a new empty file (with UTF-8 BOM set in the config for new documents), pasted Russian text, saved, and restarted the program. Every time I changed something later and simply saved, without doing 'Convert to UTF-8 BOM' + Save, the whole text was distorted again.
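The '?' symptom is what a lossy conversion to a code page without Cyrillic letters typically produces. The following is a minimal sketch of that effect only, not of Notepad++ internals; the choice of Windows-1252 (`cp1252`) as the target code page is an assumption made for illustration, since the post does not say which code page was involved.

```python
# Sample Russian text ("Privet"), as it might be copied from a web page.
russian = "\u041f\u0440\u0438\u0432\u0435\u0442"

# Assumption: the buffer uses a code page with no Cyrillic letters
# (cp1252 here). Characters that cannot be mapped are replaced by '?'.
as_cp1252 = russian.encode("cp1252", errors="replace")
print(as_cp1252)

# UTF-8 can represent every character, so nothing is lost.
as_utf8 = russian.encode("utf-8")
print(as_utf8.decode("utf-8") == russian)
```

Once the characters have been replaced by '?', the original letters are gone, which is consistent with having to copy and paste the text again after converting the file.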

Similar problems occurred with other languages that use a less common Latin code page (for Central European Windows-1250, for example, the character 'č' was lost after a program restart if the file had the *.nfo extension). Hebrew did not work at all; I worked around that by converting the text to HTML character entities. Notepad++ 7.9.1 x64 on Win7 SP3 x64.
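The HTML character entity workaround mentioned for Hebrew can be sketched as follows. This is only one possible way to do the conversion, using Python's standard `xmlcharrefreplace` error handler; the post does not say which tool was actually used, and the sample word is an illustrative choice.

```python
import html

# Sample Hebrew word ("shalom"), written with escapes to keep the source ASCII.
hebrew = "\u05e9\u05dc\u05d5\u05dd"

# Replace every non-ASCII character with a numeric character reference,
# so the result survives in any ASCII-compatible code page.
entities = hebrew.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
print(entities)  # &#1513;&#1500;&#1493;&#1501;

# A browser (or html.unescape) turns the references back into the letters.
restored = html.unescape(entities)
print(restored == hebrew)
```

The resulting text is pure ASCII, so it is unaffected by whatever code page the editor applies to the file, at the cost of being readable only once rendered as HTML.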

user1260850 · Accepted Answer · 2021-01-11 14:32:59Z

I recently found out what additional logic changes the behavior of the program. The point is that the program contains some extra hardcoded logic for certain well-known file name extensions! In my case, I had created the UTF-8 text file with the *.nfo extension on Windows, intending to use it as an information file accompanying a document. However, *.nfo is a well-known extension for torrent-readme-like text with plain-text graphics inside, and UTF-8 support appears to be explicitly turned off for such extension(s). As a result, the file is reopened using some fixed code page when it is loaded from disk (after a program restart) and is automatically converted to UTF-8 BOM on the fly, which failed in this case (the UTF-8 -> code page -> UTF-8 round trip is not always defined). I had not realized that *.info is the extension commonly used on Windows for my purpose. Everything worked as expected after I switched to the *.info extension.
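Why such a round trip is "not always defined" can be illustrated with a small sketch. It assumes the fixed code page is OEM CP437 (`cp437`), which the answer does not state, and `reload_via_codepage` is a hypothetical helper that only imitates the described behavior; it is not Notepad++ code.

```python
def reload_via_codepage(utf8_bom_bytes, codepage="cp437"):
    """Imitate a UTF-8 BOM -> code page -> UTF-8 BOM round trip.

    Assumption: the intermediate code page is cp437; characters it
    cannot represent are replaced by '?'.
    """
    text = utf8_bom_bytes.decode("utf-8-sig")
    narrowed = text.encode(codepage, errors="replace")
    return narrowed.decode(codepage).encode("utf-8-sig")


# One word per case: Western European, Central European, Russian.
original = "caf\u00e9 \u010daj \u041f\u0440\u0438\u0432\u0435\u0442".encode("utf-8-sig")
restored = reload_via_codepage(original)

print(restored.decode("utf-8-sig"))
print(restored == original)
```

Under this assumption, 'é' survives because CP437 contains it, while 'č' and the Cyrillic letters come back as '?'. Only characters present in the intermediate code page make it through, which would match the symptoms described in the question.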

This looks like a bug. Loading from disk should never do a UTF-8 BOM -> code page -> UTF-8 BOM round-trip conversion. If UTF-8 BOM is meant to be supported for *.nfo, the program should behave transparently and do no conversion at all when a *.nfo file is in UTF-8 BOM format or UTF-8 text is found. user1260850 · Commented Jan 11, 2021 at 14:42
