I would like to reopen a question related to the following:

(Czech) character set support in gvim 7.3 on Windows 7

Basically, in that post I noticed that some Czech characters were being displayed as black squares. So I posted the question and noticed that the problem seemed to go away by changing the font. I thought that solved the problem because the characters in the file I was using displayed correctly.

However, I have noticed the following: while some Czech characters display correctly by changing the font from the Gvim menu, others do not display correctly:

For instance when I paste the character Ů (Latin capital letter u with ring above) or ů (Latin small letter u with ring above), no font displays the resulting character correctly. For instance, the Fixedsys font displays a black square and a small u, respectively, while Lucida Console displays a capital U and a small U, respectively. I have tried all fonts available from the gvim drop-down menu, and none seem to work for this particular case.

The problem does not end here. The input method for unicode characters produces the wrong characters:

CTRL-V u0160 should produce the Czech character (Š), but the backquote (`) is inserted instead. CTRL-V u016e should produce the Czech character (Ů), but the n character (n) is inserted instead. And the list goes on.

As if that were not enough, there is a list of alternative input method key combinations at the following site (which is a list of digraphs): http://code.google.com/p/vim/source/browse/runtime/doc/digraph.txt

but despite having the latest version of gvim, when I type ":digraphs", this list does not show up. Only the old list from gvim 7.3 shows up, which does not include these.

For instance CTRL-K U0 and CTRL-K u0 both produce the character zero instead of the following:

Ů U0 016E 0366 LATIN CAPITAL LETTER U WITH RING ABOVE

ů u0 016F 0367 LATIN SMALL LETTER U WITH RING ABOVE

To summarize, despite gvim 7.4 being recently released, none of the distributed fonts are compatible with the Czech language, inserting unicode via CTRL-V seems to produce the wrong characters, and digraph support is incomplete.

Thank you for your answers.

1 Answer

The problem is that the Latin-2 (ISO-8859-2) and Windows-1250 (used by Windows) encodings differ in some characters:

ž, š, ť, Ž, Š, Ť

All the differences are summarized on Wikipedia (there is also a Czech version).

If you :set encoding=cp1250, then it'll be OK.
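To make the difference concrete, here is a small Python sketch (not part of gvim, just an illustration using Python's built-in `iso8859_2` and `cp1250` codecs) that lists the byte each of the six characters maps to in the two code pages. The names `byte_in`, `table` and `same` are made up for this example.

```python
import unicodedata

# The six characters listed above as differing between the two code pages.
CHARS = "žšťŽŠŤ"


def byte_in(ch, codec):
    """Return the single byte value that `ch` maps to in an 8-bit codec."""
    return ch.encode(codec)[0]


# character -> (byte in ISO-8859-2, byte in Windows-1250)
table = {ch: (byte_in(ch, "iso8859_2"), byte_in(ch, "cp1250")) for ch in CHARS}

for ch, (latin2, win1250) in table.items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}: "
          f"ISO-8859-2 0x{latin2:02X}, Windows-1250 0x{win1250:02X}")

# For comparison: the u with ring above sits at the same position in both.
same = {ch: (byte_in(ch, "iso8859_2"), byte_in(ch, "cp1250")) for ch in "Ůů"}
```

So a file written with one of these encodings and read with the other will show the wrong glyphs for exactly these letters, while most other Czech letters survive.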


I don't want to prolong comments so I'm adding that here.

The issue is that a standard code page uses only 1 byte per character (256 values, hex 00 to FF), so there are separate ISO standards for different languages. If you have encoding set to iso-8859-2 and try to enter the Unicode character Š (hex 160), gvim wraps around to the character at hex 60. You have to use the ISO-8859-2 codes instead, where Š is hex A9. Other codes are listed here: http://cs.wikipedia.org/wiki/ISO_8859-2
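The wrap-around can be modelled as keeping only the low byte of the code point. The following Python sketch is my own illustration of that arithmetic, not gvim's actual code, and `low_byte` is a hypothetical helper name. It reproduces the two wrong characters reported in the question.

```python
def low_byte(code_point):
    """Keep only the lowest 8 bits, which is all a 1-byte encoding can hold."""
    return code_point & 0xFF


# The two code points from the question: U+0160 and U+016E.
for cp in (0x0160, 0x016E):
    wrapped = low_byte(cp)
    print(f"U+{cp:04X} wraps to 0x{wrapped:02X}, i.e. {chr(wrapped)!r}")
```

0x160 loses its leading 1 and becomes 0x60 (the backquote), and 0x16E becomes 0x6E (the letter n), which matches what CTRL-V inserted.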

UTF-8, on the other hand, uses a variable number of bytes per character (1 to 4) and covers essentially all letters and symbols at the same time. So if you use set encoding=utf-8 and then enter U0160 or U5927, you'll get Š and 大, respectively.
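As a quick check of how UTF-8 stores these characters, here is a Python sketch (again only an illustration; `utf8_bytes` is a name I made up). UTF-8 is a variable-length encoding, so the number of bytes depends on the code point.

```python
def utf8_bytes(ch):
    """Return the UTF-8 encoding of a single character as bytes."""
    return ch.encode("utf-8")


# An ASCII letter, U+0160 (S with caron) and U+5927 (a CJK ideograph).
for ch in ("A", "\u0160", "\u5927"):
    encoded = utf8_bytes(ch)
    print(f"U+{ord(ch):04X}: {len(encoded)} byte(s), hex {encoded.hex()}")
```

The ASCII letter takes one byte, Š takes two, and the Chinese character takes three, so all of them can coexist in one file without any code-page switching.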

Fixedsys contains ů and Ů, unless there is a difference in font versions between Windows language editions (I use the Czech version), but I doubt that. You can use the Windows utility Charmap.exe: there you can select the desired font and check which characters it supports, along with their Unicode code points.

I briefly tried some of the default fonts in GVim, and some of them seem to support Chinese (e.g. MS Mincho), but I don't know which characters are important.

GVim seems to support only monospace fonts, so if you go looking for another font, be aware of that. :)

  • OK. That solved all of the above problems (on Windows 7) from the following point of view: now I can :set encoding=cp1250 and I can enter all of the Czech characters with either the CTRL-K (digraph) or the CTRL-V (Unicode code) method. Commented Nov 2, 2013 at 18:49
  • Of course, when I create the code page 1250 file under Windows 7, transfer it to a Linux environment and open it there using LibreOffice Writer (e.g. using the /usr/bin/lowriter command), a dialog box shows up where I can choose the "Character Set", and I need to specify "Eastern Europe (Windows-1250/WinLatin 2)"; otherwise the wrong character set is loaded and the special characters are displayed incorrectly. Commented Nov 2, 2013 at 19:01
  • From my point of view, however, what is even better is to set the encoding to utf-8 with the command :set encoding=utf-8 under Windows 7, edit the file, save it, and then open it under Linux, where the default character set is utf-8 rather than ISO 8859-1 (latin1), without having to specify any flags. Regards. Commented Nov 2, 2013 at 19:07
  • So, since the default encoding is latin1, and utf-8 is the most portable of all file formats, rather than using a Windows-specific file format such as a specific code page, it is perhaps better to :set encoding=utf-8 inside the _vimrc (or .vimrc). The only problem that remains is how to ensure the default font setting supporting the desired characters is saved as a user preference. Commented Nov 2, 2013 at 19:32
  • There shouldn't be a problem with the default font with utf-8; Fixedsys supports Czech script.
    – week
    Commented Nov 2, 2013 at 21:19