How do I find the encoding of the current buffer in vim?

Question

Say I am editing some file with vim (or gvim). I have no idea about the file's encoding, and I want to know whether it is UTF-8, ISO-8859-1, or something else. Can I somehow tell vim to show me what encoding is used?

Community · Accepted Answer · 2011-11-08 14:53:22Z

127

The fileencoding setting shows the current buffer's encoding:

:set fileencoding
fileencoding=utf-8

There really isn't a common way to determine the encoding of a plain-text file, because that information isn't saved in the file itself. The exception is a file that starts with a byte order mark (BOM), which identifies it as one of the Unicode encodings such as UTF-8, although the BOM is optional. This is why XML and HTML files carry their own encoding declarations or charset meta tags.
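To check these things for the buffer you have open, the commands below can be typed at Vim's command line. This is only a small sketch; the comments summarize what each option holds according to Vim's built-in help, and the output depends on your own setup:

```vim
" Encoding Vim associates with the file behind the current buffer
" (an empty value means the value of 'encoding' is used)
:set fileencoding?
" Encoding Vim uses internally for buffers, registers and so on
:set encoding?
" Shows 'bomb' if the file is marked as having a BOM, 'nobomb' otherwise
:set bomb?
```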

You can set the encoding Vim uses internally with the 'encoding' option, and the encoding of the current buffer's file with 'fileencoding'. See :help encoding and :help fileencoding in Vim for how the editor handles these settings. You can also list several encodings in the 'fileencodings' option in your vimrc to have Vim try each of them in turn when it opens a file.
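As a rough illustration of a detection list in a vimrc, something like the following could work. The particular encodings and their order are an assumption for someone who mostly sees UTF-8 and Western European files, not a recommendation taken from this answer:

```vim
" ~/.vimrc (or _vimrc on Windows): example values only
" Encodings are tried in order; the first one that converts
" without an error is used for the buffer.
" Keep an 8-bit encoding such as latin1 last, because Vim cannot
" detect an error with it and so it is always accepted.
set fileencodings=ucs-bom,utf-8,latin1
```

After opening a file, :set fileencoding? reports which entry Vim settled on.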

edited Nov 8, 2011 at 14:53

CommunityBot

1

answered Aug 24, 2009 at 13:52

jtimberman


7

Unfortunately, this is not correct. Vim cannot find the encoding of the file you're reading, because it is not written in the file. It can only guess based on the bytes in the file. For example, a file with the text "abcdef" can be in several encodings, since practically all of them support those characters, but a file with "šđčćž" will likely be in CP1250. So you're not reading the encoding from somewhere; you're guessing what the encoding could be and displaying the text based on that guess.
– Rook
Commented Aug 24, 2009 at 14:29
6

What you are doing here is explicitly setting the encoding, based on your observations of the file's contents. If you want vim to try several encodings when opening a file, list them in the 'fileencodings' option in your _vimrc.
– Rook
Commented Aug 24, 2009 at 14:32
@ldigas, thanks for the feedback, I've updated the answer to be a bit more clear on that (I hope!)
– jtimberman
Commented Aug 24, 2009 at 15:18
I only wish that the answer were this easy. It's not; see my answer below for the 'right' way and an explanation.
– dotancohen
Commented Dec 26, 2013 at 7:00
5

Probably worth mentioning that BOMs are 1.) Not unique to UTF-8 -- though UTF-8's is distinct from other BOMs, 2.) Not required and often not found in UTF-8.
– ruffin
Commented Oct 16, 2014 at 15:09


dotancohen · Accepted Answer · 2013-12-26 06:59:51Z

20

Note that a file's encoding is not explicitly stated anywhere in the file. Thus, VIM and other applications must guess at the encoding. A common way of doing this is with the chardet application, which can be run from within VIM like so:

:!chardet %

The answer provided by jtimberman shows you the encoding of the current buffer, which may not be the same as the encoding of the file on disk. Thus, you will notice that chardet sometimes shows a different encoding than VIM, especially if you have VIM configured to always use a specific encoding (e.g. UTF-8).

The nice thing about chardet is that it gives a confidence score for its guess, whereas VIM can be (and often is) wrong when the file contains few characters above \x7F (ASCII 127). For instance, adding a single א to a long file of PHP code makes chardet think that the file is ISO-8859-2 with a confidence of 0.72, whereas adding the slightly longer phrase שלום, עולם! gives UTF-8 with a confidence score of 0.99. In both cases, :set fileencoding? showed UTF-8, not because the file on disk was UTF-8 but because VIM is configured to use UTF-8 internally.
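If you run this check often, it could be wrapped in a user command, roughly as sketched below. The command name Chardet is made up for this example, and the sketch assumes a chardet executable is on your PATH, which is not the case on every system:

```vim
" Hypothetical helper; the name Chardet is invented for illustration.
" expand('%:p') is the full path of the current file, and
" shellescape(..., 1) quotes it for use in a :! command, so file
" names containing spaces or special characters are passed intact.
command! Chardet execute '!chardet ' . shellescape(expand('%:p'), 1)
```

With that line in your vimrc, :Chardet runs the detector on the file on disk, and :set fileencoding? shows what VIM chose for the buffer, so the two can be compared.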

answered Dec 26, 2013 at 6:59

dotancohen


1

I suggest that you say a word about the availability of chardet across OSes.
– Soundararajan
Commented Aug 31, 2018 at 9:28
@Soundararajan: I'm probably not the guy to mention that as I use Debian and CentOS only. You are invited to edit the answer if you have relevant information, though. Thanks!
– dotancohen
Commented Aug 31, 2018 at 12:28
I don't see the need to do that inside VIM, better to do it from outside: chardet <file>. Still, good suggestion.
– lepe
Commented Aug 3, 2019 at 7:10
@dotancohen I believe Soundararajan's point is that the Windows command line does not ship with chardet, and this answer will not work out of the box there. (If it wasn't clear to readers, :! is a shortcut in vim to run a command on the command line, here chardet, which is not [directly] related to vim. This is also why lepe says you can skip the middlehuman and run it on the commandline outside of vim.)
– ruffin
Commented Oct 14, 2021 at 21:01


Pierre-Damien · Accepted Answer · 2023-08-30 07:08:29Z

4

I found this: https://vim.fandom.com/wiki/Reloading_a_file_using_a_different_encoding

You can reload a file using a different encoding if Vim was not able to detect the correct one:

:e ++enc=<encoding>

where encoding could be cp850, ISO-8859-1, UTF-8, ...

You can use file yourfilename to find the encoding, or chardetect (provided by python-chardet or uchardet, depending on your Linux distribution), as suggested by dotancohen.
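Putting the reload command to use might look like the following sketch. latin1 and utf-8 are example values chosen for illustration, not encodings known for any particular file:

```vim
" Reload the file, telling Vim to interpret its bytes as Latin-1
" (:e refuses if the buffer has unsaved changes; :e! discards them)
:e ++enc=latin1
" Confirm what the buffer is using now
:set fileencoding?
" Optional: convert the file by changing the option and saving
:set fileencoding=utf-8
:w
```

The first two commands only change how Vim reads the file; nothing on disk changes until the :w at the end.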

edited Aug 30, 2023 at 7:08

answered Jun 20, 2019 at 9:05

Pierre-Damien


This doesn't answer the question of how to find out the current encoding. Instead, this command will force some other encoding on the buffer.
– Ruslan
Commented Aug 9, 2019 at 9:55

