85

Setting the default output encoding in Python 2 is a well-known idiom:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout)

This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.

However, this technique does not work in Python 3: sys.stdout.write() expects a str, but the result of encoding is bytes, so an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
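The failure can be sketched like this (using io.StringIO as a stand-in for the text-mode sys.stdout, so the error is easy to catch):

```python
import codecs
import io

# A text-mode stream standing in for sys.stdout in Python 3.
stream = io.StringIO()
writer = codecs.getwriter("utf-8")(stream)

try:
    # The writer encodes the str to bytes, then passes those bytes
    # to the underlying text stream's write(), which rejects them.
    writer.write("café")
except TypeError as exc:
    print("fails as described:", exc)
```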

What is the correct way to do this in Python 3?

  • 2to3 is a useful tool for questions like these. Commented Dec 7, 2010 at 8:25
  • @dan_waterworth: I didn't think of trying that before, but I just tried 2to3 now and it didn't suggest any changes for the given code. Commented Dec 7, 2010 at 8:42
  • If the new code doesn't work then I'd suggest you add this as a bug. Commented Dec 7, 2010 at 10:10
  • Wow, this causes a lot of fun in an interactive shell - try sys.stdout = codecs.getwriter("hex")(sys.stdout) in ipython to see what I mean... Commented Dec 9, 2013 at 13:23
  • PowerShell redirection seems to re-encode everything to UTF-16, so if you're using redirection, you might need to use regular cmd instead. I verified type foo.txt > foo2.txt changes a UTF-8 foo.txt to a UTF-16 foo2.txt, so what Python writes to stdout isn't the last word. None of the solutions below worked for me because of this. Commented Nov 4, 2019 at 17:32

7 Answers

66

Since Python 3.7, you can change the encoding of the standard streams with reconfigure():

sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
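For example (reconfigure() accepts the same errors values as open(), such as 'replace' or 'backslashreplace'):

```python
import sys

# Switch stdout to UTF-8 and replace unencodable characters
# instead of raising UnicodeEncodeError.
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
print(sys.stdout.encoding)  # now reports 'utf-8'
```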

  • What about if you are trying to maintain compatibility with Python 3.6? – Dan Commented Mar 19, 2020 at 13:21
  • @Dan Then you can't use this – sth Commented Mar 19, 2020 at 23:00
  • @Dan Well, there are other answers on this very page with alternative approaches from before Python 3.7. Isn't that what you are looking for? – sth Commented Mar 23, 2020 at 13:19
  • I'm running Anaconda Python 3.8, and the statement sys.stdout.reconfigure(encoding='utf-8') generates an exception: AttributeError: 'OutStream' object has no attribute 'reconfigure'. What am I missing? Commented Sep 3, 2021 at 18:47
  • sys.stdout.buffer is the untranslated stream. sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer) will work. Commented Apr 12, 2023 at 17:56
53

Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach() streams can be made binary by default. This function sets stdin and stdout to binary:

def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

Therefore, the corresponding idiom for Python 3.1 and later is:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
  • I'd use PYTHONIOENCODING; otherwise io.TextIOWrapper might be a better alternative than codecs to handle newlines properly. – jfs Commented Dec 21, 2013 at 3:46
  • This totally changes the behavior of sys.stdout. The StreamWriter returned by codecs.getwriter is not line-buffered anymore, e.g. – Sebastian Commented Jun 8, 2017 at 15:22
45

I found this thread while searching for solutions to the same error.

An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use case, this is less trouble than swapping out sys.stdout after Python is initialized:

PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py

This has the advantage of not requiring any edits to the Python code.
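To see the effect without editing the target script, a child interpreter can be launched with the variable set (a sketch using subprocess; the child just reports the encoding its sys.stdout picked up at startup):

```python
import os
import subprocess
import sys

# Run a child Python with PYTHONIOENCODING set; the child's
# standard streams adopt the requested encoding at startup.
env = dict(os.environ, PYTHONIOENCODING="utf-8:surrogateescape")
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env, capture_output=True, text=True,
).stdout.strip()
print(out)  # utf-8
```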

  • Thumbs-upping mainly because PYTHONIOENCODING=utf-8 solved my problem, after many hours of searching. Commented Apr 9, 2017 at 6:24
38

Other answers seem to recommend using codecs, but open works for me:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())

This works even when I run it with PYTHONIOENCODING="ascii".

  • This worked for me for dealing with an error caused by importing a module that I could not change. On a pretty vanilla Linux system that defaulted to LC_ALL = C, my program generated 'ascii' codec can't encode character .... ordinal not in range(128) when code from the imported module tried to print something. I could not rely on users of my program changing LC_ALL to 'en_US.UTF-8'. This hack solved it. I know it's an ugly approach, but I could not find another solution. – mhucka Commented Jun 10, 2018 at 21:33
18

Setting the default output encoding in Python 2 is a well-known idiom

Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.

It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.

CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
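A minimal sketch of that bytes-only approach (the header and body values here are just placeholders):

```python
import sys

# For CGI-style output, emit bytes directly on the binary buffer;
# encode any textual body yourself, at one well-defined place.
body = "<p>héllo</p>".encode("utf-8")
sys.stdout.buffer.write(b"Content-Type: text/html; charset=utf-8\r\n")
sys.stdout.buffer.write(b"\r\n")
sys.stdout.buffer.write(body)
sys.stdout.buffer.flush()
```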

(To add to the confusion, wsgiref's CGIHandler was broken in py3k until very recently, making it impossible to deploy WSGI over CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)

  • This comment needs updating concerning the Python 3.3 and upcoming 3.4 releases. Thank you – soshial Commented Nov 1, 2013 at 20:04
13

Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits:

Exception ignored in: <_io.TextIOWrapper mode='w' encoding='UTF-8'>
ValueError: underlying buffer has been detached

Instead, this worked fine for me:

default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(And, of course, writing to default_out instead of stdout.)
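A complete sketch of that approach (line_buffering=True is optional, to mimic the usual stdout behaviour):

```python
import io
import sys

# Wrap the binary buffer instead of detaching it; the original
# sys.stdout stays intact and can still be closed normally at exit.
default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8',
                               line_buffering=True)
default_out.write("日本語\n")
default_out.flush()
```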

8

sys.stdout is in text mode in Python 3, so you can write Unicode to it directly; the Python 2 idiom is no longer needed.

Where this would fail in Python 2:

>>> import sys
>>> sys.stdout.write(u"ûnicöde")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfb' in position 0: ordinal not in range(128)

However, it works just dandy in Python 3:

>>> import sys
>>> sys.stdout.write("Ûnicöde")
Ûnicöde7

Now if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely in the build of the Python.

  • My context was running the Python script as a CGI under Apache, where the default output encoding wasn't what I needed (I needed UTF-8). I think it's better for the script to ensure that its output is in the correct encoding, rather than relying on external settings (such as environment variables like PYTHONIOENCODING). Commented Dec 7, 2010 at 10:03
  • Yet another proof that using stdout for process communication is a big mistake. I realize you may have no choice but to use CGI in this case though, so that's not your fault. :-) Commented Dec 7, 2010 at 11:45
  • While it is true that sys.stdout is a binary file in Python 2 and a text file in Python 3, I think your Python 2 example fails because the unicode string u"ûnicöde" that gets implicitly encoded in the sys.stdout.write method has characters outside the ASCII range. If you change your LC_CTYPE, LANG or PYTHONIOENCODING environment variables to an encoding that has all the characters in the unicode string you should not get any error. (I have tried on Python 2.7.) – Géry Ogam Commented Feb 22, 2018 at 8:43
