85

Setting the default output encoding in Python 2 is a well-known idiom:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout)

This wraps the sys.stdout object in a codec writer that encodes output in UTF-8.

However, this technique does not work in Python 3: sys.stdout.write() expects a str, but the result of encoding is bytes, so an error occurs when codecs tries to write the encoded bytes to the original sys.stdout.
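The failure can be sketched like this (using io.StringIO as a stand-in for the text-mode sys.stdout, so the error is easy to catch):

```python
import codecs
import io

# A text-mode stream standing in for sys.stdout in Python 3.
stream = io.StringIO()
writer = codecs.getwriter("utf-8")(stream)

try:
    # The writer encodes the str to bytes, then passes those bytes
    # to the underlying text stream's write(), which rejects them.
    writer.write("café")
except TypeError as exc:
    print("fails as described:", exc)
```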

What is the correct way to do this in Python 3?

  • 2to3 is a useful tool for questions like these. Commented Dec 7, 2010 at 8:25
  • @dan_waterworth: I didn't think of trying that before, but I just tried 2to3 now and it didn't suggest any changes for the given code. Commented Dec 7, 2010 at 8:42
  • If the new code doesn't work then I'd suggest you add this as a bug. Commented Dec 7, 2010 at 10:10
  • Wow, this causes a lot of fun in an interactive shell - try sys.stdout = codecs.getwriter("hex")(sys.stdout) in ipython to see what I mean... Commented Dec 9, 2013 at 13:23
  • PowerShell redirection seems to re-encode everything to UTF-16, so if you're using redirection, you might need to use regular cmd instead. I verified type foo.txt > foo2.txt changes a UTF-8 foo.txt to a UTF-16 foo2.txt, so what Python writes to stdout isn't the last word. None of the solutions below worked for me because of this. Commented Nov 4, 2019 at 17:32

7 Answers

66

Since Python 3.7, you can change the encoding of the standard streams with reconfigure():

sys.stdout.reconfigure(encoding='utf-8')

You can also modify how encoding errors are handled by adding an errors parameter.
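For example (reconfigure() accepts the same errors values as open(), such as 'replace' or 'backslashreplace'):

```python
import sys

# Switch stdout to UTF-8 and replace unencodable characters
# instead of raising UnicodeEncodeError.
sys.stdout.reconfigure(encoding='utf-8', errors='replace')
print(sys.stdout.encoding)  # now reports 'utf-8'
```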

  • What about if you are trying to maintain compatibility with Python 3.6? – Dan Commented Mar 19, 2020 at 13:21
  • @Dan Then you can't use this – sth Commented Mar 19, 2020 at 23:00
  • @Dan Well, there are other answers on this very page with alternative approaches from before Python 3.7. Isn't that what you are looking for? – sth Commented Mar 23, 2020 at 13:19
  • I'm running Anaconda Python 3.8, and the statement sys.stdout.reconfigure(encoding='utf-8') generates an exception: AttributeError: 'OutStream' object has no attribute 'reconfigure'. What am I missing? Commented Sep 3, 2021 at 18:47
  • sys.stdout.buffer is the untranslated stream. sys.stdout = codecs.getwriter('utf8')(sys.stdout.buffer) will work. Commented Apr 12, 2023 at 17:56
53

Python 3.1 added io.TextIOBase.detach(), with a note in the documentation for sys.stdout:

The standard streams are in text mode by default. To write or read binary data to these, use the underlying binary buffer. For example, to write bytes to stdout, use sys.stdout.buffer.write(b'abc'). Using io.TextIOBase.detach() streams can be made binary by default. This function sets stdin and stdout to binary:

def make_streams_binary():
    sys.stdin = sys.stdin.detach()
    sys.stdout = sys.stdout.detach()

Therefore, the corresponding idiom for Python 3.1 and later is:

sys.stdout = codecs.getwriter("utf-8")(sys.stdout.detach())
  • I'd use PYTHONIOENCODING; otherwise io.TextIOWrapper might be a better alternative than codecs to handle newlines properly. – jfs Commented Dec 21, 2013 at 3:46
  • This totally changes the behavior of sys.stdout. The StreamWriter returned by codecs.getwriter is not line-buffered anymore, e.g. – Sebastian Commented Jun 8, 2017 at 15:22
45

I found this thread while searching for solutions to the same error.

An alternative to the solutions already suggested is to set the PYTHONIOENCODING environment variable before Python starts. For my use case, this is less trouble than swapping out sys.stdout after Python is initialized:

PYTHONIOENCODING=utf-8:surrogateescape python3 somescript.py

This has the advantage of not requiring any edits to the Python code.
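To see the effect without editing the target script, a child interpreter can be launched with the variable set (a sketch using subprocess; the child just reports the encoding its sys.stdout picked up at startup):

```python
import os
import subprocess
import sys

# Run a child Python with PYTHONIOENCODING set; the child's
# standard streams adopt the requested encoding at startup.
env = dict(os.environ, PYTHONIOENCODING="utf-8:surrogateescape")
out = subprocess.run(
    [sys.executable, "-c", "import sys; print(sys.stdout.encoding)"],
    env=env, capture_output=True, text=True,
).stdout.strip()
print(out)  # utf-8
```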

  • Thumbs-upping mainly because PYTHONIOENCODING=utf-8 solved my problem, after many hours of searching. Commented Apr 9, 2017 at 6:24
38

Other answers seem to recommend using codecs, but open works for me:

import sys
sys.stdout = open(sys.stdout.fileno(), mode='w', encoding='utf8', buffering=1)
print("日本語")
# Also works with other methods of writing to stdout:
sys.stdout.write("日本語\n")
sys.stdout.buffer.write("日本語\n".encode())

This works even when I run it with PYTHONIOENCODING="ascii".

  • This worked for me for dealing with an error caused by importing a module that I could not change. On a pretty vanilla Linux system that defaulted to LC_ALL = C, my program generated 'ascii' codec can't encode character .... ordinal not in range(128) when code from the imported module tried to print something. I could not rely on users of my program changing LC_ALL to 'en_US.UTF-8'. This hack solved it. I know it's an ugly approach, but I could not find another solution. – mhucka Commented Jun 10, 2018 at 21:33
18

Setting the default output encoding in Python 2 is a well-known idiom

Eek! Is that a well-known idiom in Python 2? It looks like a dangerous mistake to me.

It'll certainly mess up any script that tries to write binary to stdout (which you'll need if you're a CGI script returning an image, for example). Bytes and chars are quite different animals; it's not a good idea to monkey-patch an interface that is specified to accept bytes with one that only takes chars.

CGI and HTTP in general explicitly work with bytes. You should only be sending bytes to sys.stdout. In Python 3 that means using sys.stdout.buffer.write to send bytes directly. Encoding page content to match its charset parameter should be handled at a higher level in your application (in cases where you are returning textual content, rather than binary). This also means print is no good for CGI any more.
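A minimal sketch of that bytes-only approach (the header and body values here are just placeholders):

```python
import sys

# For CGI-style output, emit bytes directly on the binary buffer;
# encode any textual body yourself, at one well-defined place.
body = "<p>héllo</p>".encode("utf-8")
sys.stdout.buffer.write(b"Content-Type: text/html; charset=utf-8\r\n")
sys.stdout.buffer.write(b"\r\n")
sys.stdout.buffer.write(body)
sys.stdout.buffer.flush()
```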

(To add to the confusion, wsgiref's CGIHandler was broken in py3k until very recently, making it impossible to deploy WSGI over CGI that way. With PEP 3333 and Python 3.2 this is finally workable.)

  • This comment needs updating concerning the Python 3.3 and upcoming 3.4 releases. Thank you – soshial Commented Nov 1, 2013 at 20:04
13

Using detach() causes the interpreter to print a warning when it tries to close stdout just before it exits:

Exception ignored in: <_io.TextIOWrapper mode='w' encoding='UTF-8'>
ValueError: underlying buffer has been detached

Instead, this worked fine for me:

default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8')

(And, of course, writing to default_out instead of stdout.)
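A complete sketch of that approach (line_buffering=True is optional, to mimic the usual stdout behaviour):

```python
import io
import sys

# Wrap the binary buffer instead of detaching it; the original
# sys.stdout stays intact and can still be closed normally at exit.
default_out = io.TextIOWrapper(sys.stdout.buffer, encoding='utf-8',
                               line_buffering=True)
default_out.write("日本語\n")
default_out.flush()
```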

8

sys.stdout is in text mode in Python 3, so you can write Unicode to it directly; the Python 2 idiom is no longer needed.

Where this would fail in Python 2:

>>> import sys
>>> sys.stdout.write(u"ûnicöde")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfb' in position 0: ordinal not in range(128)

However, it works just dandy in Python 3:

>>> import sys
>>> sys.stdout.write("Ûnicöde")
Ûnicöde7

Now if your Python doesn't know what your stdout's encoding actually is, that's a different problem, most likely in the build of the Python.

  • My context was running the Python script as a CGI under Apache, where the default output encoding wasn't what I needed (I needed UTF-8). I think it's better for the script to ensure that its output is in the correct encoding, rather than relying on external settings (such as environment variables like PYTHONIOENCODING). Commented Dec 7, 2010 at 10:03
  • Yet another proof that using stdout for process communication is a big mistake. I realize you may have no choice but to use CGI in this case though, so that's not your fault. :-) Commented Dec 7, 2010 at 11:45
  • While it is true that sys.stdout is a binary file in Python 2 and a text file in Python 3, I think your Python 2 example fails because the unicode string u"ûnicöde" that gets implicitly encoded in the sys.stdout.write method has characters outside the ASCII range. If you change your LC_CTYPE, LANG or PYTHONIOENCODING environment variables to an encoding that has all the characters in the unicode string you should not get any error. (I have tried on Python 2.7.) – Géry Ogam Commented Feb 22, 2018 at 8:43
