Credit: David Ascher
You want to print Unicode strings to standard output (e.g., for debugging), but they don’t fit in the default encoding.
Wrap the stdout stream with a converter, using the codecs module:
import codecs, sys
sys.stdout = codecs.lookup('iso8859-1')[-1](sys.stdout)
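To see what the wrapper does, here is a minimal sketch that swaps the real stdout for an in-memory byte buffer. The names raw, wrapped, and latin1_bytes are hypothetical and used only for illustration. The last item of the tuple returned by codecs.lookup is the codec's stream writer class, and calling it on a byte stream yields an object that accepts Unicode text and writes encoded bytes.

```python
import codecs
import io

# Hypothetical stand-in for a byte-oriented stdout, so the bytes can be inspected.
raw = io.BytesIO()

# The last item of the tuple returned by codecs.lookup is the StreamWriter class.
writer_class = codecs.lookup('iso8859-1')[-1]
wrapped = writer_class(raw)

# The wrapper takes Unicode text and writes Latin-1 bytes to the underlying stream.
wrapped.write(u"caf\N{LATIN SMALL LETTER E WITH ACUTE}\n")
latin1_bytes = raw.getvalue()
```

codecs.getwriter('iso8859-1') looks up the same class by name, which some readers may find clearer than indexing with [-1].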
Unicode strings live in a large space, big enough for all of the
characters in every language worldwide, but thankfully the internal
representation of Unicode strings is irrelevant for users of Unicode.
Alas, a file stream, such as sys.stdout, deals with bytes and has an
encoding associated with it. You can change the default encoding that
is used for new files by modifying the site module. That, however,
requires changing your entire Python installation, which is likely to
confuse other applications that may expect the encoding you originally
configured Python to use (typically ASCII). This recipe rebinds
sys.stdout to be a stream that expects Unicode input and outputs it in
ISO8859-1 (also known as Latin-1). This doesn’t change the encoding of
any previous references to sys.stdout, as illustrated here.
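The point about previous references can be sketched as follows, assuming a hypothetical in-memory byte sink in place of the real terminal stream. Rebinding sys.stdout only changes what the name sys.stdout refers to. An object saved beforehand is still the unwrapped stream, which also makes it easy to restore.

```python
import codecs
import io
import sys

original = sys.stdout      # reference taken before rebinding; stays unwrapped
sink = io.BytesIO()        # hypothetical byte sink standing in for the real stream

sys.stdout = codecs.getwriter('iso8859-1')(sink)
try:
    rebound = sys.stdout   # new lookups of sys.stdout now see the wrapper
    sys.stdout.write(u"\N{LATIN SMALL LETTER E WITH ACUTE}")
finally:
    sys.stdout = original  # always put the original stream back

captured = sink.getvalue()
```

Restoring in a finally clause is a design choice of this sketch, not part of the recipe. It keeps a failed write from leaving the process with a rebound stdout.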
First, we keep a reference to the original, ASCII-encoded stdout:
>>> old = sys.stdout
Then we create a Unicode string that wouldn’t go through stdout normally:
>>> char = u"\N{GREEK CAPITAL LETTER GAMMA}" # a character that doesn't fit in ASCII
>>> print char
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
UnicodeError: ASCII encoding error: ordinal not in range(128)
Now we wrap stdout in the codecs stream writer for UTF-8, a much richer
encoding, rebind sys.stdout to it, and try again:
>>> sys.stdout = codecs.lookup('utf-8')[-1](sys.stdout)
>>> print char
Γ
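The session above switches to UTF-8 because Latin-1 has no code for the Greek capital gamma. A small sketch of that difference follows, again writing to hypothetical in-memory buffers rather than to the real stdout so that the resulting bytes can be checked.

```python
import codecs
import io

gamma = u"\N{GREEK CAPITAL LETTER GAMMA}"

# Latin-1 cannot represent U+0393, so its stream writer raises an encoding error.
latin1_out = codecs.getwriter('iso8859-1')(io.BytesIO())
try:
    latin1_out.write(gamma)
    latin1_failed = False
except UnicodeError:
    latin1_failed = True

# UTF-8 can represent any Unicode character; gamma becomes a two-byte sequence.
utf8_sink = io.BytesIO()
codecs.getwriter('utf-8')(utf8_sink).write(gamma)
utf8_bytes = utf8_sink.getvalue()
```

Whether those two bytes then display as Γ depends on the terminal being set up to interpret its input as UTF-8.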