Strings with Unicode

Strings are fully Unicode capable, so you can use them with international characters easily, even in literals, because the default source code encoding for Python 3 is UTF-8. For example, if you have access to Norwegian characters, you can simply enter this:

>>> "Vi er så glad for å høre og lære om Python!"
'Vi er så glad for å høre og lære om Python!'

Alternatively, you can use the hexadecimal representations of Unicode code points as an escape sequence prefixed by \u:

>>> "Vi er s\u00e5 glad for \u00e5 h\xf8re og l\u00e6re om Python!"
'Vi er så glad for å høre og lære om Python!'

We're sure you'll agree, though, that this is somewhat more unwieldy.
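
As a quick check, the sketch below compares the two spellings and shows one way to find the code point behind a character. The variable names are ours, chosen purely for illustration; ord is the built-in that returns a character's code point.

```python
# Compare a literal containing Norwegian characters with its escaped form.
literal = "Vi er så glad for å høre og lære om Python!"
escaped = "Vi er s\u00e5 glad for \u00e5 h\u00f8re og l\u00e6re om Python!"

# Both spellings produce the same string object contents.
same = literal == escaped

# Map each character to the \u escape that would spell it.
code_points = {ch: f"\\u{ord(ch):04x}" for ch in "åøæ"}

print(same)
print(code_points)
```

The escaped form is mostly useful when your editor or keyboard makes the characters themselves hard to type.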

Similarly, you can use the \x escape sequence followed by a two-character hexadecimal string to include code points in the range U+0000 to U+00FF in a string literal:

>>> '\xe5'
'å'

You can even use an escaped octal string, written as a single backslash followed by up to three digits in the range zero to seven, although we confess we've never seen this used in practice, except inadvertently as a bug:

>>> '\345'
'å'
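
To tie the three escape forms together, here is a minimal sketch showing that they all spell the same one-character string, followed by an example of how an octal escape can slip in by accident. The names used are hypothetical and exist only for this illustration.

```python
# Three spellings of U+00E5 (LATIN SMALL LETTER A WITH RING ABOVE).
from_hex = '\xe5'        # \x followed by exactly two hex digits
from_unicode = '\u00e5'  # \u followed by exactly four hex digits
from_octal = '\345'      # backslash followed by octal digits (0o345 == 0xe5)

all_equal = from_hex == from_unicode == from_octal == chr(0xE5)

# An accidental octal escape: the backslash consumes the digits after it,
# so this is the four-character string "dirA", not "dir" + backslash + "101".
accidental = 'dir\101'

print(all_equal, len(from_hex), accidental)
```

If you really do want a backslash followed by digits, double the backslash or use a raw string such as r'dir\101'.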

There are no such Unicode capabilities in the otherwise similar bytes type, which we'll look at next.
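
A short sketch of that difference, assuming UTF-8 as the encoding and using names of our own choosing: a str holds code points, a bytes object holds raw byte values, and a bytes literal refuses non-ASCII characters outright.

```python
text = 'å'                      # one Unicode code point
encoded = text.encode('utf-8')  # the same character as UTF-8: two bytes

# In a bytes literal, \xe5 is just one byte with the value 229. It only
# becomes a character again once it is decoded with some encoding.
raw = b'\xe5'

# Non-ASCII characters are not allowed in a bytes literal, so compiling
# one raises SyntaxError. We compile from a string to catch the error
# at run time instead of breaking this example.
try:
    compile("b'å'", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True

print(len(text), len(encoded), raw[0], rejected)
```

Decoding raw with the latin-1 encoding happens to give back 'å', because that encoding maps each byte value directly to the code point with the same number.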
