Strings with Unicode

Strings are fully Unicode capable, so you can use them with international characters easily, even in literals, because the default source code encoding for Python 3 is UTF-8. For example, if you have access to Norwegian characters, you can simply enter this:

>>> "Vi er så glad for å høre og lære om Python!"
'Vi er så glad for å høre og lære om Python!'

Alternatively, you can use the hexadecimal representations of Unicode code points as escape sequences, each prefixed by \u and followed by four hexadecimal digits:

>>> "Vi er s\u00e5 glad for \u00e5 h\xf8re og l\u00e6re om Python!"
'Vi er så glad for å høre og lære om Python!'

We're sure you'll agree, though, that this is somewhat more unwieldy.
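As a quick check that the two spellings really are the same string, here is a minimal sketch. The variable names (plain, escaped, code_points) are our own, chosen only for illustration; ord() is the built-in that returns a character's code point.

```python
# The same sentence written with literal characters and with \u escapes.
plain = "Vi er så glad for å høre og lære om Python!"
escaped = "Vi er s\u00e5 glad for \u00e5 h\u00f8re og l\u00e6re om Python!"

# Escape sequences are resolved when the literal is compiled,
# so both names refer to equal strings.
same = plain == escaped

# ord() returns the code point of a one-character string;
# format it as four hex digits to get the value used after \u.
code_points = {ch: f"{ord(ch):04x}" for ch in "åøæ"}

print(same)
print(code_points)
```

This is also a handy way to look up the escape for a character you can type but would rather not embed directly in source code.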

Similarly, you can use the \x escape sequence followed by a two-character hexadecimal string to include code points that fit in a single byte (U+0000 to U+00FF) in a string literal:

>>> '\xe5'
'å'

You can even use an escaped octal string, written as a single backslash followed by three digits in the range zero to seven, although we confess we've never seen this used in practice, except inadvertently as a bug:

>>> '\345'
'å'
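The hexadecimal, octal, and \u forms are just three notations for the same code point. The following sketch, with illustrative variable names of our own, compares them directly:

```python
# Three escape notations for the single character å (code point 229).
hex_form = "\xe5"        # two hexadecimal digits
octal_form = "\345"      # three octal digits
unicode_form = "\u00e5"  # four hexadecimal digits

# All three literals produce the same one-character string.
all_equal = hex_form == octal_form == unicode_form == "å"

# ord() confirms the shared code point: 0xe5 == 0o345 == 229.
code_point = ord(octal_form)

print(all_equal, code_point)
```

Whichever notation appears in the source, the resulting str object is identical; the choice only affects readability.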

There are no such Unicode capabilities in the otherwise similar bytes type, which we'll look at next.
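To give a rough feel for the difference, here is a small sketch under our own assumptions: we pick UTF-8 as the encoding, and the variable names are illustrative. A str holds code points, whereas a bytes object holds raw byte values with no character meaning attached.

```python
text = "å"

# Encoding turns the single code point into a sequence of bytes.
# In UTF-8, å occupies two bytes.
utf8 = text.encode("utf-8")

# In a bytes literal, \xe5 is simply the byte value 229;
# it is not a character until it is decoded with some encoding.
single = b"\xe5"

print(len(text), len(utf8))  # one code point, two bytes
print(utf8)
print(single[0])             # indexing bytes yields an integer
```

Decoding goes the other way: utf8.decode("utf-8") recovers the original string.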
