The term built-in has more than one meaning in Python. In most contexts, a built-in means an object directly accessible to Python code without an import statement. “Python built-ins” shows the mechanism that Python uses to allow this direct access. Built-in types in Python include numbers, sequences, dictionaries, sets, functions (all covered in Chapter 3), classes (covered in “Python Classes”), standard exception classes (covered in “Exception Objects”), and modules (covered in “Module Objects”). “The io Module” covers the (built-in, in v2) file type, and “Internal Types” covers some other built-in types intrinsic to Python’s internal operation. This chapter provides additional coverage of core built-in types (in “Built-in Types”) and covers built-in functions available in the module builtins (named __builtins__ in v2) in “Built-in Functions”.
As mentioned in “Python built-ins”, some modules are known as “built-in” because they are an integral part of the Python standard library (even though it takes an import statement to access them), as distinguished from separate, optional add-on modules, also called Python extensions. This chapter covers some core built-in modules: namely, the modules sys in “The sys Module”, copy in “The copy Module”, collections in “The collections Module”, functools in “The functools Module”, heapq in “The heapq Module”, argparse in “The argparse Module”, and itertools in “The itertools Module”. Chapter 8 covers some string-related core built-in modules (string in “The string Module”, codecs in “The codecs Module”, and unicodedata in “The unicodedata Module”), and Chapter 9 covers re in “Regular Expressions and the re Module”.
Table 7-1 covers Python’s core built-in types, such as int, float, dict, and many others. More details about many of these types, and about operations on their instances, are found throughout Chapter 3. In this section, by “number” we mean, specifically, “noncomplex number.”
bool |
Returns |
bytearray |
A mutable sequence of bytes ( |
bytes |
In v2, a synonym of |
complex |
Converts any number, or a suitable string, to a complex number. |
dict |
Returns a new dictionary with the same items as
You can call |
float |
Converts any number, or a suitable string, to a floating-point number. See “Floating-point numbers”. |
frozenset |
Returns a new frozen (i.e., immutable) set object with the same items as iterable |
int |
Converts any number, or a suitable string, to an |
list |
Returns a new list object with the same items as iterable |
memoryview |
|
object |
Returns a new instance of |
set |
Returns a new mutable set object with the same items as the iterable object |
slice |
Returns a slice object with the read-only attributes |
str |
Returns a concise, readable string representation of |
super |
Returns a super-object of object |
tuple |
Returns a tuple with the same items as iterable |
type |
Returns the type object that is the type of |
unicode |
(v2 only.) Returns the Unicode string object built by decoding byte-string |
Use isinstance (covered in Table 7-2), not equality comparison of types, to check whether an instance belongs to a particular class, in order to properly support inheritance. Checking type(x) for equality or identity to some other type object is known as type checking. Type checking is inappropriate in production Python code, as it interferes with polymorphism. Just try to use x as if it were of the type you expect, handling any problems with a try/except statement, as discussed in “Error-Checking Strategies”; this is known as duck typing.
When you just have to type-check, typically for debugging purposes, use isinstance instead. isinstance(x, atype), although in a more general sense it, too, is type checking, nevertheless is a lesser evil than type(x) is atype, since it accepts an x that is an instance of any subclass of atype, not just a direct instance of atype itself. In particular, isinstance is perfectly fine when you’re checking specifically for an ABC (abstract base class: see “Abstract Base Classes”); this newer idiom is known as goose typing.
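For instance (a small sketch; the describe function is hypothetical), goose typing against the ABCs in collections.abc accepts any conforming type, built-in or user-defined, where an exact type(x) is dict check would not:

```python
import collections.abc  # in v2, the ABCs live directly in collections

def describe(x):
    # isinstance against an ABC accepts dict, OrderedDict,
    # and any other class implementing the Mapping protocol.
    if isinstance(x, collections.abc.Mapping):
        return 'mapping'
    if isinstance(x, collections.abc.Sequence):
        return 'sequence'
    return 'other'

print(describe({'a': 1}))  # mapping
print(describe((1, 2)))    # sequence
print(describe(42))        # other
```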
Table 7-2 covers Python functions (and some types that in practice are only used as if they were functions) in the module builtins (in v2, __builtins__), in alphabetical order. Built-ins’ names are not reserved words. You can bind, in local or global scope, an identifier that’s a built-in name (although we recommend you avoid doing so; see the following warning). Names bound in local or global scope override names bound in built-in scope: local and global names hide built-in ones. You can also rebind names in built-in scope, as covered in “Python built-ins”.
Avoid accidentally hiding built-ins: your code might need them later. It’s tempting to use, for your own variables, natural names such as input, list, or filter, but don’t do it: these are names of built-in Python types or functions. Unless you get into the habit of never hiding built-ins’ names with your own, sooner or later you’ll get mysterious bugs in your code caused by just such hiding occurring accidentally.
Several built-in functions work in slightly different ways in v3 than they do in v2. To remove some differences, start your v2 module with from future_builtins import *: this makes the built-ins ascii, filter, hex, map, oct, and zip work the v3 way. (To use the built-in print function in v2, however, use from __future__ import print_function.)
Most built-in functions and types cannot be called with named arguments, only with positional ones. In the following list, we specifically mention cases in which this limitation does not hold.
__import__ |
Deprecated in modern Python; use, instead, |
abs |
Returns the absolute value of number |
all |
|
any |
|
ascii |
v3 only, unless you have |
bin |
Returns a binary string representation of integer |
callable |
Returns |
chr |
Returns a string of length |
compile |
Compiles a string and returns a code object usable by |
delattr |
Removes the attribute |
dir |
Called without arguments, |
divmod |
Divides two numbers and returns a pair whose items are the quotient and remainder. See also |
enumerate |
Returns a new iterator object whose items are pairs. For each such pair, the second item is the corresponding item in
|
eval |
Returns the result of an expression. |
exec |
In v3, like |
filter |
In v2, returns a list of those items of
In v2, when
In v3, whatever the type of |
format |
Returns |
getattr |
Returns |
globals |
Returns the |
hasattr |
Returns |
hash |
Returns the hash value for |
hex |
Returns a hexadecimal string representation of integer |
id |
Returns the integer value that denotes the identity of |
input |
In v2, In v3, |
intern |
Ensures that |
isinstance |
Returns |
issubclass |
Returns |
iter |
Creates and returns an iterator, an object that you can repeatedly pass to the
See also “Sequences” and When called with two arguments, the first argument must be callable without arguments, and
Don’t call iter in a for clause: as discussed in “The for Statement”, the statement
|
len |
Returns the number of items in |
locals |
Returns a dictionary that represents the current local namespace. Treat the returned dictionary as read-only; trying to modify it may or may not affect the values of local variables, and might raise an exception. See also |
map |
In v3, In v2, |
max |
Returns the largest item in the only positional argument |
min |
Returns the smallest item in the only positional argument |
next |
Returns the next item from iterator |
oct |
Converts integer |
open |
Opens or creates a file and returns a new file object. In v3, |
ord |
In v2, returns the ASCII/ISO integer code between |
pow |
When |
|
In v3, formats with |
range |
In v2, returns a list of integers in arithmetic progression:
When In v3, |
raw_input |
v2 only: writes |
reduce |
(In v3, function
An example use of
|
reload |
Reloads and reinitializes the module object |
repr |
Returns a complete and unambiguous string representation of |
reversed |
Returns a new iterator object that yields the items of |
round |
Returns a |
setattr |
Binds |
sorted |
Returns a list with the same items as iterable
Argument |
sum |
Returns the sum of the items of iterable |
unichr |
v2 only: returns a Unicode string whose single character corresponds to |
vars |
When called with no argument, |
xrange |
v2 only: an iterable of integers in arithmetic progression, and otherwise similar to |
zip |
In v2, returns a list of tuples, where the |
The attributes of the sys module are bound to data and functions that provide information on the state of the Python interpreter or affect the interpreter directly. Table 7-3 covers the most frequently used attributes of sys, in alphabetical order. Most sys attributes we don’t cover are meant specifically for use in debuggers, profilers, and integrated development environments; see the online docs for more information. Platform-specific information is best accessed using the platform module, covered online.
argv |
The list of command-line arguments passed to the main script. |
byteorder |
|
builtin_module_names |
A tuple of strings, the name of all the modules compiled into this Python interpreter. |
displayhook |
In interactive sessions, the Python interpreter calls
You can rebind |
dont_write_bytecode |
If true, Python does not write a bytecode file (with extension .pyc or, in v2, .pyo) to disk, when it imports a source file (with extension .py). Handy, for example, when importing from a read-only filesystem. |
excepthook |
When an exception is not caught by any handler, propagating all the way up the call stack, Python calls |
exc_info |
If the current thread is handling an exception, Holding on to a traceback object can make some garbage uncollectable: a traceback object indirectly holds references to all variables on the call stack; if you hold a reference to the traceback (e.g., indirectly, by binding a variable to the tuple that |
exit |
Raises a |
float_info |
A read-only object whose attributes hold low-level details about the implementation of the |
getrefcount |
Returns the reference count of |
getrecursionlimit |
Returns the current limit on the depth of Python’s call stack. See also “Recursion” and |
getsizeof |
Returns the size in bytes of |
maxint |
(v2 only.) The largest |
maxsize |
Maximum number of bytes in an object in this version of Python (at least |
maxunicode |
The largest codepoint for a Unicode character in this version of Python (at least |
modules |
A dictionary whose items are the names and module objects for all loaded modules. See “Module Loading” for more information on |
path |
A list of strings that specifies the directories and ZIP files that Python searches when looking for a module to load. See “Searching the Filesystem for a Module” for more information on |
platform |
A string that names the platform on which this program is running. Typical values are brief operating system names, such as |
ps1, ps2 |
|
setrecursionlimit |
Sets the limit on the depth of Python’s call stack (the default is |
stdin, stdout, stderr |
|
tracebacklimit |
The maximum number of levels of traceback displayed for unhandled exceptions. By default, this attribute is not set (i.e., there is no limit). When |
version |
A string that describes the Python version, build number and date, and C compiler used. |
As discussed in “Assignment Statements”, assignment in Python does not copy the righthand-side object being assigned. Rather, assignment adds a reference to the righthand-side object. When you want a copy of object x, you can ask x for a copy of itself, or you can ask x’s type to make a new instance copied from x. If x is a list, list(x) returns a copy of x, as does x[:]. If x is a dictionary, dict(x) and x.copy() return a copy of x. If x is a set, set(x) and x.copy() return a copy of x. In each case, we think it’s best to use the uniform and readable idiom of calling the type, but there is no consensus on this style issue in the Python community.
The copy module supplies a copy function to create and return a copy of many types of objects. Normal copies, such as list(x) for a list x and copy.copy(x), are known as shallow copies: when x has references to other objects (either as items or as attributes), a normal (shallow) copy of x has distinct references to the same objects. Sometimes, however, you need a deep copy, where referenced objects are deep-copied recursively; fortunately, this need is rare, since a deep copy can take a lot of memory and time. The copy module supplies a deepcopy function to create and return a deep copy.
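The difference shows up as soon as the copied object contains references to mutable objects; a minimal sketch:

```python
import copy

x = {'nums': [1, 2, 3]}
shallow = copy.copy(x)      # new dict, but shares the inner list with x
deep = copy.deepcopy(x)     # inner list copied recursively

x['nums'].append(4)
print(shallow['nums'])  # [1, 2, 3, 4]: change visible through the shallow copy
print(deep['nums'])     # [1, 2, 3]: the deep copy is unaffected
```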
copy |
Creates and returns a shallow copy of |
deepcopy |
Makes a deep copy of
A class can customize the way |
The collections module supplies useful types that are collections (i.e., containers), as well as the abstract base classes (ABCs) covered in “Abstract Base Classes”. Since Python 3.4, the ABCs are in collections.abc (but, for backward compatibility, can still be accessed directly in collections itself: the latter access will cease working in some future release of v3).
ChainMap “chains” multiple mappings together; given a ChainMap instance c, accessing c[key] returns the value in the first of the mappings that has that key, while all changes to c only affect the very first mapping in c. In v2, you could approximate this as follows:
class ChainMap(collections.MutableMapping):
    def __init__(self, *maps):
        self.maps = list(maps)
        self._keys = set()
        for m in self.maps:
            self._keys.update(m)
    def __len__(self):
        return len(self._keys)
    def __iter__(self):
        return iter(self._keys)
    def __getitem__(self, key):
        if key not in self._keys:
            raise KeyError(key)
        for m in self.maps:
            try:
                return m[key]
            except KeyError:
                pass
    def __setitem__(self, key, value):
        self.maps[0][key] = value
        self._keys.add(key)
    def __delitem__(self, key):
        del self.maps[0][key]
        self._keys = set()
        for m in self.maps:
            self._keys.update(m)
Other methods could be defined for efficiency, but this is the minimum set that MutableMapping requires. A stable, production-level backport of ChainMap to v2 (and early versions of Python 3) is available on PyPI and can therefore be installed like all PyPI modules—for example, by running pip install chainmap.
See the ChainMap documentation in the online Python docs for more details and a collection of “recipes” on how to use ChainMap.
Counter is a subclass of dict with int values that are meant to count how many times the key has been seen (although values are allowed to be <=0); it roughly corresponds to types that other languages call “bag” or “multi-set.” A Counter instance is normally built from an iterable whose items are hashable: c = collections.Counter(iterable). Then, you can index c with any of iterable’s items, to get the number of times that item appeared. When you index c with any missing key, the result is 0 (to remove an entry in c, use del c[entry]; setting c[entry] = 0 leaves entry in c, just with a corresponding value of 0).
c supports all methods of dict; in particular, c.update(otheriterable) updates all the counts, incrementing them according to occurrences in otheriterable. So, for example:
c = collections.Counter('moo')
c.update('foo')
leaves c['o'] giving 4, and c['f'] and c['m'] each giving 1.
In addition to dict methods, c supports three extra methods:
elements |
Yields, in arbitrary order, keys in |
most_common |
Returns a list of pairs for the |
subtract |
Like |
See the Counter documentation in the online Python docs for more details and a collection of useful “recipes” on how to use Counter.
OrderedDict is a subclass of dict that remembers the order in which keys were originally inserted (assigning a new value for an existing key does not change the order, but removing a key and inserting it again does). Given an OrderedDict instance o, iterating on o yields keys in order of insertion (oldest to newest key), and o.popitem() removes and returns the item at the key most recently inserted. Equality tests between two instances of OrderedDict are order-sensitive; equality tests between an instance of OrderedDict and a dict or other mapping are not. See the OrderedDict documentation in the online Python docs for more details and a collection of “recipes” on how to use OrderedDict.
defaultdict extends dict and adds one per-instance attribute, named default_factory. When an instance d of defaultdict has None as the value of d.default_factory, d behaves exactly like a dict. Otherwise, d.default_factory must be callable without arguments, and d behaves just like a dict except when you access d with a key k that is not in d. In this specific case, the indexing d[k] calls d.default_factory(), assigns the result as the value of d[k], and returns the result. In other words, the type defaultdict behaves much like the following Python-coded class:
class defaultdict(dict):
    def __init__(self, default_factory, *a, **k):
        dict.__init__(self, *a, **k)
        self.default_factory = default_factory
    def __getitem__(self, key):
        if key not in self and self.default_factory is not None:
            self[key] = self.default_factory()
        return dict.__getitem__(self, key)
As this Python equivalent implies, to instantiate defaultdict you pass it an extra first argument (before any other arguments, positional and/or named, if any, to be passed on to plain dict). That extra first argument becomes the initial value of default_factory; you can access and rebind default_factory.
All behavior of defaultdict is essentially as implied by this Python equivalent (except str and repr, which return strings different from those they’d return for a dict). Named methods, such as get and pop, are not affected. All behavior related to keys (method keys, iteration, membership test via operator in, etc.) reflects exactly the keys that are currently in the container (whether you put them there explicitly, or implicitly via an indexing that called default_factory).
A typical use of defaultdict is to set default_factory to list, to make a mapping from keys to lists of values:
def make_multi_dict(items):
    d = collections.defaultdict(list)
    for key, value in items:
        d[key].append(value)
    return d
Called with any iterable whose items are pairs of the form (key, value), with all keys being hashable, this make_multi_dict function returns a mapping that associates each key to the list of one or more values that accompanied it in the iterable (if you want a pure dict result, change the last statement into return dict(d)—this is rarely necessary).
If you don’t want duplicates in the result, and every value is hashable, use a collections.defaultdict(set), and add rather than append in the loop.
namedtuple is a factory function, building and returning a subclass of tuple whose instances’ items you can access by attribute reference, as well as by index.
namedtuple |
A
|
For more details and advice about using namedtuple, see the online docs.
The functools module supplies functions and types supporting functional programming in Python, listed in Table 7-4.
cmp_to_key |
|
lru_cache |
(v3 only; to use in v2, |
partial |
as an alternative to the lambda-using snippet:
and to the most concise approach, a list comprehension:
|
reduce |
Like the built-in function |
total_ordering |
A class decorator suitable for decorating classes that supply at least one inequality comparison method, such as |
wraps |
A decorator suitable for decorating functions that wrap another function |
The heapq module uses min heap algorithms to keep a list in “nearly sorted” order as items are inserted and extracted. heapq’s operation is faster than calling a list’s sort method after each insertion, and much faster than bisect (covered in the online docs). For many purposes, such as implementing “priority queues,” the nearly sorted order supported by heapq is just as good as a fully sorted order, and faster to establish and maintain. The heapq module supplies the following functions:
heapify |
Permutes
If a list satisfies the (min) heap condition, the list’s first item is the smallest (or equal-smallest) one. A sorted list satisfies the heap condition, but many other permutations of a list also satisfy the heap condition, without requiring the list to be fully sorted. |
heappop |
Removes and returns the smallest (first) item of |
heappush |
Inserts the |
heappushpop |
Logically equivalent to
|
heapreplace |
Logically equivalent to
|
merge |
Returns an iterator yielding, in sorted order (smallest to largest), the items of the |
nlargest |
Returns a reverse-sorted list with the |
nsmallest |
Returns a sorted list with the |
Several functions in the heapq module, although they perform comparisons, do not accept a key= argument to customize the comparisons. This is inevitable, since the functions operate in-place on a plain list of the items: they have nowhere to “stash away” custom comparison keys computed once and for all.
When you need both heap functionality and custom comparisons, you can apply the good old decorate-sort-undecorate (DSU) idiom (which used to be crucial to optimize sorting in old versions of Python, before the key= functionality was introduced).
The DSU idiom, as applied to heapq, has the following components:
Decorate: Build an auxiliary list A where each item is a tuple starting with the sort key and ending with the item of the original list L.
Call heapq functions on A, typically starting with heapq.heapify(A).
Undecorate: When you extract an item from A, typically by calling heapq.heappop(A), return just the last item of the resulting tuple (which corresponds to an item of the original list L).
When you add an item to A by calling heapq.heappush(A, item), decorate the actual item you’re inserting into a tuple starting with the sort key.
This sequencing is best wrapped up in a class, as in this example:
import heapq

class KeyHeap(object):
    def __init__(self, alist, key):
        self.heap = [(key(o), i, o) for i, o in enumerate(alist)]
        heapq.heapify(self.heap)
        self.key = key
        if alist:
            self.nexti = self.heap[-1][1] + 1
        else:
            self.nexti = 0
    def __len__(self):
        return len(self.heap)
    def push(self, o):
        heapq.heappush(self.heap, (self.key(o), self.nexti, o))
        self.nexti += 1
    def pop(self):
        return heapq.heappop(self.heap)[-1]
In this example, we use an increasing number in the middle of the decorated tuple (after the sort key, before the actual item) to ensure that actual items are never compared directly, even if their sort keys are equal (this semantic guarantee is an important aspect of the key= argument’s functionality to sort and the like).
When you write a Python program meant to be run from the command line (or from a “shell script” in Unix-like systems or a “batch file” in Windows), you often want to let the user pass to the program, on the command line, command-line arguments (including command-line options, which by convention are usually arguments starting with one or two dash characters). In Python, you can access the arguments as sys.argv, an attribute of module sys holding those arguments as a list of strings (sys.argv[0] is the name by which the user started your program; the arguments are in sublist sys.argv[1:]). The Python standard library offers three modules to process those arguments; we only cover the newest and most powerful one, argparse, and we only cover a small, core subset of argparse’s rich functionality. See the online reference and tutorial for much, much more.
ArgumentParser |
|
Given an instance ap of ArgumentParser, prepare it by one or more calls to ap.add_argument; then, use it by calling ap.parse_args() without arguments (so it parses sys.argv): the call returns an instance of argparse.Namespace, with your program’s args and options as attributes.
add_argument has a mandatory first argument: either an identifier string, for positional command-line arguments, or a flag name, for command-line options. In the latter case, pass one or more flag names; an option often has both a short name (one dash, then a single character) and a long name (two dashes, then an identifier).
After the positional arguments, pass to add_argument zero or more named arguments to control its behavior. Here are the common ones:
action |
What the parser does with this argument. Default: |
choices |
A set of values allowed for the argument (parsing the argument raises an exception if the value is not among these); default, no constraints. |
default |
Value to use if the argument is not present; default, |
dest |
Name of the attribute to use for this argument; default, same as the first positional argument stripped of dashes. |
help |
A short string mentioning the argument’s role, for help messages. |
nargs |
Number of command-line arguments used by this logical argument (by default, |
type |
A callable accepting a string, often a type such as |
Here’s a simple example of argparse—save this code in a file called greet.py:
import argparse
ap = argparse.ArgumentParser(description='Just an example')
ap.add_argument('who', nargs='?', default='World')
ap.add_argument('--formal', action='store_true')
ns = ap.parse_args()
if ns.formal:
    greet = 'Most felicitous salutations, o {}.'
else:
    greet = 'Hello, {}!'
print(greet.format(ns.who))
Now, python greet.py prints Hello, World!, while python greet.py --formal Cornelia prints Most felicitous salutations, o Cornelia.
The itertools module offers high-performance building blocks to build and manipulate iterators. To handle long processions of items, iterators are often better than lists, thanks to iterators’ intrinsic “lazy evaluation” approach: an iterator produces items one at a time, as needed, while all items of a list (or other sequence) must be in memory at the same time. This approach even makes it feasible to build and use unbounded iterators, while lists must always have finite numbers of items (since any machine has a finite amount of memory).
Table 7-5 covers the most frequently used attributes of itertools; each of them is an iterator type, which you call to get an instance of the type in question, or a factory function behaving similarly. See the itertools documentation in the online Python docs for more itertools attributes, including combinatoric generators for permutations, combinations, and Cartesian products, as well as a useful taxonomy of itertools attributes.
The online docs also offer recipes, ways to combine and use itertools attributes. The recipes assume you have from itertools import * at the top of your module; this is not recommended use, just an assumption to make the recipes’ code more compact. It’s best to import itertools as it, then use references such as it.something rather than the more verbose itertools.something.
chain |
Yields items from the first argument, then items from the second argument, and so on until the end of the last argument, just like the generator expression:
|
chain.from_iterable |
Yields items from the iterables in the argument, in order, just like the genexp:
|
compress |
Yields each item from
|
count |
Yields consecutive integers starting from
|
cycle |
Yields each item of
|
dropwhile |
Drops the
|
groupby |
Another way of looking at the groups. For example, suppose that, given a
|
ifilter |
Yields those items of
|
ifilterfalse |
Yields those items of
|
imap |
Yields the results of
|
islice |
Yields items of
|
izip |
Yields tuples with one corresponding item from each of the |
izip_longest |
Yields tuples with one corresponding item from each of the |
repeat |
Repeatedly yields
When
|
starmap |
Yields
|
takewhile |
Yields items from
|
tee |
Returns a tuple of |
We have shown equivalent generators and genexps for many attributes of itertools, but it’s important to remember the sheer speed of itertools types. To take a trivial example, consider repeating some action 10 times:
for _ in itertools.repeat(0, 10):
    pass
This turns out to be about 10 to 20 percent faster, depending on Python release and platform, than the straightforward alternative:
for _ in range(10):
    pass