At this point, we have covered the syntax of Python, its basic data types, and many of our favorite functions in the Python library. This chapter assumes that all the basic components of the language are at least understood and presents some ways in which Python is, in addition to being elegant and “cool,” just plain useful. We present a variety of tasks common to Python programmers. These tasks are grouped by categories—data structure manipulations, file manipulations, etc.
One of Python’s greatest features is that it provides the list, tuple, and dictionary built-in types. They are so flexible and easy to use that once you’ve grown used to them, you’ll find yourself reaching for them automatically.
Due to Python’s reference management scheme, the
statement a = b
doesn’t make a copy of the
object referenced by b
; instead, it makes a new
reference to that object. Sometimes a new copy of an object, not just
a shared reference, is needed. How to do this depends on the type of
the object in question. The simplest way to make copies of
lists
and
tuples
is somewhat odd. If myList
is a list, then to make
a copy of it, you can do:
newList = myList[:]
which you can read as “slice from beginning to end,”
since you’ll remember from Chapter 2, that
the default index for the start of a slice is the beginning of the
sequence (0), and the default index for the end of a slice is the end
of sequence. Since tuples support the same slicing operation as
lists, this same technique can also copy tuples. Dictionaries, on the
other hand, don’t support slicing. To make a copy of a
dictionary
myDict
, you can use:
newDict = {} for key in myDict.keys(): newDict[key] = myDict[key]
This is such a common task that a new method was added to the
dictionary object in Python 1.5, the copy()
method, which performs this task. So the preceding code can
be replaced with the single statement:
newDict = myDict.copy()
Another common dictionary operation is also now a standard dictionary
feature. If you have a dictionary oneDict
, and
want to update it with the contents of a different dictionary
otherDict
, simply type
oneDict.update(otherDict)
. This is the equivalent
of:
for key in otherDict.keys(): oneDict[key] = otherDict[key]
If oneDict
shared some keys with
otherDict
before the update()
operation, the old values associated with the keys in
oneDict
are obliterated by the update. This may be
what you want to do (it usually is, which is why this behavior was
chosen and why it was called “update”). If it
isn’t, the right thing to do might be to complain (raise an
exception), as in:
def mergeWithoutOverlap(oneDict, otherDict): newDict = oneDict.copy() for key in otherDict.keys(): if key in oneDict.keys(): raise ValueError, "the two dictionaries are sharing keys!" newDict[key] = otherDict[key] return newDict
or, alternatively, combine the values of the two dictionaries, with a tuple, for example:
def mergeWithOverlap(oneDict, otherDict): newDict = oneDict.copy() for key in otherDict.keys(): if key in oneDict.keys(): newDict[key] = oneDict[key], otherDict[key] else: newDict[key] = otherDict[key] return newDict
To illustrate the differences between the preceding three algorithms, consider the following two dictionaries:
phoneBook1 = {'michael': '555-1212', 'mark': '554-1121', 'emily': '556-0091'} phoneBook2 = {'latoya': '555-1255', 'emily': '667-1234'}
If phoneBook1
is possibly out of date, and
phoneBook2
is more up to date but less complete,
the right usage is probably
phoneBook1.update(phoneBook2)
. If the two
phoneBooks are supposed to have nonoverlapping sets of keys, using
newBook = mergeWithoutOverlap(phoneBook1, phoneBook2)
lets you know if that assumption is wrong.
Finally, if one is a set of home phone numbers and the other a set of
office phone numbers, chances are newBook
=
mergeWithOverlap(phoneBook1, phoneBook2)
is what you want, as long as the subsequent code
that uses newBook
can deal with the fact that
newBook['emily']
is the tuple
('556-0091',
'667-1234')
.
Back to making copies: the [:]
and
.copy()
tricks will get you copies in 90% of the
cases. If you are writing functions that, in true Python spirit, can
deal with arguments of any type, it’s sometimes necessary to
make copies of X, regardless of what X is. In
comes the
copy
module. It provides two functions,
copy
and deepcopy
. The first is
just like the [:]
sequence slice operation or the
copy
method of dictionaries. The second is more
subtle and has to do with deeply nested structures (hence the term
deepcopy). Take the example of copying a list
listOne
by slicing it from beginning to end using
the [:]
construct. This technique makes a new list
that contains references to the same objects contained in the
original list. If the contents of that original list are
immutable objects, such as numbers or strings, the copy is as good as
a “true” copy. However, suppose that the first element in
listOne
is itself a dictionary (or any other
mutable object). The first element of the copy of
listOne
is a new reference to the
same dictionary. So if you then modify that
dictionary, the modification is evident in both
listOne
and the copy of
listOne
. An example makes it much clearer:
>>>import copy
>>>listOne = [{"name": "Willie", "city": "Providence, RI"}, 1, "tomato", 3.0]
>>>listTwo = listOne[:]
# or listTwo=copy.copy(listOne) >>>listThree = copy.deepcopy(listOne)
>>>listOne.append("kid")
>>>listOne[0]["city"] = "San Francisco, CA"
>>>print listOne, listTwo, listThree
[{'name': 'Willie', 'city': 'San Francisco, CA'}, 1, 'tomato', 3.0, 'kid'] [{'name': 'Willie', 'city': 'San Francisco, CA'}, 1, 'tomato', 3.0] [{'name': 'Willie', 'city': 'Providence, RI'}, 1, 'tomato', 3.0]
As you can see, modifying listOne
directly
modified only listOne
. Modifying the first entry
of the list referenced by
listOne
led to changes in
listTwo
, but not in listThree
;
that’s the difference between a shallow copy
([:]
) and a deepcopy. The copy
module functions know how to copy all the built-in types that are
reasonably copyable,[61] including classes and instances.
In Chapter 2, you saw that lists have a
sort method that
does an in-place sort. Sometimes you want to iterate over the sorted
contents of a list, without disturbing the contents of this list. Or
you may want to list the sorted contents of a tuple. Because
tuples are immutable, an operation such as
sort
, which modifies it in place, is not allowed.
The only solution is to make a list copy of the elements, sort the
list copy, and work with the sorted copy, as in:
listCopy = list(myTuple) listCopy.sort() for item in listCopy: print item # or whatever needs doing
This solution is also the way to deal with data structures that have no inherent order, such as dictionaries. One of the reasons that dictionaries are so fast is that the implementation reserves the right to change the order of the keys in the dictionary. It’s really not a problem, however, given that you can iterate over the keys of a dictionary using an intermediate copy of the keys of the dictionary:
keys = myDict.keys() # returns an unsorted list of # the keys in the dict keys.sort() for key in keys: # print key, value pairs print key, myDict[key] # sorted by key
The sort
method on lists uses the standard Python
comparison scheme. Sometimes, however, that scheme isn’t
what’s needed, and you need to sort according to some other
procedure. For example, when sorting a list of words, case (lower
versus UPPER) may not be significant. The standard comparison of text
strings, however, says that all uppercase letters “come
before” all lowercase letters, so 'Baby'
is
“less than” 'apple'
but
'baby'
is “greater than”
'apple'
. In order to do a
case-independent
sort, you need to define a comparison function that takes two
arguments, and returns -1
, 0
,
or 1
depending on whether the first argument is
smaller than, equal to, or greater than the second argument. So, for
our case-independent sorting, you can use:
>>> def caseIndependentSort(something, other):
... something, other = string.lower(something), string.lower(other)
... return cmp(something, other)
... >>>testList = ['this', 'is', 'A', 'sorted', 'List']
>>>testList.sort()
>>>print testList
['A', 'List', 'is', 'sorted', 'this'] >>>testList.sort(caseIndependentSort)
>>>print testList
['A', 'is', 'List', 'sorted', 'this']
We’re using the built-in function cmp
, which
does the hard part of figuring out that 'a'
comes
before 'b'
, 'b'
before
'c'
, etc. Our sort function simply lowercases both
items and sorts the lowercased versions, which is one way of making
the comparison case-independent. Also note that the lowercasing
conversion is local to the comparison function, so the elements in
the list aren’t modified by the sort.
What about
randomizing
a sequence, such as a list of lines? The easiest way to randomize a
sequence is to repeatedly use the choice
function
in the random
module, which returns a random
element from the sequence it receives as an argument.[62] In
order to avoid getting the same line multiple times, remember to
remove the chosen item. When manipulating a list object, use the
remove
method:
while myList: # will stop looping when myList is empty element = random.choice(myList) myList.remove(element) print element,
If you need to randomize a nonlist object, it’s usually easiest to convert that object to a list and randomize the list version of the same data, rather than come up with a new strategy for each data type. This might seem a wasteful strategy, given that it involves building intermediate lists that might be quite large. In general, however, what seems large to you probably won’t seem so to the computer, thanks to the reference system. Also, consider the time saved by not having to come up with a different strategy for each data type! Python is designed to save time; if that means running a slightly slower or bigger program, so be it. If you’re handling enormous amounts of data, it may be worthwhile to optimize. But never optimize until the need for optimization is clear; that would be a waste of time.
The last point about not reinventing the wheel is especially true when it comes to data structures. For example, Python lists and dictionaries might not be the lists and dictionaries or mappings you’re used to, but you should avoid designing your own data structure if these structures will suffice. The algorithms they use have been tested under wide ranges of conditions, and they’re fast and stable. Sometimes, however, the interface to these algorithms isn’t convenient for a particular task.
For example, computer-science textbooks often describe algorithms in
terms of other data structures such as queues and stacks. To use
these algorithms, it may make sense to come up with a data structure
that has the same methods as these data structures (such as
pop
and push
for stacks or
enqueue
/dequeue
for queues).
However, it also makes sense to reuse the built-in list type in the
implementation of a stack. In other words, you need something that
acts like a stack but is based on a list. The easiest solution is to
use a class wrapper around a list. For a minimal stack
implementation, you can do this:
class Stack: def __init__(self, data): self._data = list(data) def push(self, item): self._data.append(item) def pop(self): item = self._data[-1] del self._data[-1] return item
The following is simple to write, to understand, to read, and to use:
>>> thingsToDo = Stack(['write to mom', 'invite friend over', 'wash the kid'])
>>> thingsToDo.push('do the dishes')
>>> print thingsToDo.pop()
do the dishes
>>> print thingsToDo.pop()
wash the kid
Two standard Python naming
conventions are used in the
Stack
class above. The first is that class names
start with an uppercase letter, to distinguish them from functions.
The other is that the _data
attribute starts with
an underscore. This is a half-way point between public attributes
(which don’t start with an underscore), private attributes
(which start with two underscores; see Chapter 6),
and Python-reserved identifiers (which both start and end with two
underscores). What it means is that _data
is an
attribute of the class that shouldn’t be needed by clients of
the class. The class designer expects such
“pseudo-private” attributes to be used only by the class
methods and by the methods of any eventual subclass.
The
Stack
class
presented earlier does its minimal job just fine. It assumes a fairly
minimal definition of what a stack is, specifically, something that
supports just two operations, a push
and a
pop
. Quickly, however, you find that some of the
features of lists are really nice, such as the ability to iterate
over all the elements using the for...in...
construct. This can be done by reusing existing code. In this case,
you should use the
UserList
class defined in the
UserList
module as a class from which the
Stack
can be derived. The library also includes a
UserDict
module that is a class wrapper around a
dictionary. In general, they are there to be specialized by
subclassing. In our case:
# import the UserList class from the UserList module from UserList import UserList # subclass the UserList class class Stack(UserList): push = UserList.append def pop(self): item = self[-1] # uses __getitem__ del self[-1] return item
This Stack
is a subclass of the
UserList
class. The UserList
class implements the behavior of the []
brackets by defining the special __getitem
__ and
__delitem
_ _
methods among
others, which is why the code in pop
works. You
don’t need to define your own __init
__
method because UserList
defines a perfectly good
default. Finally, the push
method is defined just
by saying that it’s the same as
UserList
’s
append
method. Now we can do list-like things
as well as stack-like things:
>>>thingsToDo = Stack(['write to mom', 'invite friend over', 'wash the kid'])
>>>print thingsToDo
# inherited from UserList ['write to mom', 'invite friend over', 'wash the kid'] >>>thingsToDo.pop()
'wash the kid' >>>thingsToDo.push('change the oil')
>>>for chore in thingsToDo:
# we can also iterate over thecontents
...print chore
# as "for .. in .." uses __getitem__ ... write to mom invite friend over change the oil
As this book was being written, Guido van Rossum announced that in
Python 1.5.2 (and subsequent versions), list objects now have an
additional method called pop
, which behaves just
like the one here. It also has an optional argument that specifies
what index to use to do the pop (with the default being the last
element in the list).
[61] Some objects don’t qualify as “reasonably copyable,” such as modules, file objects, and sockets. Remember that file objects are different from files on disk.
[62] The random
module provides many other useful
functions, such as the random
function, which
returns a random floating-point number between
and 1. Check a reference source for details.