This chapter concludes Part III with a look at techniques and tools used for documenting Python code. Although Python code is designed to be readable in general, a few well-placed human-readable comments can do much to help others understand the workings of your programs. To support comments, Python includes both syntax and tools to make documentation easier. Although this is something of a tools-related concept, the topic is presented here, partly because it involves Python’s syntax model, and partly as a resource for readers struggling to understand Python’s toolset. As usual, this chapter ends with pitfalls and exercises.
By this point in the book you’re probably starting to realize that Python comes with an awful lot of prebuilt functionality: built-in functions, exceptions, predefined object attributes, standard library modules, and more. Moreover, we’ve really only scratched the surface of each of these categories.
One of the first questions that bewildered beginners often ask is: how do I find information on all the built-in tools? This section provides hints on the various documentation sources available in Python. It also presents documentation strings and the PyDoc system that makes use of them. These topics are somewhat peripheral to the core language itself, but become essential knowledge as soon as your code reaches the level of the examples and exercises in this chapter.
As summarized in Table 11-1, there are a variety of places to look for information in Python, with generally increasing verbosity. Since documentation is such a crucial tool in practical programming, let’s look at each of these categories.
Form | Role
# comments | In-file documentation
The dir function | Lists of attributes available on objects
Docstrings | In-file documentation attached to objects
PyDoc: The help function | Interactive help for objects
PyDoc: HTML reports | Module documentation in a browser
Standard manual set | Official language and library descriptions
Web resources | Online tutorial, examples, and so on
Published books | Commercially-available reference texts
Hash-mark comments are the most basic way to document your code. All the text following a # (that is not inside a string literal) is simply ignored by Python. Because of that, this provides a place for you to write and read words meaningful to programmers. Such comments are only accessible in your source files; to code comments that are more widely available, use docstrings.
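As a minimal sketch of the rule about # and string literals, consider the following lines. They are written for a current Python 3 interpreter (an assumption on our part; the listings in this chapter use the older print statement), and the names total and label are made up for illustration.

```python
# A full-line comment: Python skips everything from the hash mark to the end of the line.
total = 3 + 4          # a trailing comment after a statement

# A hash mark inside a string literal is ordinary text, not a comment.
label = "issue #42"

print(total)
print(label)
```

The second assignment keeps the hash mark and the digits after it, because they sit inside the quotes.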
The built-in dir function is an easy way to grab a list that shows all the attributes available inside an object (i.e., its methods and simple data items). It can be called with any object that has attributes. For example, to find out what’s available in the standard library’s sys module, import it and pass it to dir:
>>> import sys
>>> dir(sys)
['__displayhook__', '__doc__', '__excepthook__', '__name__', '__stderr__', '__stdin__', '__stdout__', '_getframe', 'argv', 'builtin_module_names', 'byteorder', 'copyright', 'displayhook', 'dllhandle', 'exc_info', 'exc_type', 'excepthook', ...more names omitted...]
Only some of the many names are displayed; run these statements on your machine to see the full list. To find out what attributes are provided in built-in object types, run dir on a literal of that type. For example, to see list and string attributes, you can pass empty objects:
>>> dir([ ])
['__add__', '__class__', ...more... 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>> dir('')
['__add__', '__class__', ...more... 'capitalize', 'center', 'count', 'decode', 'encode', 'endswith', 'expandtabs', 'find', 'index', 'isalnum', 'isalpha', 'isdigit', 'islower', 'isspace', 'istitle', 'isupper', 'join', 'ljust', ...more names omitted...]
dir results for built-in types include a set of attributes that are related to the implementation of the type (technically, operator overloading methods); they all begin and end with double underscores to make them distinct, and can be safely ignored at this point in the book, so they are not shown here. Incidentally, you can achieve the same effect by passing a type name to dir instead of a literal:
>>> dir(str) == dir('')          # Same result as prior example
1
>>> dir(list) == dir([ ])
1
This works because functions like str and list that were once type converters are actually names of types; calling them invokes their constructor to generate an instance of that type. More on constructors and operator overloading methods when we meet classes in Part VI.
The dir function serves as a sort of memory-jogger: it provides a list of attribute names, but does not tell you anything about what those names mean. For such extra information, we need to move on to the next topic.
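Because the double-underscore names can be ignored for now, it is sometimes handy to filter them out of a dir result. The following is only a sketch, written for a current Python 3 interpreter (an assumption; the chapter's own sessions use an older release). The helper name public_names is hypothetical and is not part of Python.

```python
import sys

def public_names(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith('_')]

print(public_names([]))        # list methods such as 'append' and 'sort'
print(public_names(sys)[:5])   # the first few public names in the sys module
```

The exact names printed depend on the Python release you run; the filter itself relies only on dir and the string startswith method.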
Besides # comments, Python supports documentation that is retained at runtime for inspection, and automatically attached to objects. Syntactically, such comments are coded as strings at the top of module files, and the top of both function and class statements, before any other executable code. Python automatically stuffs the string, known as a docstring, into the __doc__ attribute of the corresponding object.
For example, consider the following file, docstrings.py. Its docstrings appear at the beginning of the file, and at the start of a function and class within it. Here, we use triple-quoted block strings for multiline comments in the file and function, but any sort of string will work. We haven’t studied the def or class statements yet, so ignore everything about them, except the strings at their tops:
"""
Module documentation
Words Go Here
"""

spam = 40

def square(x):
    """
    function documentation
    can we have your liver then?
    """
    return x ** 2

class employee:
    "class documentation"
    pass

print square(4)
print square.__doc__
The whole point of this documentation protocol is that your comments are retained for inspection in __doc__ attributes, after the file is imported:
>>> import docstrings
16

    function documentation
    can we have your liver then?

>>> print docstrings.__doc__

Module documentation
Words Go Here

>>> print docstrings.square.__doc__

    function documentation
    can we have your liver then?

>>> print docstrings.employee.__doc__
class documentation
Here, after importing, we display the docstrings associated with the module and its objects by printing their __doc__ attributes, where Python has saved the text. Note that you will generally want to use print explicitly to display docstrings; otherwise, you’ll get a single string with embedded newline characters.
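The difference between printing a docstring and simply echoing it can be sketched as follows. This version is written for a current Python 3 interpreter (an assumption; the sessions in this chapter use the older print statement), and it calls repr to mimic what the interactive prompt echoes.

```python
def square(x):
    """
    function documentation
    can we have your liver then?
    """
    return x ** 2

doc = square.__doc__
print(repr(doc))   # like the interactive echo: one string with \n escapes
print(doc)         # print interprets the newlines and shows separate lines
```

The leading whitespace kept inside the docstring may vary between Python releases, but the embedded newlines are present either way.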
You can also attach docstrings to methods of classes (covered later), but because these are just def statements nested in a class, they’re not a special case. To fetch the docstring of a method function inside a class within a module, follow the path and go through the class: module.class.method.__doc__ (see the example of method docstrings in Chapter 22).
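As a small sketch of that path, the following class carries one docstring of its own and one on a method. It is written for a current Python 3 interpreter (an assumption), and the names Employee and give_raise are hypothetical. Because everything lives in one file here, the path starts at the class; from another module you would add the module name in front.

```python
class Employee:
    "class documentation"

    def give_raise(self, percent):
        "method documentation: raise pay by the given percent"
        return percent

# Follow the path through the class to reach the method's docstring.
print(Employee.__doc__)
print(Employee.give_raise.__doc__)
```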
There is no broad standard about what should go into the text of a docstring (although some companies have internal standards). There have been various mark-up language and template proposals (e.g., HTML), but they seem to have not caught on in the Python world.
This is probably related to the priority of documentation among programmers in general. Usually, if you get any comments in a file at all, you count yourself lucky; asking programmers to hand-code HTML or other formats in their comments seems unlikely to fly. Of course, we encourage you to document your code liberally.
It turns out that built-in modules and objects in Python use similar techniques to attach documentation above and beyond the attribute lists returned by dir. For example, to see actual words that give a human-readable description of a built-in module, import it and print its __doc__ string:
>>> import sys
>>> print sys.__doc__
This module provides access to some objects used or maintained by the
interpreter and to ...more text omitted...

Dynamic objects:

argv -- command line arguments; argv[0] is the script pathname if known
path -- module search path; path[0] is the script directory, else ''
modules -- dictionary of loaded modules
...more text omitted...
Similarly, functions, classes, and methods within built-in modules have attached words in their __doc__ attributes as well:
>>> print sys.getrefcount.__doc__
getrefcount(object) -> integer

Return the current reference count for the object.
...more text omitted...
In addition, you can read about built-in functions via their docstrings:
>>> print int.__doc__
int(x[, base]) -> integer

Convert a string or number to an integer, if possible.
...more text omitted...

>>> print open.__doc__
file(name[, mode[, buffering]]) -> file object

Open a file. The mode can be 'r', 'w' or 'a' for reading
...more text omitted...
The docstring technique proved to be so useful that Python ships with a tool that makes them even easier to display. The standard PyDoc tool is Python code that knows how to extract and format your docstrings, together with automatically extracted structural information, into nicely arranged reports of various types.
There are a variety of ways to launch PyDoc, including command-line script options. Perhaps the two most prominent PyDoc interfaces are the built-in help function, and the PyDoc GUI/HTML interface. The newly introduced help function invokes PyDoc to generate a simple textual report (which looks much like a manpage on Unix-like systems):
>>> import sys
>>> help(sys.getrefcount)
Help on built-in function getrefcount:

getrefcount(...)
    getrefcount(object) -> integer

    Return the current reference count for the object.
    ...more omitted...
Note that you do not have to import sys in order to call help, but you do have to import sys to get help on sys. For larger objects such as modules and classes, the help display is broken down into multiple sections, a few of which are shown here. Run this interactively to see the full report.
>>> help(sys)
Help on built-in module sys:

NAME
    sys

FILE
    (built-in)

DESCRIPTION
    This module provides access to some objects used
    or maintained by the interpreter and to functions
    ...more omitted...

FUNCTIONS
    __displayhook__ = displayhook(...)
        displayhook(object) -> None

        Print an object to sys.stdout and also save it
    ...more omitted...

DATA
    __name__ = 'sys'
    __stderr__ = <open file '<stderr>', mode 'w' at 0x0082BEC0>
    ...more omitted...
Some of the information in this report is docstrings, and some of it (e.g., function call patterns) is structural information that PyDoc gleans automatically by inspecting objects’ internals. You can also use help on built-in functions, methods, and types. To get help for a built-in type, use the type name (e.g., dict for dictionary, str for string, list for list); you’ll get a large display that describes all the methods available for that type:
>>> help(dict)
Help on class dict in module __builtin__:

class dict(object)
 |  dict( ) -> new empty dictionary.
...more omitted...

>>> help(str.replace)
Help on method_descriptor:

replace(...)
    S.replace (old, new[, maxsplit]) -> string

    Return a copy of string S with all occurrences
    ...more omitted...

>>> help(ord)
Help on built-in function ord:

ord(...)
    ord(c) -> integer

    Return the integer ordinal of a one-character string.
Finally, the help function works just as well on your modules as built-ins. Here it is reporting on the docstrings.py file coded in the prior section; again, some of this is docstrings, and some is automatic by structure:
>>> help(docstrings.square)
Help on function square in module docstrings:

square(x)
    function documentation
    can we have your liver then?

>>> help(docstrings.employee)
...more omitted...

>>> help(docstrings)
Help on module docstrings:

NAME
    docstrings

FILE
    c:\python22\docstrings.py

DESCRIPTION
    Module documentation
    Words Go Here

CLASSES
    employee
    ...more omitted...

FUNCTIONS
    square(x)
        function documentation
        can we have your liver then?

DATA
    __file__ = 'C:\PYTHON22\docstrings.pyc'
    __name__ = 'docstrings'
    spam = 40
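If you want the same report as a string rather than as a paged display, the pydoc module can be called directly. The following is a sketch for a current Python 3 interpreter (an assumption; the pydoc module of the release used in this chapter may differ), using pydoc.render_doc to build the report and pydoc.plain to strip the overstrike characters used for bold text. The square function is repeated here so the example stands alone.

```python
import pydoc

def square(x):
    """
    function documentation
    can we have your liver then?
    """
    return x ** 2

# render_doc builds the text report that help() would display;
# plain() removes the overstrike sequences PyDoc uses for bold headings.
report = pydoc.plain(pydoc.render_doc(square))
print(report)
```

Holding the report in a string makes it easy to search it or save it to a file, which the interactive help display does not do for you.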
The help function is nice for grabbing documentation when working interactively. For a more grandiose display, PyDoc also provides a GUI (a simple but portable Python/Tkinter script), and can render its report in HTML page format, viewable in any web browser. In this mode, PyDoc can run locally or as a remote server, and reports contain automatically created hyperlinks that allow you to click your way through the documentation of related components in your application.
To start PyDoc in this mode, you generally first launch the search engine GUI captured in Figure 11-1. You can start this by either selecting the Module Docs item in Python’s Start button menu on Windows, or launching the pydocgui script in Python’s tools directory. Enter the name of a module you’re interested in knowing about, and press the Enter key; PyDoc will march down your module import search path looking for references to the module.
Once you’ve found a promising entry, select and click “go to selected”; PyDoc spawns a web browser on your machine to display the report rendered in HTML format. Figure 11-2 shows information PyDoc displays for the built-in glob module.
Notice the hyperlinks in the Modules section of this page; click these to jump to the PyDoc pages for related (imported) modules. For larger pages, PyDoc also generates hyperlinks to sections within the page. As for the help function interface, the GUI interface works on user-defined modules as well; Figure 11-3 shows the page generated for the docstrings.py module file.
PyDoc can be customized and launched in various ways. The main thing to take away from this section is that PyDoc essentially gives you implementation reports “for free”—if you are good about using docstrings in your files, PyDoc does all the work of collecting and formatting them for display. PyDoc also provides an easy way to access a middle level of documentation for built-in tools—its reports are more useful than raw attribute lists, and less exhaustive than the standard manuals.
For the complete and most up-to-date description of the Python language and its tool set, Python’s standard manuals stand ready to serve. Python’s manuals ship in HTML format and are installed with the Python system on Windows—they are available in your Start button’s menu for Python, and can also be opened from the Help menu within IDLE. You can also fetch the manual set separately at http://www.python.org in a variety of formats, or read them online at that site (follow the Documentation link).
When opened, the HTML format of the manuals displays a root page like that in Figure 11-4. The two most important entries here are most likely the Library Reference (which documents built-in types, functions, exceptions, and standard library modules) and the Language Reference (which provides a formal description of language-level details). The tutorial listed on this page also provides a brief introduction for newcomers.
At http://www.python.org you’ll find links to various tutorials, some of which cover special topics or domains. Look for the Documentation and Newbies (i.e., newcomers) links. This site also lists non-English Python resources.
Finally, you can now choose from a collection of reference books for Python. In general, books tend to lag behind the cutting edge of Python changes, partly because of the work involved in writing, and partly because of the natural delays built in to the publishing cycle. Usually, by the time a book comes out, it’s six or more months behind the current Python state. Unlike standard manuals, books are also generally not free.
For many, the convenience and quality of a professionally published text is worth the cost. Moreover, Python changes so slowly that books are usually still relevant years after they are published, especially if their authors post updates on the Web. See the Preface for more pointers on Python books.
Before the programming exercises for this part of the book, here are some of the most common mistakes beginners make when coding Python statements and programs. You’ll learn to avoid these once you’ve gained a bit of Python coding experience; but a few words might help you avoid falling into some of these traps initially.
Don’t forget the colons. Don’t forget to type a : at the end of compound statement headers (the first line of an if, while, for, etc.). You probably will at first anyhow (we did too), but you can take some comfort in the fact that it will soon become an unconscious habit.
Start in column 1. Be sure to start top-level (unnested) code in column 1. That includes unnested code typed into module files, as well as unnested code typed at the interactive prompt.
Blank lines matter at the interactive prompt. Blank lines in compound statements are always ignored in module files, but when you are typing code at the interactive prompt, they end the statement. In other words, blank lines tell the interactive command line that you’ve finished a compound statement; if you want to continue, don’t hit the Enter key at the ... prompt until you’re really done.
Indent consistently. Avoid mixing tabs and spaces in the indentation of a block, unless you know what your text editor system does with tabs. Otherwise, what you see in your editor may not be what Python sees when it counts tabs as a number of spaces. It’s safer to use all tabs or all spaces for each block.
Don’t code C in Python. A note to C/C++ programmers: you don’t need to type parentheses around tests in if and while headers (e.g., if (X==1):); you can if you like (any expression can be enclosed in parentheses), but they are fully superfluous in this context. Also, do not terminate all your statements with a semicolon; it’s technically legal to do this in Python as well, but is totally useless, unless you’re placing more than one statement on a single line (the end of a line normally terminates a statement). And remember, don’t embed assignment statements in while loop tests, and don’t use { } around blocks (indent your nested code blocks consistently instead).
Use simple for loops instead of while or range. A simple for loop (e.g., for x in seq:) is almost always simpler to code, and quicker to run than a while or range-based counter loop. Because Python handles indexing internally for a simple for, it can sometimes be twice as fast as the equivalent while.
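To make the coding difference concrete, here is a sketch of the same task written both ways. It is written for a current Python 3 interpreter (an assumption), the names seq, by_while, and by_for are made up for illustration, and it makes no claim about timing on your machine.

```python
seq = ['spam', 'eggs', 'ham']

# Counter loop: the index is created, tested, and incremented by hand.
by_while = []
i = 0
while i < len(seq):
    by_while.append(seq[i].upper())
    i = i + 1

# Simple for loop: Python steps through the items itself.
by_for = []
for x in seq:
    by_for.append(x.upper())

print(by_while == by_for)
```

Both loops build the same list; the for version simply has fewer moving parts to get wrong.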
Don’t expect results from functions that change objects in-place. In-place change operations like the list.append( ) and list.sort( ) methods of Chapter 6 do not return a value (other than None); call them without assigning the result. It’s not uncommon for beginners to say something like mylist=mylist.append(X) to try to get the result of an append; instead, this assigns mylist to None, rather than the modified list (in fact, you’ll lose a reference to the list altogether).
A more devious example of this pops up when trying to step through dictionary items in sorted fashion. It’s fairly common to see this sort of code: for k in D.keys( ).sort( ):. This almost works: the keys method builds a keys list, and the sort method orders it. But since the sort method returns None, the loop fails because it is ultimately a loop over None (a nonsequence). To code this correctly, split the method calls out to statements: Ks = D.keys( ), then Ks.sort( ), and finally for k in Ks:. This, by the way, is one case where you’ll still want to call the keys method explicitly for looping, instead of relying on the dictionary iterators.
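Both halves of this pitfall can be sketched in a few lines. The code is written for a current Python 3 interpreter (an assumption); for that reason it wraps D.keys( ) in list( ), because there keys no longer returns a list. The names mylist, result, D, and Ks are only illustrations.

```python
mylist = [3, 1, 2]
result = mylist.append(4)      # append changes mylist in place...
print(result)                  # ...and hands back None, not the list
print(mylist)

D = {'b': 2, 'a': 1, 'c': 3}
Ks = list(D.keys())            # build the keys list as its own statement
Ks.sort()                      # sort in place; do not assign the result
for k in Ks:
    print('%s => %s' % (k, D[k]))
```

Keeping the sort call on a line by itself is what avoids looping over None.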
Always use parentheses to call a function. You must add parentheses after a function name to call it, whether it takes arguments or not (e.g., use function( ), not function). In Part IV, we’ll see that functions are simply objects that have a special operation, a call, that you trigger with the parentheses.
In classes, this seems to occur most often with files. It’s common to see beginners type file.close to close a file, rather than file.close( ); because it’s legal to reference a function without calling it, the first version with no parentheses succeeds silently, but does not close the file!
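The silent failure is easy to see with a file-like object. In this sketch, written for a current Python 3 interpreter (an assumption), an in-memory io.StringIO object stands in for a real file so that nothing is written to disk; the variable names are made up for illustration.

```python
import io

f = io.StringIO('some text')   # in-memory stand-in for a real file
f.close                        # legal, but only references the method
closed_without_call = f.closed
f.close()                      # the parentheses actually run the method
closed_after_call = f.closed
print(closed_without_call, closed_after_call)
```

The bare reference on the third line raises no error, which is exactly why this mistake is hard to spot.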
Don’t use extensions or paths in imports and reloads. Omit directory paths and file suffixes in import statements (e.g., say import mod, not import mod.py). (We met module basics in Chapter 6, and will continue studying them in Part V.) Because modules may have other suffixes besides .py (a .pyc, for instance), hardcoding a particular suffix is not only illegal syntax, it doesn’t make sense. And platform-specific directory path syntax comes from module search path settings, not the import statement.
Now that you know how to code basic program logic, the exercises ask you to implement some simple tasks with statements. Most of the work is in exercise 4, which lets you explore coding alternatives. There are always many ways to arrange statements, and part of learning Python is learning which arrangements work better than others.
See Section B.3 for the solutions.
Coding basic loops. Write a for loop that prints the ASCII code of each character in a string named S. Use the built-in function ord(character) to convert each character to an ASCII integer. (Test it interactively to see how it works.)
Next, change your loop to compute the sum of the ASCII codes of all characters in a string.
Finally, modify your code again to return a new list that contains the ASCII codes of each character in the string. Does this expression have a similar effect: map(ord, S)? (Hint: see Part IV.)
Backslash characters. What happens on your machine when you type the following code interactively?
for i in range(50):
    print 'hello %d\n\a' % i
Beware that if run outside of the IDLE interface, this example may beep at you, so you may not want to run it in a crowded lab. IDLE prints odd characters instead (see the backslash escape characters in Table 5-2).
Sorting dictionaries. In Chapter 6, we saw that dictionaries are unordered collections. Write a for loop that prints a dictionary’s items in sorted (ascending) order. Hint: use the dictionary keys and list sort methods.
Program logic alternatives. Consider the following code, which uses a while loop and found flag to search a list of powers of 2 for the value of 2 raised to the 5th power (32). It’s stored in a module file called power.py.
L = [1, 2, 4, 8, 16, 32, 64]
X = 5

found = i = 0
while not found and i < len(L):
    if 2 ** X == L[i]:
        found = 1
    else:
        i = i+1

if found:
    print 'at index', i
else:
    print X, 'not found'

C:\book\tests> python power.py
at index 5
As is, the example doesn’t follow normal Python coding techniques. Follow the steps below to improve it. For all the transformations, you may type your code interactively or store it in a script file run from the system command line (using a file makes this exercise much easier).
First, rewrite this code with a while loop else, to eliminate the found flag and final if statement.
Next, rewrite the example to use a for loop with an else, to eliminate the explicit list indexing logic. Hint: to get the index of an item, use the list index method (L.index(X) returns the offset of the first X in list L).
Next, remove the loop completely by rewriting the examples with a simple in operator membership expression. (See Chapter 6 for more details, or type this to test: 2 in [1,2,3].)
Finally, use a for loop and the list append method to generate the powers-of-2 list (L), instead of hard-coding a list literal.
Deeper thoughts: (1) Do you think it would improve performance to move the 2**X expression outside the loops? How would you code that? (2) As we saw in Exercise 1, Python also includes a map(function, list) tool that can generate the powers-of-2 list too: map(lambda x: 2**x, range(7)). Try typing this code interactively; we’ll meet lambda more formally in Chapter 14.