This chapter begins our look at the Python module, the highest-level program organization unit, which packages program code and data for reuse. In concrete terms, modules usually correspond to Python program files (or C extensions). Each file is a module, and modules import other modules to use the names they define. Modules are processed with two new statements and one important built-in function:
import
Lets a client (importer) fetch a module as a whole
from
Allows clients to fetch particular names from a module
reload
Provides a way to reload a module’s code without stopping Python
We introduced module fundamentals in Chapter 3, and have been using them ever since. Part V begins by expanding on core module concepts, and then moves on to explore more advanced module usage. This first chapter begins with a general look at the role of modules in overall program structure. In the next and following chapters, we’ll dig into the coding details behind the theory.
Along the way, we’ll flesh out module details we’ve omitted so far: reloads, the __name__ and __all__ attributes, package imports, and so on. Because modules and classes are really just glorified namespaces, we formalize namespace concepts here as well.
Modules provide an easy way to organize components into a system, by serving as packages of names. From an abstract perspective, modules have at least three roles:
As we saw in Chapter 3, modules let us save code in files permanently. Unlike code you type at the Python interactive prompt, which goes away when you exit Python, code in module files is persistent—it can be reloaded and rerun as many times as needed. More to the point, modules are a place to define names, or attributes, that may be referenced by external clients.
Modules are also the highest-level program organization unit in Python. Fundamentally, they are just packages of names. Modules seal up names into self-contained packages that avoid name clashes—you can never see a name in another file, unless you explicitly import it. In fact, everything “lives” in a module: code you execute and objects you create are always implicitly enclosed by a module. Because of that, modules are a natural tool for grouping system components.
From a functional perspective, modules also come in handy for implementing components that are shared across a system, and hence only require a single copy. For instance, if you need to provide a global object that’s used by more than one function or file, you can code it in a module that’s imported by many clients.
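To make this concrete, here is a minimal sketch of the shared-module pattern. The module name settings and its attributes verbose and retries are invented for illustration, the code uses modern Python 3 syntax, and the file is created in a scratch directory so the sketch is self-contained:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# A scratch directory holds a hypothetical shared module, settings.py.
tmp = Path(tempfile.mkdtemp())
(tmp / "settings.py").write_text("verbose = True\nretries = 3\n")

sys.path.insert(0, str(tmp))      # make the scratch directory importable

import settings                   # first client: loads and runs settings.py
again = importlib.import_module("settings")   # second client: reuses it

print(again is settings)          # True: both clients share one module object
print(settings.retries)           # 3
```

Because both importers receive the very same module object, any state the module holds is automatically shared across the whole program.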
To truly understand the role of modules in a Python system, though, we need to digress for a moment and explore the general structure of a Python program.
So far in this book, we’ve sugar-coated some of the complexity in our descriptions of Python programs. In practice, programs usually are more than just one file; for all but the simplest scripts, your programs will take the form of multifile systems. And even if you can get by with coding a single file yourself, you will almost certainly wind up using external files that someone else has already written.
This section introduces the general architecture of Python programs—the way you divide a program into a collection of source files (a.k.a. modules), and link the parts into a whole. Along the way, we also define the central concepts of Python modules, imports, and object attributes.
Generally, a Python program consists of multiple text files containing Python statements. The program is structured as one main, top-level file, along with zero or more supplemental files known as modules in Python.
In a Python program, the top-level file contains the main flow of control of your program—the file you run to launch your application. The module files are libraries of tools, used to collect components used by the top-level file, and possibly elsewhere. Top-level files use tools defined in module files, and modules use tools defined in other modules. In Python, a file imports a module to gain access to the tools it defines. And the tools defined by a module are known as its attributes—variable names attached to objects such as functions. Ultimately, we import modules, and access their attributes to use their tools.
Let’s make this a bit more concrete. Figure 15-1 sketches the structure of a Python program composed of three files: a.py, b.py, and c.py. The file a.py is chosen to be the top-level file; it will be a simple text file of statements, which is executed from top to bottom when launched. Files b.py and c.py are modules; they are simple text files of statements as well, but are usually not launched directly. Rather, modules are normally imported by other files that wish to use the tools they define.
For instance, suppose file b.py in Figure 15-1 defines a function called spam, for external use. As we learned in Part IV, b.py would contain a Python def statement to generate the function, which is later run by passing zero or more values in parentheses after the function’s name:

def spam(text):
    print text, 'spam'
Now, if a.py wants to use spam, it might contain Python statements such as the following:

import b
b.spam('gumby')
The first of these two, a Python import statement, gives file a.py access to everything defined in file b.py. It roughly means: “load file b.py (unless it’s already loaded), and give me access to all its attributes through name b.” import (and, as you’ll see later, from) statements execute and load another file at runtime. In Python, cross-file module linking is not resolved until such import statements are executed.

The second of these statements calls the function spam defined in module b, using object attribute notation. The code b.spam means: “fetch the value of name spam that lives within object b.” This happens to be a callable function in our example, so we pass a string in parentheses ('gumby'). If you actually type these files and run a.py, the words “gumby spam” are printed.
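The two-file example can be simulated in a single self-contained script by writing b.py to a scratch directory first. This sketch uses Python 3 syntax, and spam returns its result rather than printing it so the value is easy to check:

```python
import sys
import tempfile
from pathlib import Path

# Create the module file b.py in a scratch directory.
tmp = Path(tempfile.mkdtemp())
(tmp / "b.py").write_text("def spam(text):\n    return text + ' spam'\n")

sys.path.insert(0, str(tmp))   # so the import below can find b.py

# The top-level file's role: import b, then call its spam attribute.
import b
print(b.spam('gumby'))         # gumby spam
```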
More generally, you’ll see the notation object.attribute throughout Python scripts—most objects have useful attributes that are fetched with the “.” operator. Some are callable things like functions, and others are simple data values that give object properties (e.g., a person’s name).
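As a small illustration of the object.attribute notation, the standard library’s math module exposes both kinds of attributes, and the built-in getattr function is the functional equivalent of the dot operator:

```python
import math

print(math.pi)           # an attribute that is a simple data value
print(math.sqrt(16.0))   # an attribute that is a callable function

# getattr does the same fetch as the "." operator, given a name string.
print(getattr(math, 'pi') is math.pi)   # True
```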
The notion of importing is also general throughout Python. Any file can import tools from any other file. For instance, file a.py may import b.py to call its function, but b.py might also import c.py in order to leverage different tools defined there. Import chains can go as deep as you like: in this example, module a can import b, which can import c, which can import b again, and so on.
Besides serving as the highest-level organizational structure, modules (and module packages, described in Chapter 17) are also the highest level of code reuse in Python. By coding components in module files, you make them useful both in the original program and in any other program you may write. For instance, if after coding the program in Figure 15-1 we discover that the function b.spam is a general-purpose tool, we can reuse it in a completely different program; simply import file b.py again, from the other program’s files.
Notice the rightmost portion of Figure 15-1. Some of the modules that your programs will import are provided by Python itself, not files you will code. Python automatically comes with a large collection of utility modules known as the Standard Library.
This collection, roughly 200 modules large at last count, contains platform independent support for common programming tasks: operating system interfaces, object persistence, text pattern matching, network and Internet scripting, GUI construction, and much more. None of these are part of the Python language itself, but can be used by importing the appropriate modules on any standard Python installation.
In this book, you will meet a few of the standard library modules in action in the examples, but for a complete look, you should browse the standard Python Library Reference Manual, available either with your Python installation (they are in IDLE and your Python Start button entry on Windows), or online at http://www.python.org.
Because there are so many modules, this is really the only way to get a feel for what tools are available. You can also find Python library materials in commercial books, but the manuals are free, viewable in any web browser (they ship in HTML format), and updated each time Python is re-released.
The prior section talked about importing modules, without really explaining what happens when you do so. Since imports are at the heart of program structure in Python, this section goes into more detail on the import operation to make this process less abstract.
Some C programmers like to compare the Python module import operation to a C #include, but they really shouldn’t—in Python, imports are not just textual insertions of one file into another. They are really runtime operations that perform three distinct steps the first time a file is imported by a program:
Find the module’s file.
Compile it to byte-code (if needed).
Run the module’s code to build the objects it defines.
All three of these steps are only run the first time a module is imported during a program’s execution; later imports of the same module bypass all of these and simply fetch the already-loaded module object in memory. To better understand module imports, let’s explore each of these steps in turn.
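The caching behavior is easy to observe with the standard sys.modules table and the importlib module (a modern Python 3 interface; the examples in this chapter predate it):

```python
import importlib
import sys

mod1 = importlib.import_module('json')   # first import: find, compile, run
print('json' in sys.modules)             # True: now registered by name

mod2 = importlib.import_module('json')   # later import: cache lookup only
print(mod1 is mod2)                      # True: the same module object
```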
First off, Python must locate the module file referenced by your import statement. Notice that the import statement in the prior section’s example names the file without a .py suffix and without its directory path. It says just import b, instead of something like import c:\dir1\b.py. Import statements omit path and suffix details like this on purpose; you can only list a simple name.[1] Instead, Python uses a standard module search path to locate the module file corresponding to an import statement.
In many cases, you can rely on the automatic nature of the module import search path and need not configure this path at all. If you want to be able to import files across user-defined directory boundaries, though, you will need to know how the search path works, in order to customize it. Roughly, Python’s module search path is automatically composed as the concatenation of these major components:
The home directory of the top-level file.
PYTHONPATH
directories (if set).
Standard library directories.
The contents of any .pth files (if present).
The first and third of these are defined automatically. Because Python searches the concatenation of these from first to last, the second and fourth can be used to extend the module search path to include your own directories. Here is how Python uses each of these path components:
Python first looks for the imported file in the home directory. Depending on how you are launching code, this is either the directory containing your program’s top-level file, or the directory in which you are working interactively. Because this is always searched first, if a program is located entirely in a single directory, all its imports will work automatically, with no path configuration required.
Next, Python searches all directories listed in your PYTHONPATH environment variable setting, from left to right (assuming you have set this at all). In brief, PYTHONPATH is simply set to a list of user-defined and platform-specific names of directories that contain Python code files. Add all the directories that you wish to be able to import from; Python uses your setting to extend the module search path.
Because Python searches the home directory first, you only need to make this setting to import files across directory boundaries—that is, to import a file that is stored in a different directory than the file that imports it. In practice, you probably will make this setting once you start writing substantial programs. When you are first starting out, though, if you save all your module files in the directory that you are working in (i.e., the home directory), your imports will work without making this setting at all.
Next, Python will automatically search the directories where the standard library modules are installed on your machine. Because these are always searched, they normally do not need to be added to your PYTHONPATH.
Finally, a relatively new feature of Python allows users to add valid directories to the module search path by simply listing them, one per line, in a text file whose name ends in a .pth suffix (for “path”). These path configuration files are a somewhat advanced installation-related feature, which we will not discuss fully here.
In short, a text file of directory names, dropped in an appropriate directory, can serve roughly the same role as the PYTHONPATH environment variable setting. For instance, a file named myconfig.pth may be placed at the top level of the Python install directory on Windows (e.g., in C:\Python22) to extend the module search path. Python will add the directories listed on each line of the file, from first to last, near the end of the module search path list. Because they are based on files instead of shell settings, path files can also apply to all users of an installation, instead of just one user or shell.
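The .pth mechanism can be exercised directly with the standard library’s site module, whose addsitedir function processes .pth files much as interpreter startup does. The file and directory names below are invented for the demonstration:

```python
import site
import sys
import tempfile
from pathlib import Path

sitedir = Path(tempfile.mkdtemp())   # stands in for the install directory
extra = Path(tempfile.mkdtemp())     # a directory we want on the search path

# A .pth file simply lists directory names, one per line.
(sitedir / "myconfig.pth").write_text(str(extra) + "\n")

site.addsitedir(str(sitedir))        # process any .pth files found there
print(str(extra) in sys.path)        # True: the directory was appended
```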
This feature is more sophisticated than we will describe here. We recommend that beginners use either PYTHONPATH or a single .pth file, and then only if you must import across directories. See the Python library manual for more details on this feature, especially its documentation for the standard library module site.
See also Appendix A for examples of common ways to extend your module search path with PYTHONPATH or .pth files on various platforms. Depending on your platform, additional directories may be automatically added to the module search path as well. In fact, this description of the module search path is accurate, but generic; the exact configuration of the search path is prone to change across both platforms and Python releases.
For instance, Python may add an entry for the current working directory—the directory from which you launched your program—to the search path, after the PYTHONPATH directories, and before standard library entries. When launching from a command line, the current working directory may not be the same as the home directory of your top-level file—the directory where your program file resides. (See Chapter 3 for more on command lines.) Since the current working directory can vary each time your program runs, you normally shouldn’t depend on its value for import purposes.
If you want to see how the path is truly configured on your machine, you can always inspect the module search path as it is known to Python, by printing the built-in sys.path list (that is, attribute path of built-in module sys). This Python list of directory name strings is the actual search path; on imports, Python searches each directory on this list, from left to right.
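For instance, a quick way to inspect the configured path; each entry is a plain directory name string, and exact contents vary per machine and Python release:

```python
import sys

# sys.path is an ordinary mutable list of directory name strings.
print(isinstance(sys.path, list))                 # True
print(all(isinstance(p, str) for p in sys.path))  # True

for entry in sys.path[:5]:     # show the first few search path entries
    print(repr(entry))         # the first is often '' or the script's home
```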
Really, sys.path is the module search path. It is configured by Python at program startup, using the four path components just described. Python automatically merges any PYTHONPATH and .pth file path settings you’ve made into this list, and always sets the first entry to identify the home directory of the top-level file, possibly as an empty string.
Python exposes this list for two good reasons. First of all, it provides a way to verify the search path settings you’ve made—if you don’t see your settings somewhere on this list, you need to recheck your work. Secondly, if you know what you’re doing, this list also provides a way for scripts to tailor their search paths manually. As you’ll see later in this part, by modifying the sys.path list, you can modify the search path for all future imports. Such changes only last for the duration of the script, however; PYTHONPATH and .pth files are more permanent ways to modify the path.[2]
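A sketch of tailoring the path from a script follows; the helper.py module and its double function are invented for the example, and the change lasts only as long as this process:

```python
import sys
import tempfile
from pathlib import Path

# A hypothetical source directory, created here just for the demonstration.
srcdir = Path(tempfile.mkdtemp())
(srcdir / "helper.py").write_text("def double(n):\n    return n * 2\n")

sys.path.append(str(srcdir))   # extend the search path for this process only

import helper                  # found via the entry appended above
print(helper.double(21))       # 42
```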
Keep in mind that filename suffixes (e.g., .py) are omitted in import statements, intentionally. Python chooses the first file it can find on the search path that matches the imported name. For example, an import statement of the form import b might load:
Source file b.py
Byte-code file b.pyc
A directory named b, for package imports
A C extension module (e.g., b.so on Linux)
An in-memory image, for frozen executables
A Java class, in the Jython system
A zip file component, using the zipimport module
Some standard library modules are actually coded in C. C extensions, Jython, and package imports all extend imports beyond simple files. To importers, though, the difference in loaded file type is completely transparent, both when importing and when fetching module attributes. Saying import b gets whatever module b is, according to your module search path, and b.attr fetches an item in the module, be that a Python variable or a linked-in C function. Some standard modules we will use in this book, for example, are coded in C, not Python; their clients don’t have to care.
If you have both a b.py and a b.so in different directories, Python will always load the one found in the first (leftmost) directory on your module search path, during the left-to-right search of sys.path. But what happens if there is both a b.py and a b.so in the same directory? Python follows a standard picking order, but it is not guaranteed to stay the same over time. In general, you should not depend on which type of file Python will choose within a given directory—make your module names distinct, or use module search path configuration to make module selection more obvious.

It is also possible to redefine much of what an import operation does in Python, with what are known as import hooks. These hooks can be used to make imports do useful things such as load files from zip archives, perform decryption, and so on (in fact, Python 2.3 includes a zipimport standard module, which allows files to be directly imported from zip archives). Normally, though, imports work as described in this section.

Python also supports the notion of .pyo optimized byte-code files, created and run with the -O Python command-line flag; because these run only slightly faster than normal .pyc files (typically 5% faster), they are infrequently used. The Psyco system (see Chapter 2) provides more substantial speedups.
After finding a source code file that matches an import statement according to the module search path, Python next compiles it to byte code, if necessary. (We discussed byte code in Chapter 2.)
Python checks file timestamps and skips the source to byte code compile step, if it finds a .pyc byte code file that is not older than the corresponding .py source file. In addition, if Python finds only a byte code file on the search path and no source, it simply loads the byte code directly. Because of this, the compile step is bypassed if possible, to speed program startup. If you change the source code, Python will automatically regenerate the byte code the next time your program is run. Moreover, you can ship a program as just byte code files, and avoid sending source.
Notice that compilation happens when a file is being imported. Because of this, you will not usually see a .pyc byte code file for the top-level file of your program, unless it is also imported elsewhere—only imported files leave behind a .pyc on your machine. The byte code of top-level files is used internally and discarded; byte-code of imported files is saved in files to speed future imports.
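The compile step can also be triggered explicitly with the standard py_compile module; note that current Python 3 releases store the byte code under a __pycache__ subdirectory rather than beside the source, unlike the releases this chapter describes:

```python
import py_compile
import tempfile
from pathlib import Path

src = Path(tempfile.mkdtemp()) / "b.py"
src.write_text("X = 99\n")

# Compile explicitly; imports do the same thing implicitly when needed.
cache = py_compile.compile(str(src))   # returns the byte-code file's path
print(Path(cache).exists())            # True
```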
Top-level files are often designed to be executed directly and not imported at all. Later, we’ll see that it is possible to design a file that serves both as the top-level code of a program and as a module of tools to be imported. Such files may be both executed and imported, and thus generate a .pyc. To learn how, watch for the discussion of the special __name__ attribute and "__main__" in Chapter 18.
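As a preview, such a dual-use file looks roughly like this sketch; Chapter 18 covers the details:

```python
# A file usable both as a module of tools and as a top-level script.
def spam(text):
    return text + ' spam'

if __name__ == '__main__':    # true only when run as the top-level file
    print(spam('gumby'))      # self-test code; skipped when imported
```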
The final step of an import operation executes the byte code of the module. All statements in the file execute in turn, from top to bottom, and any assignments made to names during this step generate attributes of the resulting module object. This execution step generates all the tools that the module’s code defines. For instance, def statements in a file are run at import time to create functions, and assign attributes within the module to those functions. The functions are called later in the program by importers.

Because this last import step actually runs the file’s code, if any top-level code in a module file does real work, you’ll see its results at import time. For example, top-level print statements in a module show output when the file is imported. Function def statements simply define objects for later use.
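This can be seen directly: in the sketch below, the hypothetical module noisy.py does top-level work (a print) that appears the first time it is imported, and only then:

```python
import sys
import tempfile
from pathlib import Path

# A module whose top-level code does visible work when it runs.
tmp = Path(tempfile.mkdtemp())
(tmp / "noisy.py").write_text(
    "print('initializing noisy')\n"   # top-level: runs at import time
    "def report():\n"
    "    return 'ready'\n"
)

sys.path.insert(0, str(tmp))
import noisy                  # prints 'initializing noisy', once
import noisy                  # second import: no rerun, no output
print(noisy.report())         # ready
```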
As you can see, import operations involve quite a bit of work—they search for files, possibly run a compiler, and run Python code. A given module is only imported once per process by default. Future imports skip all three import steps, and reuse the already-loaded module in memory.[3]
As you can also see, the import operation is at the heart of program architecture in Python. Larger programs are divided into multiple files, which are linked together at runtime by imports. Imports in turn use module search paths to locate your files, and modules define attributes for external use.
Of course, the whole point of imports and modules is to provide a structure to your program, which divides its logic into self-contained software components. Code in one module is isolated from code in another; in fact, no file can ever see the names defined in another, unless explicit import statements are run. To see what this all means in terms of actual code, let’s move on to Chapter 16.
[1] In fact, it’s syntactically illegal to include path and suffix detail in an import. In Chapter 17, we’ll meet package imports, which allow import statements to include part of the directory path leading to a file, as a set of period-separated names. However, package imports still rely on the normal module search path, to locate the leftmost directory in a package path. They also cannot make use of any platform-specific directory syntax in the import statement; such syntax only works on the search path. Also note that module file search path issues are not as relevant when you run frozen executables (discussed in Chapter 2); they typically embed byte code in the binary image.
[2] Some programs really need to change sys.path, though. Scripts that run on web servers, for example, usually run as user “nobody” to limit machine access. Because such scripts cannot usually depend on “nobody” to have set PYTHONPATH in any particular way, they often set sys.path manually to include required source directories, prior to running any import statements.
[3] Technically, Python keeps already-loaded modules in the built-in sys.modules dictionary, and checks it at the start of an import operation to know if the module is already loaded. If you want to see which modules are loaded, import sys and print sys.modules.keys(). More on this internal table in Chapter 18.