Part V concludes with a collection of more advanced module-related topics, along with the standard set of gotchas and exercises. Just like functions, modules are more effective when their interfaces are well defined, so this chapter also takes a brief look at module design concepts. Some of the topics here, such as the __name__ trick, are very widely used, despite the word “advanced” in this chapter’s title.
As we’ve seen, Python modules export all names assigned at the top level of their file. There is no notion of declaring which names should and shouldn’t be visible outside the module. In fact, there’s no way to prevent a client from changing names inside a module if they want to.
In Python, data hiding in modules is a convention, not a syntactical constraint. If you want to break a module by trashing its names, you can, but we have yet to meet a programmer who would want to. Some purists object to this liberal attitude towards data hiding and claim that it means Python can’t implement encapsulation. However, encapsulation in Python is more about packaging than about restricting.
As a special case, prefixing names with a single underscore (e.g., _X) prevents them from being copied out when a client imports with a from* statement. This really is intended only to minimize namespace pollution; since from* copies out all names, you may get more than you bargained for (including names that overwrite names in the importer). But underscores aren’t “private” declarations: you can still see and change such names with other import forms such as the import statement.
A module can achieve a hiding effect similar to the _X naming convention by assigning a list of variable name strings to the variable __all__ at the top level of the module. For example:
__all__ = ["Error", "encode", "decode"] # Export these only.
When this feature is used, the from* statement will only copy out those names listed in the __all__ list. In effect, this is the converse of the _X convention: __all__ contains names to be copied, but _X identifies names to not be copied. Python looks for an __all__ list in the module first; if one is not defined, from* copies all names without a single leading underscore.
The __all__ list also only has meaning to the from* statement form, and is not a privacy declaration. Module writers can use either trick to implement modules that are well-behaved when used with from*. See the discussion of __all__ lists in package __init__.py files in Chapter 17; there, they declare submodules to be loaded for a from*.
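The two hiding conventions can be compared side by side with a minimal sketch. It is written in Python 3 syntax (print and exec are functions there), and the module names hide_demo_underscore and hide_demo_all, along with the helpers make_module and star_import, are hypothetical names made up for this illustration. The modules are built in memory so the example needs no files on disk.

```python
import sys
import types

def make_module(name, source):
    """Build a module object from source text and register it in sys.modules."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    sys.modules[name] = module
    return module

# One module relies on the _X convention, the other on an __all__ list.
make_module("hide_demo_underscore", "X = 1\n_Y = 2\n")
make_module("hide_demo_all", "__all__ = ['_Y']\nX = 1\n_Y = 2\n")

def star_import(name):
    """Run 'from name import *' in a fresh namespace and return the copied names."""
    namespace = {}
    exec("from %s import *" % name, namespace)
    namespace.pop("__builtins__", None)   # added by exec, not by the import
    return sorted(namespace)

underscore_names = star_import("hide_demo_underscore")
all_names = star_import("hide_demo_all")
print(underscore_names)   # only X is copied; _Y is skipped
print(all_names)          # only the name listed in __all__ is copied

# Neither convention is a privacy declaration: qualification still reaches _Y.
print(sys.modules["hide_demo_underscore"]._Y)
```

Note that when __all__ is present it wins outright: a name with a leading underscore is copied if it is listed, and a name without one is not copied if it is omitted.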
Changes to the language that may potentially break existing code in the future are introduced gradually. Initially, they appear as optional extensions, which are disabled by default. To turn on such extensions, use a special import statement of this form:
from __future__ import featurename
This statement should generally appear at the top of a module file (possibly after a docstring), because it enables special compilation of code on a per-module basis. It’s also possible to submit this statement at the interactive prompt to experiment with upcoming language changes; the feature will then be available for the rest of the interactive session.
For example, we had to use this in Chapter 14 to demonstrate generator functions, which require a keyword that is not yet enabled by default (they use a featurename of generators). We also used this statement to activate true division for numbers in Chapter 4.
Here’s a special module-related trick that lets you both import a file as a module and run it as a standalone program. Each module has a built-in attribute called __name__, which Python sets automatically as follows:
If the file is being run as a top-level program file, __name__ is set to the string "__main__" when it starts.
If the file is being imported, __name__ is instead set to the module’s name as known by its clients.
The upshot is that a module can test its own __name__ to determine whether it’s being run or imported. For example, suppose we create the following module file, named runme.py, to export a single function called tester:
def tester( ):
    print "It's Christmas in Heaven..."

if __name__ == '__main__':         # Only when run
    tester( )                        # Not when imported
This module defines a function for clients to import and use as usual:
% python
>>> import runme
>>> runme.tester( )
It's Christmas in Heaven...
But the module also includes code at the bottom that is set up to call the function when this file is run as a program:
% python runme.py
It's Christmas in Heaven...
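Both modes can also be observed from a single script. The following sketch uses Python 3 syntax and the standard library's runpy module to execute the same file twice under different values of __name__; the file name runme_demo.py and the calls list are stand-ins invented for this illustration, so the result can be inspected rather than printed.

```python
import os
import runpy
import tempfile

# A variant of runme.py that records calls instead of printing them.
source = (
    "calls = []\n"
    "def tester():\n"
    "    calls.append(\"It's Christmas in Heaven...\")\n"
    "if __name__ == '__main__':\n"
    "    tester()\n"
)
demo_dir = tempfile.mkdtemp()
path = os.path.join(demo_dir, "runme_demo.py")
with open(path, "w") as f:
    f.write(source)

# Same file, two values of __name__: only the first triggers the self-call.
as_program = runpy.run_path(path, run_name="__main__")
as_module = runpy.run_path(path, run_name="runme_demo")
print(as_program["calls"])
print(as_module["calls"])
```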
Perhaps the most common place you’ll see the __name__ test applied is for self-test code: you can package code that tests a module’s exports in the module itself, by wrapping it in a __name__ test at the bottom. This way, you can use the file in clients by importing it, and test its logic by running it from the system shell or other launching schemes. Chapter 26 will discuss other commonly used options for testing Python code.
Another common role for the __name__ trick is for writing files whose functionality can be used as both a command-line utility and a tool library. For instance, suppose you write a file finder script in Python; you can get more mileage out of your code if you package it in functions, and add a __name__ test in the file to automatically call those functions when the file is run standalone. That way, the script’s code becomes reusable in other programs.
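As a rough sketch of the file finder idea, in Python 3 syntax: the names find_files and main, the usage message, and the choice to match on a filename suffix are all assumptions made for this illustration, not the layout of any particular script.

```python
import os
import sys

def find_files(root, suffix):
    """Return sorted paths under root whose filenames end with suffix."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                matches.append(os.path.join(dirpath, filename))
    return sorted(matches)

def main(argv):
    """Command-line entry point: expects ROOT and SUFFIX arguments."""
    if len(argv) != 2:
        print("usage: finder.py ROOT SUFFIX")
        return 2
    for path in find_files(argv[0], argv[1]):
        print(path)
    return 0

if __name__ == "__main__":         # Only when run as a script
    main(sys.argv[1:])
```

A client program can import the file and call find_files directly, while running the file from a shell goes through main; the search logic lives in one place either way.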
In Chapter 15, we mentioned that the module search path is a list of directories initialized from environment variable PYTHONPATH, and possibly .pth path files. What we haven’t shown you until now is how a Python program can actually change the search path, by changing a built-in list called sys.path (the path attribute in the built-in sys module). sys.path is initialized on startup, but thereafter, you can delete, append, and reset its components however you like:
>>> import sys
>>> sys.path
['', 'D:\\PP2ECD-Partial\\Examples', 'C:\\Python22', ...more deleted...]
>>> sys.path = [r'd:\temp']                  # Change module search path
>>> sys.path.append('c:\\lp2e\\examples')    # for this process only.
>>> sys.path
['d:\\temp', 'c:\\lp2e\\examples']
>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
You can use this to dynamically configure a search path inside a Python program. Be careful: if you delete a critical directory from the path, you may lose access to critical utilities. In the last command in the example, we no longer have access to the string module, since we deleted the Python source library’s directory from the path. Also remember that such settings only endure for the Python session or program that made them; they are not retained after Python exits.
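A less destructive pattern is to extend the path temporarily and restore it afterward. The sketch below uses Python 3 syntax; the module name pathdemo_mod and its GREETING variable are hypothetical, created in a temporary directory just for this demonstration.

```python
import importlib
import os
import sys
import tempfile

# Create a throwaway directory holding one small module.
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "pathdemo_mod.py"), "w") as f:
    f.write("GREETING = 'found via sys.path'\n")

saved_path = list(sys.path)            # Keep a copy so we can undo the change
try:
    sys.path.append(demo_dir)          # Extend the search path for this process
    importlib.invalidate_caches()      # Make sure the new file is noticed
    import pathdemo_mod
    greeting = pathdemo_mod.GREETING
finally:
    sys.path[:] = saved_path           # Restore the original path in place

print(greeting)
```

Restoring the list in place (sys.path[:] = ...) rather than rebinding sys.path keeps any other references to the same list object valid.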
Both the import and from statements have been extended to allow a module to be given a different name in your script:
import longmodulename as name
is equivalent to:
import longmodulename
name = longmodulename
del longmodulename                 # Don't keep original name.
After the import, you can (and in fact must) use the name after the as to refer to the module. This works in a from statement too:
from module import longname as name
to assign the name from the file to a different name in your script. This extension is commonly used to provide short synonyms for longer names, and to avoid name clashes when you are already using a name in your script that would otherwise be overwritten by a normal import statement. This also comes in handy for providing a short, simple name for an entire directory path, when using the package import feature described in Chapter 17.
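A quick way to confirm which names the as extension actually binds is to run both statement forms in a fresh namespace and list what ends up there. This sketch uses Python 3 syntax; the short names js and to_json are arbitrary choices made for the illustration.

```python
# Run both forms in an empty namespace so nothing else is in the way.
namespace = {}
exec("import json as js\nfrom json import dumps as to_json\n", namespace)

bound_names = sorted(name for name in namespace if name != "__builtins__")
print(bound_names)                       # Only the names after 'as' are bound

# The synonyms refer to the usual objects.
encoded = namespace["to_json"]([1, 2, 3])
same_function = namespace["js"].dumps is namespace["to_json"]
print(encoded, same_function)
```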
Like functions, modules present design tradeoffs: deciding which functions go in which module, module communication mechanisms, and so on. Here are a few general ideas that will become clearer when you start writing bigger Python systems:
You’re always in a module in Python. There’s no way to write code that doesn’t live in some module. In fact, code typed at the interactive prompt really goes in a built-in module called __main__; the only unique things about the interactive prompt are that code runs and is discarded immediately, and that expression results are printed.
Minimize module coupling: global variables. Like functions, modules work best if they’re written to be closed boxes. As a rule of thumb, they should be as independent of global names in other modules as possible.
Maximize module cohesion: unified purpose. You can minimize a module’s couplings by maximizing its cohesion; if all the components of a module share its general purpose, you’re less likely to depend on external names.
Modules should rarely change other modules’ variables. It’s perfectly okay to use globals defined in another module (that’s how clients import services), but changing globals in another module is often a symptom of a design problem. There are exceptions of course, but you should try to communicate results through devices such as function return values, not cross-module changes. Otherwise your globals’ values become dependent on the order of arbitrarily remote assignments.
As a summary, Figure 18-1 sketches the environment in which modules operate. Modules contain variables, functions, classes, and other modules (if imported). Functions have local variables of their own. You’ll meet classes—another object that lives within modules—in Chapter 19.
Because modules expose most of their interesting properties as built-in attributes, it’s easy to write programs that manage other programs. We usually call such manager programs metaprograms, because they work on top of other systems. This is also referred to as introspection, because programs can see and process object internals. Introspection is an advanced feature, but can be useful for building programming tools.
For instance, to get to an attribute called name in a module called M, we can either use qualification, or index the module’s attribute dictionary exposed in the built-in __dict__ attribute. Further, Python also exports the list of all loaded modules as the sys.modules dictionary (that is, the modules attribute of the sys module), and provides a built-in called getattr that lets us fetch attributes from their string names (it’s like saying object.attr, but attr is a runtime string). Because of that, all the following expressions reach the same attribute and object:
M.name                             # Qualify object.
M.__dict__['name']                 # Index namespace dictionary manually.
sys.modules['M'].name              # Index loaded-modules table manually.
getattr(M, 'name')                 # Call built-in fetch function.
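To try the four access paths on a real module, the sketch below (Python 3 syntax) uses the standard math module under the alias M and its pi attribute; that choice of module and attribute is only for illustration.

```python
import sys
import math as M                       # Any imported module would do here

by_qualification = M.pi                # Qualify object
by_dict = M.__dict__["pi"]             # Index namespace dictionary manually
by_modules_table = sys.modules["math"].pi   # Index loaded-modules table
by_getattr = getattr(M, "pi")          # Fetch by runtime string name

# All four routes reach the very same object.
all_same = (by_qualification is by_dict is by_modules_table is by_getattr)
print(all_same)
```

Note that sys.modules is keyed by the module's real name ("math"), not by the alias it was imported under.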
By exposing module internals like this, Python helps you build programs about programs.[1] For example, here is a module named mydir.py that puts these ideas to work, to implement a customized version of the built-in dir function. It defines and exports a function called listing, which takes a module object as an argument and prints a formatted listing of the module’s namespace:
# A module that lists the namespaces of other modules

verbose = 1

def listing(module):
    if verbose:
        print "-"*30
        print "name:", module.__name__, "file:", module.__file__
        print "-"*30

    count = 0
    for attr in module.__dict__.keys( ):      # Scan namespace.
        print "%02d) %s" % (count, attr),
        if attr[0:2] == "__":
            print "<built-in name>"           # Skip __file__, etc.
        else:
            print getattr(module, attr)       # Same as .__dict__[attr]
        count = count+1

    if verbose:
        print "-"*30
        print module.__name__, "has %d names" % count
        print "-"*30

if __name__ == "__main__":
    import mydir
    listing(mydir)                            # Self-test code: list myself
We’ve also provided self-test logic at the bottom of this module, which narcissistically imports and lists itself. Here’s the sort of output produced:
C:\python> python mydir.py
------------------------------
name: mydir file: mydir.py
------------------------------
00) __file__ <built-in name>
01) __name__ <built-in name>
02) listing <function listing at 885450>
03) __doc__ <built-in name>
04) __builtins__ <built-in name>
05) verbose 1
------------------------------
mydir has 6 names
------------------------------
We’ll meet getattr and its relatives again. The point to notice here is that mydir is a program that lets you browse other programs. Because Python exposes its internals, you can process objects generically.[2]
Here is the usual collection of boundary cases, which make life interesting for beginners. Some are so obscure it was hard to come up with examples, but most illustrate something important about Python.
The module name in an import or from statement is a hardcoded variable name. Sometimes, though, your program will get the name of a module to be imported as a string at runtime (e.g., if a user selects a module name from within a GUI). Unfortunately, you can’t use import statements directly to load a module given its name as a string: Python expects a variable name here, not a string. For instance:
>>> import "string"
File "<stdin>", line 1
import "string"
^
SyntaxError: invalid syntax
It also won’t work to put the string in a variable name:
x = "string"
import x
Here, Python will try to import a file x.py, not the string module.
To get around this, you need to use special tools to load modules dynamically from a string that exists at runtime. The most general approach is to construct an import statement as a string of Python code and pass it to the exec statement to run:
>>> modname = "string"
>>> exec "import " + modname       # Run a string of code.
>>> string                         # Imported in this namespace
<module 'string'>
The exec statement (and its cousin for expressions, the eval function) compiles a string of code, and passes it to the Python interpreter to be executed. In Python, the byte code compiler is available at runtime, so you can write programs that construct and run other programs like this. By default, exec runs the code in the current scope, but you can get more specific by passing in optional namespace dictionaries.
The only real drawback to exec is that it must compile the import statement each time it runs; if it runs many times, your code may run quicker if it uses the built-in __import__ function to load from a name string instead. The effect is similar, but __import__ returns the module object, so assign it to a name here to keep it:
>>> modname = "string"
>>> string = __import__(modname)
>>>string
<module 'string'>
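In later Python releases, the standard library also offers importlib.import_module for the same job. The sketch below, in Python 3 syntax, shows it next to __import__; treat it as a supplement from a newer interpreter rather than something the sessions above rely on.

```python
import importlib
import sys

modname = "string"
by_import_module = importlib.import_module(modname)   # Library-level helper
by_dunder_import = __import__(modname)                # Built-in used above

# Both return the module object stored in the loaded-modules table.
print(by_import_module is sys.modules[modname])
print(by_dunder_import is by_import_module)

# For dotted names the two differ: __import__ returns the top-level package,
# while import_module returns the module named by the full dotted path.
dotted_top = __import__("os.path")
dotted_leaf = importlib.import_module("os.path")
print(dotted_top.__name__)
print(dotted_leaf is sys.modules["os.path"])
```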
The from statement is really an assignment to names in the importer’s scope: a name-copy operation, not a name aliasing. The implications of this are the same as for all assignments in Python, but subtle, especially given that the code that shares objects lives in different files. For instance, suppose we define the following module (nested1.py):
X = 99
def printer( ): print X
If we import its two names using from in another module (nested2.py), we get copies of those names, not links to them. Changing a name in the importer resets only the binding of the local version of that name, not the name in nested1.py:
from nested1 import X, printer # Copy names out.
X = 88 # Changes my "X" only!
printer( ) # nested1's X is still 99
% python nested2.py
99
If you use import to get the whole module and assign to a qualified name, you change the name in nested1.py. Qualification directs Python to a name in the module object, rather than a name in the importer (nested3.py):
import nested1 # Get module as a whole.
nested1.X = 88 # Okay: change nested1's X
nested1.printer( )
% python nested3.py
88
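The two cases can be replayed in one script. This sketch uses Python 3 syntax and builds a stand-in module in memory; the name nested1_demo and the get_x function (which returns X instead of printing it) are assumptions made so the outcome can be checked.

```python
import sys
import types

# Build an in-memory stand-in for nested1.py.
nested1 = types.ModuleType("nested1_demo")
exec("X = 99\ndef get_x():\n    return X\n", nested1.__dict__)
sys.modules["nested1_demo"] = nested1

from nested1_demo import X, get_x      # Copy names out
X = 88                                 # Changes my X only
after_local_change = get_x()           # The module's X is untouched

import nested1_demo                    # Get module as a whole
nested1_demo.X = 88                    # Change the module's own X
after_qualified_change = get_x()

print(after_local_change, after_qualified_change)
```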
When a module is first imported (or reloaded), Python executes its statements one by one, from the top of file to the bottom. This has a few subtle implications regarding forward references that are worth underscoring here:
Code at the top level of a module file (not nested in a function) runs as soon as Python reaches it during an import; because of that, it can’t reference names assigned lower in the file.
Code inside a function body doesn’t run until the function is called; because names in a function aren’t resolved until the function actually runs, they can usually reference names anywhere in the file.
Generally, forward references are only a concern in top-level module code that executes immediately; functions can reference names arbitrarily. Here’s an example that illustrates forward reference:
func1( )                           # Error: "func1" not yet assigned

def func1( ):
    print func2( )                 # Okay:  "func2" looked up later

func1( )                           # Error: "func2" not yet assigned

def func2( ):
    return "Hello"

func1( )                           # Okay:  "func1" and "func2" assigned
When this file is imported (or run as a standalone program), Python executes its statements from top to bottom. The first call to func1 fails because the func1 def hasn’t run yet. The call to func2 inside func1 works as long as func2’s def has been reached by the time func1 is called (it hasn’t when the second top-level func1 call is run). The last call to func1 at the bottom of the file works, because func1 and func2 have both been assigned.
Mixing defs with top-level code is not only hard to read, it’s dependent on statement ordering. As a rule of thumb, if you need to mix immediate code with defs, put your defs at the top of the file and top-level code at the bottom. That way, your functions are defined and assigned by the time code that uses them runs.
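Because the failing calls above stop the file at the first error, it can be easier to watch all three outcomes by trapping the exceptions. The following sketch, in Python 3 syntax, does that; the events list is an addition made only to record what happened.

```python
events = []

try:
    func1()                            # "func1" not yet assigned
except NameError:
    events.append("first call failed")

def func1():
    return func2()                     # "func2" is looked up when func1 runs

try:
    func1()                            # "func2" not yet assigned
except NameError:
    events.append("second call failed")

def func2():
    return "Hello"

events.append("third call returned " + func1())   # Both names assigned now
print(events)
```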
Because imports execute a file’s statements from top to bottom, you sometimes need to be careful when using modules that import each other (something called recursive imports). Since the statements in a module have not all been run when it imports another module, some of its names may not yet exist. If you use import to fetch a module as a whole, this may or may not matter; the module’s names won’t be accessed until you later use qualification to fetch their values. But if you use from to fetch specific names, you only have access to names already assigned.
For instance, take the following modules recur1 and recur2. recur1 assigns a name X, and then imports recur2, before assigning name Y. At this point, recur2 can fetch recur1 as a whole with an import (it already exists in Python’s internal modules table), but it can see only name X if it uses from; the name Y below the import in recur1 doesn’t yet exist, so you get an error:
#File: recur1.py
X = 1
import recur2 # Run recur2 now if it doesn't exist.
Y = 2
#File: recur2.py
from recur1 import X # Okay: "X" already assigned
from recur1 import Y # Error: "Y" not yet assigned
>>> import recur1
Traceback (innermost last):
File "<stdin>", line 1, in ?
File "recur1.py", line 2, in ?
import recur2
File "recur2.py", line 2, in ?
from recur1 import Y # Error: "Y" not yet assigned
ImportError: cannot import name Y
Python avoids rerunning recur1’s statements when they are imported recursively from recur2 (or else the imports would send the script into an infinite loop), but recur1’s namespace is incomplete when imported by recur2.
Don’t use from in recursive imports . . . really! Python won’t get stuck in a cycle, but your programs will once again be dependent on the order of statements in modules. There are two ways out of this gotcha:
You can usually eliminate import cycles like this by careful design; maximizing cohesion and minimizing coupling are good first steps.
If you can’t break the cycles completely, postpone module name access by using import and qualification (instead of from), or running your froms inside functions (instead of at the top level of the module) or near the bottom of your file to defer their execution.
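The failing cycle and one of the suggested repairs can be compared with a small experiment. This sketch uses Python 3 syntax and writes four throwaway files to a temporary directory; the module names ending in _demo and the total function are hypothetical. The repaired pair uses import plus qualification inside a function, so the name lookups are postponed until both modules are complete.

```python
import importlib
import os
import sys
import tempfile

files = {
    # The cycle from the text: the second from runs before Y exists.
    "recur1_demo.py": "X = 1\nimport recur2_demo\nY = 2\n",
    "recur2_demo.py": "from recur1_demo import X\nfrom recur1_demo import Y\n",
    # Same cycle, but names are fetched by qualification at call time.
    "fixed1_demo.py": "X = 1\nimport fixed2_demo\nY = 2\n",
    "fixed2_demo.py": ("import fixed1_demo\n"
                       "def total():\n"
                       "    return fixed1_demo.X + fixed1_demo.Y\n"),
}
demo_dir = tempfile.mkdtemp()
for filename, source in files.items():
    with open(os.path.join(demo_dir, filename), "w") as f:
        f.write(source)

sys.path.insert(0, demo_dir)
importlib.invalidate_caches()
try:
    try:
        import recur1_demo
        from_version_failed = False
    except ImportError:
        from_version_failed = True     # Y was not assigned yet

    import fixed1_demo                 # Completes without error
    import fixed2_demo
    total = fixed2_demo.total()        # Both names exist by the time this runs
finally:
    sys.path.remove(demo_dir)

print(from_version_failed, total)
```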
The from statement is the source of all sorts of gotchas in Python. Here’s another: because from copies (assigns) names when run, there’s no link back to the module where the names came from. Names imported with from simply become references to objects, which happen to have been referenced by the same names in the importee when the from ran. Because of this behavior, reloading the importee has no effect on clients that use from; the client’s names still reference the original objects fetched with from, even though names in the original module have been reset:
from module import X               # X may not reflect any module reloads!
 . . .
reload(module)                     # Changes module, but not my names
X                                  # Still references old object
Don’t do it that way. To make reloads more effective, use import and name qualification, instead of from. Because qualifications always go back to the module, they will find the new bindings of module names after reloading:
import module                      # Get module, not names.
 . . .
reload(module)                     # Changes module in-place.
module.X                           # Get current X: reflects module reloads
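The difference can be observed directly by editing a module file between the import and the reload. The sketch below uses Python 3 syntax, where reload lives in the importlib module rather than being a built-in; the file reload_demo_mod.py and the write_source helper are made up for this illustration.

```python
import importlib
import os
import sys
import tempfile

demo_dir = tempfile.mkdtemp()
module_path = os.path.join(demo_dir, "reload_demo_mod.py")

def write_source(text):
    """Overwrite the demo module's source file."""
    with open(module_path, "w") as f:
        f.write(text)

write_source("X = 1\n")
sys.path.insert(0, demo_dir)
importlib.invalidate_caches()
try:
    import reload_demo_mod
    from reload_demo_mod import X          # Copies the current binding

    write_source("X = 'changed'\n")        # Simulate an edit in another window
    importlib.reload(reload_demo_mod)      # Rerun the file in the same module

    copied_value = X                       # Still the old object
    qualified_value = reload_demo_mod.X    # Goes back to the module
finally:
    sys.path.remove(demo_dir)

print(copied_value, qualified_value)
```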
Chapter 3 warned readers that it’s usually better to not launch programs with imports and reloads, because of the complexities involved. Things get even worse with from. Python beginners often encounter this gotcha: after opening a module file in a text edit window, they launch an interactive session to load and test their module with from:
from module import function
function(1, 2, 3)
After finding a bug, they jump back to the edit window, make a change, and try to reload this way:
reload(module)
Except this doesn’t work: the from statement assigned the name function, not module. To refer to the module in a reload, you have to first load it with an import statement, at least once:
import module
reload(module)
function(1, 2, 3)
Except this doesn’t quite work either: reload updates the module object, but names like function copied out of the module in the past still refer to old objects (in this case, the original version of the function). To really get the new function, either call it module.function after the reload, or rerun the from:
import module
reload(module)
from module import function
function(1, 2, 3)
And now, the new version of the function finally runs. But there are problems inherent in using reload with from; not only do you have to remember to reload after imports, you also have to remember to rerun your from statements after reloads; this is complex enough to even trip up an expert once in a while.
You should not expect reload and from to play together nicely. Better yet, don’t combine them at all: use reload with import, or launch programs other ways, as suggested in Chapter 3 (e.g., use the Edit/Runscript option in IDLE, file icon clicks, or system command lines).
When you reload a module, Python only reloads that particular module’s file; it doesn’t automatically reload modules that the file being reloaded happens to import. For example, if you reload some module A, and A imports modules B and C, the reload only applies to A, not B and C. The statements inside A that import B and C are rerun during the reload, but they’ll just fetch the already loaded B and C module objects (assuming they’ve been imported before). In actual code, here’s file A.py:
import B                           # Not reloaded when A is
import C                           # Just an import of an already loaded module

% python
>>> . . .
>>> reload(A)
Don’t depend on transitive module reloads. Use multiple reload calls to update subcomponents independently. If desired, you can design your systems to reload their subcomponents automatically by adding reload calls in parent modules like A.
Better still, you could write a general tool to do transitive reloads automatically, by scanning module __dict__s (see Section 18.6.1 earlier in this chapter), and checking each item’s type( ) (see Chapter 7) to find nested modules to reload recursively. Such a utility function could call itself, recursively, to navigate arbitrarily shaped import dependency chains.
Module reloadall.py, listed below, has a reload_all function that automatically reloads a module, every module that the module imports, and so on, all the way to the bottom of the import chains. It uses a dictionary to keep track of modules already reloaded, recursion to walk the import chains, and the standard library’s types module (introduced at the end of Chapter 7), which simply predefines type( ) results for built-in types.
To use this utility, import its reload_all function, and pass it the name of an already-loaded module, much like the built-in reload function; when the file runs stand-alone, its self-test code tests itself; it has to import itself, because its own name is not defined in the file without an import. We encourage you to study and experiment with this example on your own:
import types

def status(module):
    print 'reloading', module.__name__

def transitive_reload(module, visited):
    if not visited.has_key(module):                  # Trap cycles, dups.
        status(module)                               # Reload this module
        reload(module)                               # and visit children.
        visited[module] = None
        for attrobj in module.__dict__.values( ):    # For all attrs
            if type(attrobj) == types.ModuleType:    # Recur if module
                transitive_reload(attrobj, visited)

def reload_all(*args):
    visited = { }
    for arg in args:
        if type(arg) == types.ModuleType:
            transitive_reload(arg, visited)

if __name__ == '__main__':
    import reloadall                # Test code: reload myself
    reload_all(reloadall)           # Should reload this, types
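For readers on Python 3, where reload is importlib.reload and dictionaries no longer have has_key, the same traversal might be sketched as follows. This is an adaptation, not the listing above: it tracks visited modules in a set, and the reloader parameter is an addition made here so the walk can be exercised without really reloading anything. The three demo modules are built in memory and are purely hypothetical.

```python
import importlib
import types

def transitive_reload(module, visited, reloader):
    """Reload module, then recur into every module found in its namespace."""
    if module in visited:                            # Trap cycles, dups
        return
    visited.add(module)
    reloader(module)                                 # Reload this module
    for attrobj in list(vars(module).values()):      # For all attrs
        if isinstance(attrobj, types.ModuleType):    # Recur if module
            transitive_reload(attrobj, visited, reloader)

def reload_all(*modules, reloader=importlib.reload):
    visited = set()
    for module in modules:
        if isinstance(module, types.ModuleType):
            transitive_reload(module, visited, reloader)
    return visited

# Demo: a imports b and c, and b imports a back (a cycle).
mod_a = types.ModuleType("demo_a")
mod_b = types.ModuleType("demo_b")
mod_c = types.ModuleType("demo_c")
mod_a.b, mod_a.c, mod_b.a = mod_b, mod_c, mod_a

reloaded = []
reload_all(mod_a, reloader=lambda module: reloaded.append(module.__name__))
print(reloaded)                                      # Each module visited once
```

With the default reloader, the walk would also reach standard library modules that the starting module imports, just as the original does; whether reloading those is desirable depends on the program.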
See Section B.5 for the solutions.
Basics, import. Write a program that counts lines and characters in a file (similar in spirit to “wc” on Unix). With your text editor, code a Python module called mymod.py, which exports three top-level names:
A countLines(name) function that reads an input file and counts the number of lines in it (hint: file.readlines( ) does most of the work for you, and len does the rest)
A countChars(name) function that reads an input file and counts the number of characters in it (hint: file.read( ) returns a single string)
A test(name) function that calls both counting functions with a given input filename. Such a filename generally might be passed in, hardcoded, input with raw_input, or pulled from a command line via the sys.argv list; for now, assume it’s a passed-in function argument.
All three mymod functions should expect a filename string to be passed in. If you type more than two or three lines per function, you’re working much too hard; use the hints listed above!
Next, test your module interactively, using import and name qualification to fetch your exports. Does your PYTHONPATH need to include the directory where you created mymod.py? Try running your module on itself: e.g., test("mymod.py"). Note that test opens the file twice; if you’re feeling ambitious, you may be able to improve this by passing an open file object into the two count functions (hint: file.seek(0) is a file rewind).
from/from*. Test your mymod module from Exercise 1 interactively, by using from to load the exports directly, first by name, then using the from* variant to fetch everything.
__main__. Add a line in your mymod module that calls the test function automatically only when the module is run as a script, not when it is imported. The line you add will probably test the value of __name__ for the string "__main__", as shown in this chapter. Try running your module from the system command line; then, import the module and test its functions interactively. Does it still work in both modes?
Nested imports. Write a second module, myclient.py, which imports mymod and tests its functions; run myclient from the system command line. If myclient uses from to fetch from mymod, will mymod’s functions be accessible from the top level of myclient? What if it imports with import instead? Try coding both variations in myclient and test interactively, by importing myclient and inspecting its __dict__.
Package imports. Import your file from a package. Create a subdirectory called mypkg nested in a directory on your module import search path, move the mymod.py module file you created in Exercise 1 or 3 into the new directory, and try to import it with a package import of the form: import mypkg.mymod.
You’ll need to add an __init__.py file in the directory your module was moved to in order to make this go, but it should work on all major Python platforms (that’s part of the reason Python uses “.” as a path separator). The package directory you create can be simply a subdirectory of the one you’re working in; if it is, it will be found via the home directory component of the search path, and you won’t have to configure your path. Add some code to your __init__.py, and see if it runs on each import.
Reload. Experiment with module reloads: perform the tests in Chapter 16’s changer.py example, changing the called function’s message and/or behavior repeatedly, without stopping the Python interpreter. Depending on your system, you might be able to edit changer in another window, or suspend the Python interpreter and edit in the same window (on Unix, a Ctrl-Z key combination usually suspends the current process, and a fg command later resumes it).
Circular imports.[3] In the section on recursive import gotchas, importing recur1 raised an error. But if you restart Python and import recur2 interactively, the error doesn’t occur: test and see this for yourself. Why do you think it works to import recur2, but not recur1? (Hint: Python stores new modules in the built-in sys.modules table (a dictionary) before running their code; later imports fetch the module from this table first, whether the module is “complete” yet or not.) Now try running recur1 as a top-level script file: % python recur1.py. Do you get the same error that occurs when recur1 is imported interactively? Why? (Hint: when modules are run as programs they aren’t imported, so this case has the same effect as importing recur2 interactively; recur2 is the first module imported.) What happens when you run recur2 as a script?
[1] Notice that because a function can access its enclosing module by going through the sys.modules table like this, it’s possible to emulate the effect of the global statement you met in Chapter 13. For instance, the effect of global X; X=0 can be simulated by saying, inside a function: import sys; glob=sys.modules[__name__]; glob.X=0 (albeit with much more typing). Remember, each module gets a __name__ attribute for free; it’s visible as a global name inside functions within a module. This trick provides another way to change both local and global variables of the same name, inside a function.
[2] Tools such as mydir.listing can be preloaded into the interactive namespace, by importing them in the file referenced by the PYTHONSTARTUP environment variable. Since code in the startup file runs in the interactive namespace (module __main__), imports of common tools in the startup file can save you some typing. See Appendix A for more details.
[3] Note that circular imports are extremely rare in practice. In fact, this author has never coded or come across a circular import in a decade of Python coding. On the other hand, if you can understand why it’s a potential problem, you know a lot about Python’s import semantics.