In this section, we introduce a few module-related ideas that seem important enough to stand on their own (or obscure enough to defy our organizational skills).
As currently implemented, the Python system is often called an interpreter, but it’s really somewhere between a classic interpreter and a compiler. As in Java, Python programs are compiled to an intermediate form called bytecode, which is then executed on something called a virtual machine. Since the Python virtual machine interprets the bytecode form, we can get away with saying that Python is interpreted, but it still goes through a compile phase first.
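To make the compile phase concrete, here is a minimal sketch, assuming a modern Python 3 interpreter. It uses the built-in compile function and the standard dis module to turn source text into a bytecode code object, list its instructions, and then hand it to the virtual machine with exec. The source string and the "<example>" label are arbitrary choices for illustration, and the exact listing printed by dis differs between Python versions.

```python
import dis

# Compile a small piece of source text into a code object (bytecode)
# without running it; "<example>" is only a label used in tracebacks.
source = "x = 1 + 2\ny = x * 10\n"
code = compile(source, "<example>", "exec")

print(type(code).__name__)   # code
print(type(code.co_code))    # <class 'bytes'>: the raw bytecode

# Print a human-readable listing of the bytecode instructions
# (the listing varies between Python versions).
dis.dis(code)

# Hand the code object to the virtual machine to execute it.
namespace = {}
exec(code, namespace)
print(namespace["y"])        # 30
```

Normally you never call compile yourself for modules; imports perform this step automatically.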
Luckily, the compile step is completely automated and hidden in Python. Python programmers simply import modules and use the names they define; Python takes care to automatically compile modules to bytecode when they are first imported. Moreover, Python tries to save a module’s bytecode in a file, so it can avoid recompiling in the future if the source code hasn’t been changed. In effect, Python comes with an automatic make system to manage recompiles.[38]
Here’s how this works. You may have noticed .pyc files in your module directories after running programs; these are the files Python generates to save a module’s bytecode (provided you have write access to source directories). When a module M is imported, Python loads an M.pyc bytecode file instead of the corresponding M.py source file, as long as the M.py file hasn’t been changed since the M.pyc bytecode was saved. If you change the source code file (or delete the .pyc), Python is smart enough to recompile the module when it is imported; if not, the saved bytecode files make your program start more quickly by avoiding recompiles at runtime.
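Modern Python 3 (3.2 and later) keeps these bytecode files in a __pycache__ subdirectory next to the source, with the interpreter’s version tag in the file name (for example, M.cpython-312.pyc), rather than as M.pyc beside M.py. The sketch below, which assumes Python 3, writes a throwaway M.py into a scratch directory, asks importlib.util.cache_from_source where its bytecode would be cached, and then uses py_compile to perform the compile step that an import would otherwise do implicitly.

```python
import importlib.util
import os
import py_compile
import tempfile

# Write a throwaway module file M.py into a scratch directory.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "M.py")
with open(source_path, "w") as f:
    f.write("X = 42\n")

# Where would Python 3 cache M's bytecode? Normally a __pycache__
# subdirectory, with the interpreter's tag in the file name.
expected = importlib.util.cache_from_source(source_path)
print(expected)

# Compile explicitly; importing M would perform this step implicitly.
written = py_compile.compile(source_path, doraise=True)
print(written == expected)        # True
print(os.path.exists(written))    # True
```

By default, Python still decides whether a cached file is stale from the source file’s modification time recorded in the bytecode file, much like the make-style check described above.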
As we’ve seen, Python modules export all names assigned at the top level of their file. There is no notion of declaring which names should and shouldn’t be visible outside the module. In fact, there’s no way to prevent a client from changing names inside a module if they want to.
In Python, data hiding in modules is a convention, not a syntactic constraint. If you want to break a module by trashing its names, you can (though we have yet to meet a programmer who would want to). Some purists object to this liberal attitude towards data hiding and claim that it means Python can’t implement encapsulation. We disagree (and doubt we could convince purists of anything in any event). Encapsulation in Python is more about packaging than about restricting.[39]
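The following sketch shows that nothing stops a client from rebinding a module’s global, and that the module’s own functions see the change. To keep it self-contained, the module is a hypothetical one called counter, built in memory with types.ModuleType and registered in sys.modules instead of living in a counter.py file.

```python
import sys
import types

# Build a hypothetical module "counter" in memory (standing in for a
# counter.py file) and register it so that "import counter" finds it.
counter = types.ModuleType("counter")
exec(
    "limit = 10\n"
    "def over(n):\n"
    "    return n > limit    # reads the module-level global 'limit'\n",
    counter.__dict__,
)
sys.modules["counter"] = counter

import counter as client_view    # what a client module would write

print(client_view.over(15))      # True
client_view.limit = 100          # nothing stops a client from rebinding it
print(client_view.over(15))      # False: the module's own code sees the change
```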
As a special case, prefixing names with an underscore (e.g., _X) prevents them from being copied out when a client imports with a from * statement. This really is intended only to minimize namespace pollution; since from * copies out all names, you may get more than you bargained for (including names that overwrite names in the importer). But underscores aren’t “private” declarations: you can still see and change such names with other import forms.
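Here is a small sketch of that rule, using a hypothetical in-memory module named spam with one ordinary name and one underscore-prefixed name. Running "from spam import *" inside exec against a fresh dictionary lets us inspect exactly which names get copied out.

```python
import sys
import types

# A hypothetical module "spam" with one ordinary name and one
# underscore-prefixed name, built in memory and registered for import.
spam = types.ModuleType("spam")
exec("X = 1\n_Y = 2\n", spam.__dict__)
sys.modules["spam"] = spam

# Run "from spam import *" in a fresh namespace to see what it copies.
importer = {}
exec("from spam import *", importer)
print("X" in importer, "_Y" in importer)   # True False

# Other import forms still reach, and can change, the underscore name.
import spam as s
print(s._Y)                                # 2
s._Y = 99
print(spam._Y)                             # 99
```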
Here’s a special module-related trick that lets you both import a module from clients and run it as a standalone program. Each module has a built-in attribute called __name__, which Python sets as follows:
If the file is being run as a program, __name__ is set to the string "__main__" when it starts.
If the file is being imported, __name__ is set to the module’s name as known by its clients.
The upshot is that a module can test its own __name__ to determine whether it’s being run or imported. For example, suppose we create the following module file, runme.py, which exports a single function called tester:
def tester():
    print("It's Christmas in Heaven...")

if __name__ == '__main__':     # only when run
    tester()                   # not when imported
This module defines a function for clients to import and use as usual:
% python
>>> import runme
>>> runme.tester()
It's Christmas in Heaven...
But the module also includes code at the bottom that is set up to call the function when this file is run as a program:
% python runme.py
It's Christmas in Heaven...
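To watch __name__ steer both cases from a single script, the sketch below uses the standard runpy.run_path function, which executes a file under a chosen __name__ and returns its resulting global namespace. As an assumption for the demo, this variant of runme.py records calls in a list instead of printing; also note that run_name="runme" only simulates an import (the file is not registered in sys.modules).

```python
import os
import runpy
import tempfile

# Write a variant of runme.py into a scratch directory. (Assumption: it
# records calls in a list instead of printing, so the effect is easy to check.)
path = os.path.join(tempfile.mkdtemp(), "runme.py")
with open(path, "w") as f:
    f.write(
        "calls = []\n"
        "def tester():\n"
        "    calls.append('tester')\n"
        "if __name__ == '__main__':    # only when run\n"
        "    tester()\n"
    )

# run_path executes the file and returns its resulting global namespace.
as_program = runpy.run_path(path, run_name="__main__")
as_import = runpy.run_path(path, run_name="runme")

print(as_program["__name__"], as_program["calls"])   # __main__ ['tester']
print(as_import["__name__"], as_import["calls"])     # runme []
```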
Perhaps the most common place you’ll see the __main__ test applied is for self-test code: you can package code that tests a module’s exports in the module itself by wrapping it in a __main__ test at the bottom. This way, you can use the file in clients and test its logic by running it from the system shell.
We’ve mentioned that the module search path is a list of directories in the environment variable PYTHONPATH. What we haven’t told you is that a Python program can actually change the search path by changing a built-in list called sys.path (the path attribute in the built-in sys module). sys.path is initialized from PYTHONPATH (plus compiled-in defaults) on startup, but thereafter, you can delete, append, and reset its components however you like:
>>> import sys
>>> sys.path
['.', 'c:\\python\\lib', 'c:\\python\\lib\\tkinter']
>>> sys.path = ['.']                          # change module search path
>>> sys.path.append('c:\\book\\examples')     # escape backslashes as "\\"
>>> sys.path
['.', 'c:\\book\\examples']
>>> import string
Traceback (innermost last):
  File "<stdin>", line 1, in ?
ImportError: No module named string
You can use this to configure a search path dynamically inside a Python program. Be careful, though: if you delete a critical directory from the path, you may lose access to essential modules. In the last command above, for example, we no longer have access to the string module, since we deleted the Python source library’s directory from the path.
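A safer pattern is to extend sys.path rather than replace it, and to undo the change when you are done. The sketch below, assuming Python 3, creates a hypothetical module pathdemo.py in a scratch directory, appends that directory to the search path, imports the module, and then restores the path. The importlib.invalidate_caches call is the documented step to take when modules are created while a program is running.

```python
import importlib
import os
import sys
import tempfile

# Make a scratch directory holding a hypothetical module "pathdemo.py".
extra_dir = tempfile.mkdtemp()
with open(os.path.join(extra_dir, "pathdemo.py"), "w") as f:
    f.write("GREETING = 'found via sys.path'\n")

# Extend (rather than replace) the search path, so standard modules stay visible.
sys.path.append(extra_dir)
importlib.invalidate_caches()      # the module file was created at runtime
try:
    import pathdemo
    print(pathdemo.GREETING)       # found via sys.path
finally:
    sys.path.remove(extra_dir)     # restore the original search path
```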
Packages are an advanced tool, and we debated whether to cover them in this book. But since you may run across them in other people’s code, here’s a quick overview of their machinery.
In short, Python packages allow you to import modules using directory paths; qualified names in import statements reflect the directory structure on your machine. For instance, if some module C lives in a directory B, which is in turn a subdirectory of directory A, you can say import A.B.C to load the module. Only the directory containing A needs to be listed in the PYTHONPATH variable, since the path from A to C is given by qualification.
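The sketch below builds the A/B/C.py layout just described in a scratch directory and imports it with a qualified name. It assumes Python 3, and it gives each package directory an __init__.py file, as classic packages require. Only the scratch root (the directory containing A) is placed on sys.path.

```python
import importlib
import os
import sys
import tempfile

# Build the directory tree A/B/C.py in a scratch root.
root = tempfile.mkdtemp()
pkg_b = os.path.join(root, "A", "B")
os.makedirs(pkg_b)

# Each package directory gets an (empty) __init__.py marker file.
for d in (os.path.join(root, "A"), pkg_b):
    open(os.path.join(d, "__init__.py"), "w").close()
with open(os.path.join(pkg_b, "C.py"), "w") as f:
    f.write("WHERE = 'A.B.C'\n")

# Only the directory *containing* A goes on the search path.
sys.path.insert(0, root)
importlib.invalidate_caches()

import A.B.C
print(A.B.C.WHERE)       # A.B.C
print(A.B.C.__name__)    # A.B.C
```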
Packages come in handy when integrating systems written by independent developers; by storing each system’s set of modules in its own subdirectory, we can reduce the risk of name clashes. For instance, if each developer writes a module called spam.py, there’s no telling which will be found on PYTHONPATH first if package qualifier paths aren’t used. If another subsystem’s directory appears on PYTHONPATH first, a subsystem may see the wrong one.
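As a sketch of how qualification avoids that clash, suppose two hypothetical developers each ship a spam.py, stored in package directories named devA and devB (names invented for this example). Importing through the package names keeps the two modules distinct, even though their bare module names are identical.

```python
import importlib
import os
import sys
import tempfile

# Two hypothetical developers each ship a module named spam.py,
# stored in separate package directories devA/ and devB/.
root = tempfile.mkdtemp()
for dev in ("devA", "devB"):
    pkg = os.path.join(root, dev)
    os.makedirs(pkg)
    open(os.path.join(pkg, "__init__.py"), "w").close()
    with open(os.path.join(pkg, "spam.py"), "w") as f:
        f.write("OWNER = %r\n" % dev)

sys.path.insert(0, root)
importlib.invalidate_caches()

# Package qualification keeps the two spam modules distinct.
import devA.spam
import devB.spam
print(devA.spam.OWNER, devB.spam.OWNER)   # devA devB
```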
Again, if you’re new to Python, make sure that you’ve mastered simple modules before stepping up to packages. Packages are more complex than we’ve described here; for instance, each directory used as a package must include an __init__.py module to identify itself as such (Python 3.3 and later relax this rule for so-called namespace packages). See Python’s reference manuals for the whole story.
Like functions, modules present design tradeoffs: deciding which functions go in which module, module communication mechanisms, and so on. Here too, it’s a bigger topic than this book allows, so we’ll just touch on a few general ideas that will become clearer when you start writing bigger Python systems:
There’s no way to write code that doesn’t live in some module. In fact, code typed at the interactive prompt really goes in a built-in module called __main__.
Like functions, modules work best if they’re written to be closed boxes. As a rule of thumb, they should be as independent of global names in other modules as possible.
You can minimize a module’s couplings by maximizing its cohesion; if all the components of a module share its general purpose, you’re less likely to depend on external names.
It’s perfectly okay to use globals defined in another module (that’s how clients import services, after all), but changing globals in another module is usually a symptom of a design problem. There are exceptions of course, but you should try to communicate results through devices such as function return values, not cross-module changes.
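The contrast in the last guideline can be sketched as follows, using a hypothetical in-memory module called settings that several clients read from. Reading another module’s global is fine; the discouraged version rebinds it as a side effect, while the preferred version returns a result and leaves the other module untouched.

```python
import types

# A hypothetical "settings" module that several clients read from.
settings = types.ModuleType("settings")
settings.retries = 3

# Discouraged: reaching into another module and changing its global as a
# side effect; every other client of settings silently sees the change.
def bump_retries_in_place(extra):
    settings.retries = settings.retries + extra

# Preferred: read the other module's global, compute a result, and hand it
# back; the caller decides what to do with it.
def bumped_retries(current, extra):
    return current + extra

mine = bumped_retries(settings.retries, 2)
print(mine, settings.retries)   # 5 3
```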
Finally, because modules expose most of their interesting properties as built-in attributes, it’s easy to write programs that manage other programs. We usually call such manager programs metaprograms, because they work on top of other systems. This is also referred to as introspection, because programs can see and process object internals.
For instance, to get to an attribute called name in a module called M, we can either use qualification or index the module’s attribute dictionary, exposed in the built-in __dict__ attribute. Further, Python also exports the table of all loaded modules as the sys.modules dictionary (that is, the modules attribute of the sys module), and provides a built-in called getattr that lets us fetch attributes from their string names. Because of that, all the following expressions reach the same attribute and object:
M.name                    # qualify object
M.__dict__['name']        # index namespace dictionary manually
sys.modules['M'].name     # index loaded-modules table manually
getattr(M, 'name')        # call built-in fetch function
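To try the four forms on a real module, the sketch below substitutes the standard math module for M and its pi attribute for name (our own choice of example). Each expression fetches the very same float object stored in the module’s namespace.

```python
import math
import sys

# Four routes to the same attribute object.
ways = [
    math.pi,                   # qualify object
    math.__dict__["pi"],       # index namespace dictionary manually
    sys.modules["math"].pi,    # index loaded-modules table manually
    getattr(math, "pi"),       # call built-in fetch function
]
print(all(w is math.pi for w in ways))   # True
```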
By exposing module internals like this, Python helps you build programs about programs.[40] For example, here is a module, mydir.py, that puts these ideas to work to implement a customized version of the built-in dir function. It defines and exports a function called listing, which takes a module object as an argument and prints a formatted listing of the module’s namespace:
# a module that lists the namespaces of other modules

verbose = 1

def listing(module):
    if verbose:
        print("-" * 30)
        print("name:", module.__name__, "file:", module.__file__)
        print("-" * 30)

    count = 0
    for attr in module.__dict__.keys():          # scan namespace
        print("%02d) %s" % (count, attr), end=" ")
        if attr[0:2] == "__":
            print("<built-in name>")             # skip __file__, etc.
        else:
            print(getattr(module, attr))         # same as .__dict__[attr]
        count = count + 1

    if verbose:
        print("-" * 30)
        print(module.__name__, "has %d names" % count)
        print("-" * 30)

if __name__ == "__main__":
    import mydir
    listing(mydir)                               # self-test code: list myself
We’ve also provided self-test logic at the bottom of this module, which narcissistically imports and lists itself. Here’s the sort of output produced:
C:\python> python mydir.py
------------------------------
name: mydir file: mydir.py
------------------------------
00) __file__ <built-in name>
01) __name__ <built-in name>
02) listing <function listing at 885450>
03) __doc__ <built-in name>
04) __builtins__ <built-in name>
05) verbose 1
------------------------------
mydir has 6 names
------------------------------
We’ll meet getattr and its relatives again.
The point to notice here is that mydir is a program that lets you browse other programs. Because Python exposes its internals, you can process objects generically.[41]
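As a further sketch of generic processing, here is a hypothetical helper, public_summary, that follows the same approach as listing but returns data instead of printing it. It scans any module’s namespace, skips underscore names (the same convention from * uses), and reports each remaining name with the type of the object it refers to.

```python
import math

def public_summary(module):
    """Return (name, type-name) pairs for a module's non-underscore names."""
    summary = []
    for attr in sorted(module.__dict__):    # scan the namespace generically
        if attr.startswith("_"):
            continue                        # same convention as from *
        value = getattr(module, attr)       # fetch by string name
        summary.append((attr, type(value).__name__))
    return summary

# Works on any module object; here, the standard math module.
for name, kind in public_summary(math)[:3]:
    print(name, kind)
```

Returning a list rather than printing makes the result easy to filter, sort, or test, at the cost of a separate formatting step.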
[38] For readers who have never used C or C++, a make system is a way to automate compiling and linking programs. make systems typically use file modification dates to know when a file must be recompiled (just like Python).
[39] Purists would probably also be horrified by the rogue C++ programmer who types #define private public to break C++’s hiding mechanism in a single blow. But then those are rogue programmers for you.
[40] Notice that because a function can access its enclosing module by going through the sys.modules table like this, it’s possible to emulate the effect of the global statement we met in Chapter 4. For instance, the effect of global X; X=0 can be simulated by saying, inside a function: import sys; glob=sys.modules[__name__]; glob.X=0 (albeit with much more typing). Remember, each module gets a __name__ attribute for free; it’s visible as a global name inside functions within a module. This trick provides a way to change both local and global variables of the same name, inside a function.
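A minimal sketch of this footnote’s trick: so that sys.modules[__name__] reliably finds the right module object, the code lives in a hypothetical module named footdemo, built in memory and registered in sys.modules before its code runs. Inside set_both, a local X and the module-level X coexist, and the function rebinds the global one through its own module object.

```python
import sys
import types

# Source for a hypothetical module "footdemo" using the sys.modules trick.
source = '''
import sys
X = 99

def set_both():
    X = "local"                     # a local variable named X
    glob = sys.modules[__name__]    # this function's own module object
    glob.X = 0                      # rebinds the module-level (global) X
    return X                        # the local X is untouched
'''

footdemo = types.ModuleType("footdemo")
sys.modules["footdemo"] = footdemo     # register before running its code
exec(source, footdemo.__dict__)

print(footdemo.set_both(), footdemo.X)   # local 0
```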
[41] By the way, tools such as mydir.listing can be preloaded into the interactive namespace by importing them in the file referenced by the PYTHONSTARTUP environment variable. Since code in the startup file runs in the interactive namespace (module __main__), imports of common tools in the startup file can save you some typing. See Chapter 1 for more details.