Chapter 18. Advanced Module Topics

Part V concludes with a collection of more advanced module-related topics, along with the standard set of gotchas and exercises. Just like functions, modules are more effective when their interfaces are well defined, so this chapter also takes a brief look at module design concepts. Some of the topics here, such as the __name__ trick, are very widely used, despite the word “advanced” in this chapter’s title.

Data Hiding in Modules

As we’ve seen, Python modules export all names assigned at the top level of their file. There is no notion of declaring which names should and shouldn’t be visible outside the module. In fact, there’s no way to prevent a client from changing names inside a module if they want to.

In Python, data hiding in modules is a convention, not a syntactical constraint. If you want to break a module by trashing its names, you can, but we have yet to meet a programmer who would want to. Some purists object to this liberal attitude towards data hiding and claim that it means Python can’t implement encapsulation. However, encapsulation in Python is more about packaging than about restricting.

Minimizing from* damage: _X and __all__

As a special case, prefixing names with a single underscore (e.g., _X) prevents them from being copied out when a client imports with a from* statement. This really is intended only to minimize namespace pollution; since from* copies out all names, you may get more than you bargained for (including names that overwrite names in the importer). But underscores aren’t “private” declarations: you can still see and change such names with other import forms such as the import statement.

A module can achieve a hiding effect similar to the _X naming convention, by assigning a list of variable name strings to the variable __all__ at the top level of the module. For example:

__all__ = ["Error", "encode", "decode"]     # Export these only.

When this feature is used, the from* statement will only copy out those names listed in the __all__ list. In effect, this is the converse of the _X convention: __all__ contains names to be copied, but _X identifies names to not be copied. Python looks for an __all__ list in the module first; if one is not defined, from* copies all names without a single leading underscore.

The __all__ list also only has meaning to the from* statement form, and is not a privacy declaration. Module writers can use either trick, to implement modules that are well-behaved when used with from*. See the discussion of __all__ lists in package __init__.py files in Chapter 17; there, they declare submodules to be loaded for a from*.
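To see both conventions side by side, here is a minimal sketch. It builds two throwaway modules in memory rather than from files, purely so the example is self-contained; the helper names make_module and star_import and the module names underscore_demo and all_demo are hypothetical, not part of Python. It uses print and exec as function calls so that it also runs under newer Python releases.

```python
import sys
import types

def make_module(name, source):
    """Build a module object from a source string and register it."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)          # Run the source in the module's namespace
    sys.modules[name] = module             # Make it importable by name
    return module

def star_import(name):
    """Run 'from name import *' in a fresh namespace; return the copied names."""
    namespace = {}
    exec("from %s import *" % name, namespace)
    return sorted(key for key in namespace if key != "__builtins__")

# No __all__: names with a leading underscore are left behind.
make_module("underscore_demo", "spam = 1\n_hidden = 2\n")

# With __all__: only the listed names are copied, underscore or not.
make_module("all_demo", "__all__ = ['spam', '_ham']\nspam = 1\n_ham = 2\neggs = 3\n")

print(star_import("underscore_demo"))   # ['spam']
print(star_import("all_demo"))          # ['_ham', 'spam']
```

Notice that _ham is copied in the second case: once an __all__ list exists, it alone decides what from* copies, and the underscore convention no longer applies.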

Enabling Future Language Features

Changes to the language that may potentially break existing code in the future are introduced gradually. Initially, they appear as optional extensions, which are disabled by default. To turn on such extensions, use a special import statement of this form:

from __future__ import featurename

This statement should generally appear at the top of a module file (possibly after a docstring), because it enables special compilation of code on a per-module basis. It’s also possible to submit this statement at the interactive prompt to experiment with upcoming language changes; the feature will then be available for the rest of the interactive session.

For example, we had to use this in Chapter 14 to demonstrate generator functions, which require a keyword that is not yet enabled by default (they use a featurename of generators). We also used this statement to activate true division for numbers in Chapter 4.

Mixed Usage Modes: __name__ and __main__

Here’s a special module-related trick that lets you both import a file as a module, and run it as a standalone program. Each module has a built-in attribute called __name__, which Python sets automatically as follows:

  • If the file is being run as a top-level program file, __name__ is set to the string "__main__" when it starts.

  • If the file is being imported, __name__ is instead set to the module’s name as known by its clients.

The upshot is that a module can test its own __name__ to determine whether it’s being run or imported. For example, suppose we create the following module file, named runme.py, to export a single function called tester:

def tester(  ):
    print "It's Christmas in Heaven..."

if __name__ == '__main__':         # Only when run
    tester(  )                        # Not when imported

This module defines a function for clients to import and use as usual:

% python
>>> import runme
>>> runme.tester(  )
It's Christmas in Heaven...

But the module also includes code at the bottom that is set up to call the function when this file is run as a program:

% python runme.py
It's Christmas in Heaven...

Perhaps the most common place you’ll see the __name__ test applied is for self-test code: you can package code that tests a module’s exports in the module itself, by wrapping it in a __name__ test at the bottom. This way, you can use the file in clients by importing it, and test its logic by running it from the system shell or other launching schemes. Chapter 26 will discuss other commonly used options for testing Python code.

Another common role for the __name__ trick is writing files whose functionality can be used both as a command-line utility and as a tool library. For instance, suppose you write a file finder script in Python; you can get more mileage out of your code if you package it in functions, and add a __name__ test in the file to automatically call those functions when the file is run standalone. That way, the script’s code becomes reusable in other programs.
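The file finder idea might be sketched as follows. This is only an illustration of the layout: the function names find_files and main, the default suffix, and the command-line argument order are all assumptions made for the example. It uses print as a function call so that it also runs under newer Python releases.

```python
import os
import sys

def find_files(root, suffix):
    """Return a sorted list of paths under root whose names end with suffix."""
    matches = []
    for dirpath, dirnames, filenames in os.walk(root):
        for filename in filenames:
            if filename.endswith(suffix):
                matches.append(os.path.join(dirpath, filename))
    matches.sort()
    return matches

def main(argv):
    """Command-line mode: python finder.py [root] [suffix]."""
    root = argv[0] if len(argv) > 0 else "."
    suffix = argv[1] if len(argv) > 1 else ".py"
    for path in find_files(root, suffix):
        print(path)

if __name__ == "__main__":         # Only when run, not when imported
    main(sys.argv[1:])
```

A client that imports this file gets find_files as a library tool and never triggers the command-line logic; running the file from the shell triggers main instead.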

Changing the Module Search Path

In Chapter 15, we mentioned that the module search path is a list of directories initialized from environment variable PYTHONPATH, and possibly .pth path files. What we haven’t shown you until now is how a Python program can actually change the search path, by changing a built-in list called sys.path (the path attribute in the built-in sys module). sys.path is initialized on startup, but thereafter, you can delete, append, and reset its components however you like:

>>> import sys
>>> sys.path
['', 'D:\\PP2ECD-Partial\\Examples', 'C:\\Python22', ...more deleted...]

>>> sys.path = [r'd:\temp']                  # Change module search path
>>> sys.path.append('c:\\lp2e\\examples')    # for this process only.
>>> sys.path
['d:\\temp', 'c:\\lp2e\\examples']

>>> import string
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
ImportError: No module named string

You can use this to dynamically configure a search path inside a Python program. Be careful: if you delete a critical directory from the path, you may lose access to critical utilities. In the last command in the example, we no longer have access to the string module, since we deleted the Python source library’s directory from the path. Also remember that such settings only endure for the Python session or program that made them; they are not retained after Python exits.
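A gentler way to experiment is to extend the path rather than replace it, and to put it back afterward. The following sketch writes a tiny module into a temporary directory, adds that directory to sys.path, imports the module, and then restores the original path; the module name pathdemo_mod and its greeting attribute are made up for the example.

```python
import os
import sys
import tempfile

# Create a throwaway directory holding one small module file.
tmpdir = tempfile.mkdtemp()
modfile = open(os.path.join(tmpdir, "pathdemo_mod.py"), "w")
modfile.write("greeting = 'found via sys.path'\n")
modfile.close()

saved_path = sys.path[:]           # Copy the list so it can be restored
sys.path.append(tmpdir)            # Extend the search path for this process

import pathdemo_mod                # Located through the new path entry
print(pathdemo_mod.greeting)       # found via sys.path

sys.path[:] = saved_path           # Put the original path back, in-place
```

Restoring the path does not unload the module: once imported, it stays in the loaded-modules table for the rest of the session.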

The import as Extension

Both the import and from statements have been extended to allow a module to be given a different name in your script:

import longmodulename as name

is equivalent to:

import longmodulename
name = longmodulename
del longmodulename          # Don't keep original name.

After the import, you can (and in fact must) use the name after the as to refer to the module. This works in a from statement too:

from module import longname as name

to assign the name from the file to a different name in your script. This extension is commonly used to provide short synonyms for longer names, and to avoid name clashes when you are already using a name in your script that would otherwise be overwritten by a normal import statement. This also comes in handy for providing a short, simple name for an entire directory path, when using the package import feature described in Chapter 17.
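Here is a short sketch of each role, using standard library modules only; the synonyms text_tools, DIGITS, and minidom are arbitrary names picked for the example.

```python
import string as text_tools              # Short synonym for a module
from string import digits as DIGITS      # Rename a single imported name
import xml.dom.minidom as minidom        # One name for a whole package path

print(text_tools.digits == DIGITS)       # True

# Without the "as", this would have to be spelled xml.dom.minidom.parseString.
document = minidom.parseString("<menu/>")
print(document.documentElement.tagName)  # menu
```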

Module Design Concepts

Like functions, modules present design tradeoffs: deciding which functions go in which module, module communication mechanisms, and so on. Here are a few general ideas that will become clearer when you start writing bigger Python systems:

  • You’re always in a module in Python. There’s no way to write code that doesn’t live in some module. In fact, code typed at the interactive prompt really goes in a built-in module called __main__; the only unique things about the interactive prompt are that code runs and is discarded immediately, and that expression results are printed.

  • Minimize module coupling: global variables. Like functions, modules work best if they’re written to be closed boxes. As a rule of thumb, they should be as independent of global names in other modules as possible.

  • Maximize module cohesion: unified purpose. You can minimize a module’s couplings by maximizing its cohesion; if all the components of a module share its general purpose, you’re less likely to depend on external names.

  • Modules should rarely change other modules’ variables. It’s perfectly okay to use globals defined in another module (that’s how clients import services), but changing globals in another module is often a symptom of a design problem. There are exceptions of course, but you should try to communicate results through devices such as function return values, not cross-module changes. Otherwise your globals’ values become dependent on the order of arbitrarily remote assignments.

As a summary, Figure 18-1 sketches the environment in which modules operate. Modules contain variables, functions, classes, and other modules (if imported). Functions have local variables of their own. You’ll meet classes—another object that lives within modules—in Chapter 19.

Figure 18-1. Module environment

Modules Are Objects: Metaprograms

Because modules expose most of their interesting properties as built-in attributes, it’s easy to write programs that manage other programs. We usually call such manager programs metaprograms, because they work on top of other systems. This is also referred to as introspection, because programs can see and process object internals. Introspection is an advanced feature, but can be useful for building programming tools.

For instance, to get to an attribute called name in a module called M, we can either use qualification, or index the module’s attribute dictionary exposed in the built-in __dict__ attribute. Further, Python also exports the list of all loaded modules as the sys.modules dictionary (that is, the modules attribute of the sys module), and provides a built-in called getattr that lets us fetch attributes from their string names (it’s like saying object.attr, but attr is a runtime string). Because of that, all the following expressions reach the same attribute and object:

M.name                          # Qualify object.
M.__dict__['name']               # Index namespace dictionary manually.
sys.modules['M'].name           # Index loaded-modules table manually.
getattr(M, 'name')              # Call built-in fetch function.
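As a concrete check, the following sketch runs the four forms against a real module. The standard string module stands in for M and its digits attribute for name; both are just convenient choices. Note that sys.modules is indexed by the module's real name, not by the M synonym.

```python
import sys
import string as M

by_qualification = M.digits                    # Qualify object.
by_dict = M.__dict__["digits"]                 # Index namespace dictionary.
by_table = sys.modules["string"].digits        # Index loaded-modules table.
by_getattr = getattr(M, "digits")              # Call built-in fetch function.

# All four names now reference one and the same string object.
print(by_qualification is by_dict)             # True
print(by_dict is by_table)                     # True
print(by_table is by_getattr)                  # True
```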

By exposing module internals like this, Python helps you build programs about programs.[1] For example, here is a module named mydir.py that puts these ideas to work, to implement a customized version of the built-in dir function. It defines and exports a function called listing, which takes a module object as an argument and prints a formatted listing of the module’s namespace:

# A module that lists the namespaces of other modules

verbose = 1

def listing(module):
    if verbose:
        print "-"*30
        print "name:", module.__name__, "file:", module.__file__
        print "-"*30

    count = 0
    for attr in module.__dict__.keys(  ):      # Scan namespace.
        print "%02d) %s" % (count, attr),
        if attr[0:2] == "__":
            print "<built-in name>"           # Skip __file__, etc.
        else:
            print getattr(module, attr)       # Same as .__dict__[attr]
        count = count+1

    if verbose:
        print "-"*30
        print module.__name__, "has %d names" % count
        print "-"*30

if __name__ == "__main__":
    import mydir
    listing(mydir)      # Self-test code: list myself

We’ve also provided self-test logic at the bottom of this module, which narcissistically imports and lists itself. Here’s the sort of output produced:

C:\python> python mydir.py
------------------------------
name: mydir file: mydir.py
------------------------------
00) __file__ <built-in name>
01) __name__ <built-in name>
02) listing <function listing at 885450>
03) __doc__ <built-in name>
04) __builtins__ <built-in name>
05) verbose 1
------------------------------
mydir has 6 names
------------------------------

We’ll meet getattr and its relatives again. The point to notice here is that mydir is a program that lets you browse other programs. Because Python exposes its internals, you can process objects generically.[2]

Module Gotchas

Here is the usual collection of boundary cases, which make life interesting for beginners. Some are so obscure it was hard to come up with examples, but most illustrate something important about Python.

Importing Modules by Name String

The module name in an import or from statement is a hardcoded variable name. Sometimes, though, your program will get the name of a module to be imported as a string at runtime (e.g., if a user selects a module name from within a GUI). Unfortunately, you can’t use import statements directly to load a module given its name as a string—Python expects a variable here, not a string. For instance:

>>> import "string"
  File "<stdin>", line 1
    import "string"
                  ^
SyntaxError: invalid syntax

It also won’t work to put the string in a variable name:

x = "string"
import x

Here, Python will try to import a file x.py, not the string module.

To get around this, you need to use special tools to load modules dynamically from a string that exists at runtime. The most general approach is to construct an import statement as a string of Python code and pass it to the exec statement to run:

>>> modname = "string"
>>> exec "import " + modname       # Run a string of code.
>>> string                         # Imported in this namespace
<module 'string'>

The exec statement (and its cousin for expressions, the eval function) compiles a string of code, and passes it to the Python interpreter to be executed. In Python, the byte code compiler is available at runtime, so you can write programs that construct and run other programs like this. By default, exec runs the code in the current scope, but you can get more specific by passing in optional namespace dictionaries.
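For example, passing a dictionary keeps the imported name out of the current scope altogether. This is a minimal sketch; the variable names namespace and module are arbitrary, and exec is written as a call with parentheses so that the same line also runs under newer Python releases.

```python
modname = "string"
namespace = {}                               # Names created by the code land here

exec("import " + modname, namespace)         # Run the import inside the dictionary
module = namespace[modname]                  # Fetch the module object back out

print(module.__name__)                       # string
print("digits" in module.__dict__)           # True
```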

The only real drawback to exec is that it must compile the import statement each time it runs; if it runs many times, your code may run quicker if it uses the built-in __import__ function to load from a name string instead. The effect is similar, but __import__ returns the module object, so assign it to a name here to keep it:

>>> modname = "string"
>>> string = __import__(modname)
>>> string
<module 'string'>
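One wrinkle worth knowing, which goes slightly beyond the example above: for a dotted name, __import__ returns the top-level module rather than the one named last. A small wrapper that looks the result up in sys.modules sidesteps this. The helper name load_by_name is hypothetical; importlib.import_module is a standard library function, but it exists only in newer Python releases.

```python
import sys
import importlib

def load_by_name(modname):
    """Import modname from a string and return the module it names."""
    __import__(modname)                  # Make sure the module is loaded
    return sys.modules[modname]          # Fetch that exact module, even if dotted

top = __import__("os.path")              # Returns the top-level module, os
sub = load_by_name("os.path")            # Returns os.path itself
also_sub = importlib.import_module("os.path")   # Newer Pythons only

print(top.__name__)                      # os
print(sub is top.path)                   # True
print(sub is also_sub)                   # True
```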

from Copies Names but Doesn’t Link

The from statement is really an assignment to names in the importer’s scope—a name-copy operation, not a name aliasing. The implications of this are the same as for all assignments in Python, but subtle, especially given that the code that shares objects lives in different files. For instance, suppose we define the following module (nested1.py):

X = 99
def printer(  ): print X

If we import its two names using from in another module (nested2.py), we get copies of those names, not links to them. Changing a name in the importer resets only the binding of the local version of that name, not the name in nested1.py:

from nested1 import X, printer     # Copy names out.
X = 88                              # Changes my "X" only!
printer(  )                         # nested1's X is still 99

% python nested2.py
99

If you use import to get the whole module and assign to a qualified name, you change the name in nested1.py. Qualification directs Python to a name in the module object, rather than a name in the importer (nested3.py):

import nested1                    # Get module as a whole.
nested1.X = 88                    # Okay: change nested1's X
nested1.printer(  ) 

% python nested3.py
88
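The “same as for all assignments” part deserves one more illustration: rebinding a copied name is private to the importer, but changing a shared mutable object in-place is visible on both sides. The sketch below builds its module in memory only to stay self-contained; the names shared_demo, L, and show are made up for the example.

```python
import sys
import types

# Stand-in for a module file containing X, a list L, and a function.
shared = types.ModuleType("shared_demo")
exec("X = 99\nL = [1, 2]\ndef show(): return X, L\n", shared.__dict__)
sys.modules["shared_demo"] = shared

from shared_demo import X, L, show      # Copy three names out

X = 88                                  # Rebinds my X only
L.append(3)                             # Changes the one shared list object

print(show())                           # (99, [1, 2, 3])
```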

Statement Order Matters in Top-Level Code

When a module is first imported (or reloaded), Python executes its statements one by one, from the top of file to the bottom. This has a few subtle implications regarding forward references that are worth underscoring here:

  • Code at the top level of a module file (not nested in a function) runs as soon as Python reaches it during an import; because of that, it can’t reference names assigned lower in the file.

  • Code inside a function body doesn’t run until the function is called; because names in a function aren’t resolved until the function actually runs, they can usually reference names anywhere in the file.

Generally, forward references are only a concern in top-level module code that executes immediately; functions can reference names arbitrarily. Here’s an example that illustrates forward reference:

func1(  )               # Error: "func1" not yet assigned

def func1(  ):
    print func2(  )     # Okay:  "func2" looked up later

func1(  )               # Error: "func2" not yet assigned

def func2(  ):
    return "Hello"

func1(  )               # Okay:  "func1" and "func2" assigned

When this file is imported (or run as a standalone program), Python executes its statements from top to bottom. The first call to func1 fails because the func1 def hasn’t run yet. The call to func2 inside func1 works as long as func2’s def has been reached by the time func1 is called (it hasn’t when the second top-level func1 call is run). The last call to func1 at the bottom of the file works, because func1 and func2 have both been assigned.

Mixing defs with top-level code is not only hard to read, it’s dependent on statement ordering. As a rule of thumb, if you need to mix immediate code with defs, put your defs at the top of the file and top-level code at the bottom. That way, your functions are defined and assigned by the time code that uses them runs.
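The forward reference example can be made to run to completion by catching the two failures as they happen. In this sketch the function names greet and helper are stand-ins for func1 and func2, and the errors list is just a convenient way to record what went wrong.

```python
errors = []

try:
    greet()                      # Fails: "greet" not yet assigned
except NameError as exc:
    errors.append(str(exc))

def greet():
    return helper()              # Okay: "helper" is looked up at call time

try:
    greet()                      # Fails: "helper" not yet assigned
except NameError as exc:
    errors.append(str(exc))

def helper():
    return "Hello"

result = greet()                 # Okay: both names are assigned by now
print(len(errors))               # 2
print(result)                    # Hello
```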

Recursive “from” Imports May Not Work

Because imports execute a file’s statements from top to bottom, you sometimes need to be careful when using modules that import each other (something called recursive imports). Since the statements in a module have not all been run when it imports another module, some of its names may not yet exist. If you use import to fetch a module as a whole, this may or may not matter; the module’s names won’t be accessed until you later use qualification to fetch their values. But if you use from to fetch specific names, you only have access to names already assigned.

For instance, take the following modules recur1 and recur2. recur1 assigns a name X, and then imports recur2, before assigning name Y. At this point, recur2 can fetch recur1 as a whole with an import (it already exists in Python’s internal modules table), but it can see only name X if it uses from; the name Y below the import in recur1 doesn’t yet exist, so you get an error:

#File: recur1.py
X = 1
import recur2             # Run recur2 now if it doesn't exist.
Y = 2


#File: recur2.py
from recur1 import X      # Okay: "X" already assigned
from recur1 import Y      # Error: "Y" not yet assigned

>>> import recur1
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "recur1.py", line 2, in ?
    import recur2
  File "recur2.py", line 2, in ?
    from recur1 import Y   # Error: "Y" not yet assigned
ImportError: cannot import name Y

Python avoids rerunning recur1’s statements when they are imported recursively from recur2 (or else the imports would send the script into an infinite loop), but recur1’s namespace is incomplete when imported by recur2.

Don’t use from in recursive imports . . . really! Python won’t get stuck in a cycle, but your programs will once again be dependent on the order of statements in modules. There are two ways out of this gotcha:

  • You can usually eliminate import cycles like this by careful design; maximizing cohesion and minimizing coupling are good first steps.

  • If you can’t break the cycles completely, postpone module name access by using import and qualification (instead of from), or running your froms inside functions (instead of at the top level of the module) or near the bottom of your file to defer their execution.
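The second way out can be sketched with two files that import each other but use import and qualification instead of from. The example writes both files to a temporary directory so that it is self-contained; the module names cyc_a and cyc_b and the total function are invented for the illustration.

```python
import os
import sys
import tempfile

tmpdir = tempfile.mkdtemp()

def write_module(filename, text):
    """Write one module file into the temporary directory."""
    modfile = open(os.path.join(tmpdir, filename), "w")
    modfile.write(text)
    modfile.close()

# cyc_a imports cyc_b midway, before Y exists (like recur1).
write_module("cyc_a.py", "X = 1\nimport cyc_b\nY = 2\n")

# cyc_b fetches cyc_a as a whole; X and Y are not accessed until total runs.
write_module("cyc_b.py",
             "import cyc_a\n"
             "def total():\n"
             "    return cyc_a.X + cyc_a.Y\n")

sys.path.insert(0, tmpdir)
import cyc_a                          # No error: nothing needs Y during import
sys.path.remove(tmpdir)

print(cyc_a.cyc_b.total())            # 3
```

By the time total is called, cyc_a has finished running, so both of its names exist.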

reload May Not Impact from Imports

The from statement is the source of all sorts of gotchas in Python. Here’s another: because from copies (assigns) names when run, there’s no link back to the module where the names came from. Names imported with from simply become references to objects, which happen to have been referenced by the same names in the importee when the from ran.

Because of this behavior, reloading the importee has no effect on clients that use from; the client’s names still reference the original objects fetched with from, even though names in the original module have been reset:

from module import X       # X may not reflect any module reloads!
 . . . 
reload(module)             # Changes module, but not my names
X                          # Still references old object

Don’t do it that way. To make reloads more effective, use import and name qualification, instead of from. Because qualifications always go back to the module, they will find the new bindings of module names after reloading:

import module              # Get module, not names.
 . . . 
reload(module)             # Changes module in-place.
module.X                   # Get current X: reflects module reloads
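The difference can be observed directly with a module file that is rewritten between the import and the reload. This sketch makes a few assumptions for the sake of a self-contained demo: the module name reload_demo is invented, the try statement at the top is there only because newer Python releases moved reload into the importlib module, and bytecode file writing is switched off temporarily so that a stale compiled file cannot mask the edit.

```python
import os
import sys
import tempfile

try:
    from importlib import reload      # Newer Pythons: reload lives here
except ImportError:
    pass                              # Older Pythons: reload is a built-in

tmpdir = tempfile.mkdtemp()
modpath = os.path.join(tmpdir, "reload_demo.py")

def write_source(text):
    """Replace the module file's contents."""
    modfile = open(modpath, "w")
    modfile.write(text)
    modfile.close()

saved_flag = sys.dont_write_bytecode
sys.dont_write_bytecode = True        # Always recompile from source in this demo
sys.path.insert(0, tmpdir)

write_source("X = 'old'\n")
import reload_demo                    # Get module
from reload_demo import X             # Copy name out

write_source("X = 'new value'\n")     # "Edit" the file
reload(reload_demo)                   # Changes module in-place

copied = X                            # Still references the old object
qualified = reload_demo.X             # Reflects the reload

sys.path.remove(tmpdir)
sys.dont_write_bytecode = saved_flag

print(copied)                         # old
print(qualified)                      # new value
```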

reload and from and Interactive Testing

Chapter 3 warned readers that it’s usually better to not launch programs with imports and reloads, because of the complexities involved. Things get even worse with from. Python beginners often encounter this gotcha: after opening a module file in a text edit window, they launch an interactive session to load and test their module with from:

from module import function
function(1, 2, 3)

After finding a bug, they jump back to the edit window, make a change, and try to reload this way:

reload(module)

Except this doesn’t work—the from statement assigned the name function, not module. To refer to the module in a reload, you have to first load it with an import statement, at least once:

import module
reload(module)
function(1, 2, 3)

Except this doesn’t quite work either—reload updates the module object, but names like function copied out of the module in the past still refer to old objects (in this case, the original version of the function). To really get the new function, either call it module.function after the reload, or rerun the from:

import module
reload(module)
from module import function
function(1, 2, 3)

And now, the new version of the function finally runs. But there are problems inherent in using reload with from; not only do you have to remember to reload after imports, you also have to remember to rerun your from statements after reloads; this is complex enough to even trip up an expert once in a while.

You should not expect reload and from to play together nicely. Better yet, don’t combine them at all—use reload with import, or launch programs other ways, as suggested in Chapter 3 (e.g., use the Edit/Runscript option in IDLE, file icon clicks, or system command lines).

reload Isn’t Applied Transitively

When you reload a module, Python only reloads that particular module’s file; it doesn’t automatically reload modules that the file being reloaded happens to import. For example, if you reload some module A, and A imports modules B and C, the reload only applies to A, not B and C. The statements inside A that import B and C are rerun during the reload, but they’ll just fetch the already loaded B and C module objects (assuming they’ve been imported before). In actual code, here’s file A.py:

import B              # Not reloaded when A is
import C              # Just an import of an already loaded module

% python
>>> . . . 
>>> reload(A)

Don’t depend on transitive module reloads. Use multiple reload calls to update subcomponents independently. If desired, you can design your systems to reload their subcomponents automatically by adding reload calls in parent modules like A.

Better still, you could write a general tool to do transitive reloads automatically, by scanning module __dict__s (see Section 18.6.1 earlier in this chapter), and checking each item’s type( ) (see Chapter 7) to find nested modules to reload recursively. Such a utility function could call itself, recursively, to navigate arbitrarily shaped import dependency chains.

Module reloadall.py, listed below, has a reload_all function that automatically reloads a module, every module that the module imports, and so on, all the way to the bottom of the import chains. It uses a dictionary to keep track of modules already reloaded, recursion to walk the import chains, and the standard library’s types module (introduced at the end of Chapter 7), which simply predefines type( ) results for built-in types.

To use this utility, import its reload_all function, and pass it the name of an already-loaded module, much like the built-in reload function; when the file runs stand-alone, its self-test code tests itself—it has to import itself, because its own name is not defined in the file without an import. We encourage you to study and experiment with this example on your own:

import types

def status(module):
    print 'reloading', module.__name__

def transitive_reload(module, visited):
    if not visited.has_key(module):              # Trap cycles, dups.
        status(module)                           # Reload this module
        reload(module)                           # and visit children.
        visited[module] = None
        for attrobj in module.__dict__.values(  ):    # For all attrs
            if type(attrobj) == types.ModuleType:    # Recur if module
                transitive_reload(attrobj, visited)
        
def reload_all(*args):
    visited = {  }
    for arg in args:
        if type(arg) == types.ModuleType:
            transitive_reload(arg, visited)

if __name__ == '__main__':
    import reloadall                # Test code: reload myself
    reload_all(reloadall)           # Should reload this, types

Part V Exercises

See Section B.5 for the solutions.

  1. Basics, import. Write a program that counts lines and characters in a file (similar in spirit to “wc” on Unix). With your text editor, code a Python module called mymod.py, which exports three top-level names:

    • A countLines(name) function that reads an input file and counts the number of lines in it (hint: file.readlines( ) does most of the work for you, and len does the rest)

    • A countChars(name) function that reads an input file and counts the number of characters in it (hint: file.read( ) returns a single string)

    • A test(name) function that calls both counting functions with a given input filename. Such a filename generally might be passed-in, hardcoded, input with raw_input, or pulled from a command line via the sys.argv list; for now, assume it’s a passed-in function argument.

    All three mymod functions should expect a filename string to be passed in. If you type more than two or three lines per function, you’re working much too hard—use the hints listed above!

    Next, test your module interactively, using import and name qualification to fetch your exports. Does your PYTHONPATH need to include the directory where you created mymod.py? Try running your module on itself: e.g., test("mymod.py"). Note that test opens the file twice; if you’re feeling ambitious, you may be able to improve this by passing an open file object into the two count functions (hint: file.seek(0) is a file rewind).

  2. from/from*. Test your mymod module from Exercise 1 interactively, by using from to load the exports directly, first by name, then using the from* variant to fetch everything.

  3. __main__. Add a line in your mymod module that calls the test function automatically only when the module is run as a script, not when it is imported. The line you add will probably test the value of __name__ for the string "__main__", as shown in this chapter. Try running your module from the system command line; then, import the module and test its functions interactively. Does it still work in both modes?

  4. Nested imports. Write a second module, myclient.py, which imports mymod and tests its functions; run myclient from the system command line. If myclient uses from to fetch from mymod, will mymod’s functions be accessible from the top level of myclient? What if it imports with import instead? Try coding both variations in myclient and test interactively, by importing myclient and inspecting its __dict__.

  5. Package imports. Import your file from a package. Create a subdirectory called mypkg nested in a directory on your module import search path, move the mymod.py module file you created in Exercise 1 or 3 into the new directory, and try to import it with a package import of the form: import mypkg.mymod.

    You’ll need to add an __init__.py file in the directory your module was moved to in order to make this go, but it should work on all major Python platforms (that’s part of the reason Python uses “.” as a path separator). The package directory you create can be simply a subdirectory of the one you’re working in; if it is, it will be found via the home directory component of the search path, and you won’t have to configure your path. Add some code to your __init__.py, and see if it runs on each import.

  6. Reload. Experiment with module reloads: perform the tests in Chapter 16’s changer.py example, changing the called function’s message and/or behavior repeatedly, without stopping the Python interpreter. Depending on your system, you might be able to edit changer in another window, or suspend the Python interpreter and edit in the same window (on Unix, a Ctrl-Z key combination usually suspends the current process, and a fg command later resumes it).

  7. Circular imports.[3] In the section on recursive import gotchas, importing recur1 raised an error. But if you restart Python and import recur2 interactively, the error doesn’t occur: test and see this for yourself. Why do you think it works to import recur2, but not recur1? (Hint: Python stores new modules in the built-in sys.modules table (a dictionary) before running their code; later imports fetch the module from this table first, whether the module is “complete” yet or not.) Now try running recur1 as a top-level script file: % python recur1.py. Do you get the same error that occurs when recur1 is imported interactively? Why? (Hint: when modules are run as programs they aren’t imported, so this case has the same effect as importing recur2 interactively; recur2 is the first module imported.) What happens when you run recur2 as a script?



[1] Notice that because a function can access its enclosing module by going through the sys.modules table like this, it’s possible to emulate the effect of the global statement you met in Chapter 13. For instance, the effect of global X; X=0 can be simulated by saying, inside a function: import sys; glob=sys.modules[__name__]; glob.X=0 (albeit with much more typing). Remember, each module gets a __name__ attribute for free; it’s visible as a global name inside functions within a module. This trick provides another way to change both local and global variables of the same name, inside a function.

[2] Tools such as mydir.listing can be preloaded into the interactive namespace, by importing them in the file referenced by the PYTHONSTARTUP environment variable. Since code in the startup file runs in the interactive namespace (module __main__), imports of common tools in the startup file can save you some typing. See Appendix A for more details.

[3] Note that circular imports are extremely rare in practice. In fact, this author has never coded or come across a circular import in a decade of Python coding. On the other hand, if you can understand why it’s a potential problem, you know a lot about Python’s import semantics.
