Chapter 17. Module Packages

So far, when we’ve imported a module, we’ve been loading files. This represents typical module usage, and is probably what you will use for most imports early in your Python career. The module import story, however, is richer than we have thus far implied.

Imports can name a directory path, in addition to a module name. When they do, they are known as package imports—a directory of Python code is said to be a package. This is a somewhat advanced feature, but turns out to be handy for organizing the files in a large system, and tends to simplify module search path settings. As we’ll see, package imports are also sometimes required in order to resolve ambiguities when multiple programs are installed on a single machine.

Package Import Basics

Here’s how package imports work. In the place where we have been naming a simple file in import statements, we can instead list a path of names separated by periods:

import dir1.dir2.mod

The same goes for from statements:

from dir1.dir2.mod import x

The “dotted” path in these statements is assumed to correspond to a path through the directory hierarchy on your machine, leading to the file mod.py (or other file type). That is, there is a directory dir1, which has a subdirectory dir2, which contains a module file mod.py.

Furthermore, these imports imply that dir1 resides within some container directory dir0, which is accessible on the Python module search path. In other words, the two import statements imply a directory structure that looks something like this (shown with DOS backslash separators):

dir0\dir1\dir2\mod.py             # Or mod.pyc, mod.so, ...

The container directory dir0 still needs to be added to your module search path (unless it’s the home directory of the top-level file), exactly as if dir1 were a simple module file. From there down, the import statements in your script give the directory path leading to the module explicitly.

Packages and Search Path Settings

If you use this feature, keep in mind that the directory paths in your import statements can only be variables separated by periods. You cannot use any platform-specific path syntax in your import statements; things like C:\dir1, My Documents.dir2, and ../dir1 do not work syntactically. Instead, use platform-specific syntax in your module search path settings to name the container directory.

For instance, in the prior example, dir0—the directory name you add to your module search path—can be an arbitrarily long and platform-specific directory path leading up to dir1. Instead of using an invalid statement like this:

import C:\mycode\dir1\dir2\mod      # Error: illegal syntax

add C:\mycode to your PYTHONPATH variable or .pth files, unless it is the program’s home directory, and say this:

import dir1.dir2.mod

In effect, entries on the module search path provide platform-specific directory path prefixes, which lead to the leftmost names in import statements. Import statements provide directory path tails in a platform-neutral fashion.[1]
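To make the prefix/tail split concrete, here is a sketch (in modern Python syntax, unlike this book’s listings) that builds the dir0\dir1\dir2\mod.py tree in a scratch directory and imports through it. The temp-directory scaffolding and the contents of mod.py are invented for illustration, and the empty __init__.py files it creates are a package requirement covered in the next section:

```python
import os, sys, tempfile, importlib

# Build the chapter's dir0\dir1\dir2\mod.py tree in a scratch area;
# the temp directory stands in for the container dir0.
dir0 = tempfile.mkdtemp()
inner = os.path.join(dir0, "dir1", "dir2")
os.makedirs(inner)
open(os.path.join(dir0, "dir1", "__init__.py"), "w").close()  # Required
open(os.path.join(inner, "__init__.py"), "w").close()         # (next section)
with open(os.path.join(inner, "mod.py"), "w") as f:
    f.write("z = 3\n")

sys.path.append(dir0)                           # Platform-specific prefix...
mod = importlib.import_module("dir1.dir2.mod")  # ...platform-neutral tail
print(mod.z)                                    # 3
```

Only the container dir0 goes on the path; the dotted import supplies the rest.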

Package __init__.py Files

If you choose to use package imports, there is one more constraint you must follow. Each directory named within the path of a package import statement must also contain a file named __init__.py, or else your package imports will fail. In the example we’ve been using, both dir1 and dir2 must contain a file called __init__.py; the container directory dir0 does not require such a file, because it’s not listed in the import statement itself. More formally, for a directory structure such as:

dir0\dir1\dir2\mod.py

and an import statement of the form:

import dir1.dir2.mod

the following rules apply:

  • dir1 and dir2 both must contain an __init__.py file.

  • dir0, the container, does not require an __init__.py; it will simply be ignored if present.

  • dir0 must be listed on the module search path (home directory, PYTHONPATH, etc.), not dir0dir1.

The net effect is that this example’s directory structure should be as follows, with indentation designating directory nesting:

dir0                          # Container on module search path
    dir1
        __init__.py
        dir2
            __init__.py
            mod.py

These __init__.py files contain Python code, just like normal module files. They are partly present as a declaration to Python, and can be completely empty. As a declaration, these files serve to prevent directories with a common name from unintentionally hiding true modules that occur later on the module search path. Otherwise, Python may pick a directory that has nothing to do with your code, just because it appears in an earlier directory on the search path.

More generally, this file serves as a hook for package initialization-time actions, serves to generate a module namespace for a directory, and implements the behavior of from* (i.e., from ... import *) statements when used with directory imports:

Package initialization

The first time Python imports through a directory, it automatically runs all the code in the directory’s __init__.py file. Because of that, these files are a natural place to put code to initialize the state required by files in the package. For instance, a package might use its initialization file to create required data files, open connections to databases, and so on. Typically, __init__.py files are not meant to be useful if executed directly; they are run automatically during imports, the first time Python goes through a directory.

Module namespace initialization

In the package import model, the directory paths in your script become real nested object paths after the import. For instance, in the example above, the expression dir1.dir2 works, and returns a module object whose namespace contains all the names assigned by dir2’s __init__.py file. Such files provide a namespace for modules that have no other file.

From* statement behavior

As an advanced feature, you can use __all__ lists in __init__.py files to define what is exported when a directory is imported with the from* statement form. (We’ll meet __all__ in Chapter 18.) In an __init__.py file, the __all__ list is taken to be the list of submodule names that should be imported when from* is used on the package (directory) name. If __all__ is not set, the from* does not automatically load submodules nested in the directory, but instead loads just names defined by assignments in the directory’s __init__.py file, including any submodules explicitly imported by code in this file. For instance, a statement from submodule import X in a directory’s __init__.py makes name X available in that directory’s namespace.
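The __all__ behavior can be sketched as follows; the package name pkgdemo and its submodules are hypothetical, created in a scratch directory purely to illustrate (shown in modern Python syntax):

```python
import os, sys, tempfile

# Hypothetical package "pkgdemo" whose __init__.py sets an __all__ list.
root = tempfile.mkdtemp()
pkg = os.path.join(root, "pkgdemo")
os.makedirs(pkg)
with open(os.path.join(pkg, "__init__.py"), "w") as f:
    f.write("__all__ = ['sub']\n")         # from* loads only these submodules
with open(os.path.join(pkg, "sub.py"), "w") as f:
    f.write("value = 42\n")
with open(os.path.join(pkg, "hidden.py"), "w") as f:
    f.write("value = 99\n")

sys.path.append(root)
ns = {}
exec("from pkgdemo import *", ns)          # from* must run at module level
print("sub" in ns, "hidden" in ns)         # True False
print(ns["sub"].value)                     # 42
```

Because hidden is absent from __all__, from* leaves it unloaded; it could still be imported explicitly by name.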

You can also simply leave these files empty, if their roles are beyond your needs. They must really exist, though, for your directory imports to work at all.

Package Import Example

Let’s actually code the example we’ve been talking about to show how initialization files and paths come into play. The following three files are coded in a directory dir1 and its subdirectory dir2:

# File: dir1\__init__.py
print 'dir1 init'
x = 1

# File: dir1\dir2\__init__.py
print 'dir2 init'
y = 2

# File: dir1\dir2\mod.py
print 'in mod.py'
z = 3

Here, dir1 will either be a subdirectory of the one we’re working in (i.e., the home directory), or a subdirectory of a directory that is listed on the module search path (technically, on sys.path). Either way, dir1’s container does not need an __init__.py file.

As for simple module files, import statements run each directory’s initialization file as Python descends the path, the first time a directory is traversed; we’ve added print statements to trace their execution. Also like module files, already-imported directories may be passed to reload to force re-execution of that single item—reload accepts a dotted path name to reload nested directories and files:

% python 
>>> import dir1.dir2.mod      # First imports run init files.
dir1 init
dir2 init
in mod.py
>>>
>>> import dir1.dir2.mod      # Later imports do not.
>>>
>>> reload(dir1)
dir1 init
<module 'dir1' from 'dir1\__init__.pyc'>
>>>
>>> reload(dir1.dir2)
dir2 init
<module 'dir1.dir2' from 'dir1\dir2\__init__.pyc'>

Once imported, the path in your import statement becomes a nested object path in your script; mod is an object nested in object dir2, nested in object dir1:

>>> dir1
<module 'dir1' from 'dir1\__init__.pyc'>
>>> dir1.dir2
<module 'dir1.dir2' from 'dir1\dir2\__init__.pyc'>
>>> dir1.dir2.mod
<module 'dir1.dir2.mod' from 'dir1\dir2\mod.pyc'>

In fact, each directory name in the path becomes a variable, assigned to a module object whose namespace is initialized by all the assignments in that directory’s __init__.py file. dir1.x refers to the variable x assigned in dir1\__init__.py, much as mod.z refers to z assigned in mod.py:

>>> dir1.x
1
>>> dir1.dir2.y
2
>>> dir1.dir2.mod.z
3

from Versus import with Packages

import statements can be somewhat inconvenient to use with packages, because you must retype paths frequently in your program. In the prior section’s example, you must retype and rerun the full path from dir1 each time you want to reach z. In fact, we get errors here if we try to access dir2 or mod directly at this point:

>>> dir2.mod
NameError: name 'dir2' is not defined
>>> mod.z
NameError: name 'mod' is not defined

Because of that, it’s often more convenient to use the from statement with packages, to avoid retyping paths at each access. Perhaps more importantly, if you ever restructure your directory tree, the from statement requires just one path update in your code, whereas the import may require many. The import as extension, discussed in the next chapter, can also help here, by providing a shorter synonym for the full path:

% python
>>> from dir1.dir2 import mod       # Code the path here only.
dir1 init
dir2 init
in mod.py
>>> mod.z                           # Don't repeat path.
3
>>> from dir1.dir2.mod import z
>>> z
3
>>> import dir1.dir2.mod as mod     # Use shorter name.
>>> mod.z
3

Why Use Package Imports?

If you’re new to Python, make sure that you’ve mastered simple modules before stepping up to packages, as they are a somewhat advanced feature of Python. They do serve useful roles, especially in larger programs: they make imports more informative, serve as an organizational tool, simplify your module search path, and can resolve ambiguities.

First of all, because package imports give some directory information in program files, they both make it easier to locate your files, and serve as an organizational tool. Without package paths, you must resort to consulting the module search path to find files more often. Moreover, if you organize your files into subdirectories for functional areas, package imports make it more obvious what role a module plays, and so make your code more readable. For example, a normal import of a file in a directory somewhere on the module search path:

import utilities

bears much less information than an import that includes path information:

import database.client.utilities

Package imports can also greatly simplify your PYTHONPATH or .pth file search path settings. In fact, if you use package imports for all your cross-directory imports, and you make those package imports relative to a common root directory where all your Python code is stored, you really only need a single entry on your search path: the common root.

A Tale of Three Systems

The only time package imports are actually required, though, is in order to resolve ambiguities that may arise when multiple programs are installed on a single machine. This is something of an install issue, but can also become a concern in general practice. Let’s turn to a hypothetical scenario to illustrate.

Suppose that a programmer develops a Python program that contains a file called utilities.py for common utility code, and a top-level file named main.py that users launch to start the program. All over this program, its files say import utilities to load and use the common code. When this program is shipped, it arrives as a single tar or zip file containing all the program’s files; when it is installed, it unpacks all its files into a single directory named system1 on the target machine:

system1
    utilities.py        # Common utility functions, classes
    main.py             # Launch this to start the program.
    other.py            # Import utilities to load my tools

Now, suppose that a second programmer does the same thing: he or she develops a different program with files utilities.py and main.py, and uses import utilities to load the common code file again. When this second system is fetched and installed, its files unpack into a new directory called system2 somewhere on the receiving machine, such that its files do not overwrite same-named files from the first system. Eventually, both systems become so popular that they wind up commonly installed in the same computer:

system2
    utilities.py        # Common utilities
    main.py             # Launch this to run.
    other.py            # Imports utilities

So far, there’s no problem: both systems can coexist or run on the same machine. In fact, we don’t even need to configure the module search path to use these programs—because Python always searches the home directory first (that is, the directory containing the top-level file), imports in either system’s files will automatically see all the files in that system’s directory. For instance, if you click on system1\main.py, all imports will search system1 first. Similarly, if you launch system2\main.py, then system2 is searched first instead. Remember, module search path settings are only needed to import across directory boundaries.

But now, suppose that after you’ve installed these two programs on your machine, you decide that you’d like to use the code in the utilities.py file of either of these two systems in a new program of your own. It’s common utility code, after all, and Python code by nature wants to be reused. You want to be able to say the following from code that you’re writing in a third directory:

import utilities
utilities.function('spam')

to load one of the two files. And now the problem starts to materialize. To make this work at all, you’ll have to set the module search path to include the directories containing the utilities.py files. But which directory do you put first in the path—system1 or system2?

The problem is the linear nature of the search path; it is always scanned left to right. No matter how long you may ponder this dilemma, you will always get utilities.py from the directory listed first (leftmost) on the search path. As is, you’ll never be able to import it from the other directory at all. You could try changing sys.path within your script before each import operation, but that’s both extra work, and highly error-prone. By default, you’re stuck.
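The dilemma can be reproduced in a few lines; the directory names mirror the scenario, though the scratch-directory setup and the tag variable are invented for illustration (modern Python syntax):

```python
import os, sys, tempfile

# Two flat install directories, each with its own utilities.py.
base = tempfile.mkdtemp()
for name, tag in (("system1", "spam"), ("system2", "eggs")):
    d = os.path.join(base, name)
    os.makedirs(d)
    with open(os.path.join(d, "utilities.py"), "w") as f:
        f.write("tag = %r\n" % tag)

# Put both directories on the path -- only the leftmost can ever win.
sys.path[:0] = [os.path.join(base, "system1"), os.path.join(base, "system2")]
import utilities
print(utilities.tag)      # 'spam' -- system2's utilities.py is unreachable
```

Swapping the two path entries merely flips which file is shadowed; no ordering reaches both.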

And this is the issue that packages actually fix. Rather than installing programs as a flat list of files in standalone directories, package and install them as subdirectories, under a common root. For instance, you might organize all the code in this example as an install hierarchy that looks like this:

root
    system1
        __init__.py
        utilities.py
        main.py
        other.py
    system2
        __init__.py
        utilities.py
        main.py
        other.py
    system3                 # Here or elsewhere
        __init__.py           # Your new code here
        myfile.py

Now, add just the common root directory to your search path. If your code’s imports are relative to this common root, you can import either system’s utility file with package imports—the enclosing directory name makes the path (and hence the module reference) unique. In fact, you can import both utility files in the same module, as long as you use the import statement and repeat the full path each time you reference the utility modules:

import system1.utilities
import system2.utilities
system1.utilities.function('spam')
system2.utilities.function('eggs')

Notice that __init__.py files were added to the system1 and system2 directories to make this work, but not to the root: only directories listed within import statements require these files.

Technically, your system3 directory doesn’t have to be under root—just the packages of code from which you will import. However, because you never know when your own modules might be useful in other programs, you might as well place them under the common root to avoid similar name-collision problems in the future.

Also, notice that both of the two original systems’ imports will keep working as is and unchanged: because their home directory is searched first, the addition of the common root on the search path is irrelevant to code in system1 and system2. They can keep saying just import utilities and expect to find their own file. Moreover, if you’re careful to unpack all your Python systems under the common root like this, path configuration becomes simple: you’ll only need to add the common root, once.
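As a check on this fix, the following sketch builds the root hierarchy in a scratch directory, adds only the root to the path, and reaches both utility files; the function bodies are invented stand-ins (modern Python syntax):

```python
import os, sys, tempfile, importlib

# Build root\system1 and root\system2, each a package with utilities.py.
root = tempfile.mkdtemp()
for name in ("system1", "system2"):
    d = os.path.join(root, name)
    os.makedirs(d)
    open(os.path.join(d, "__init__.py"), "w").close()   # Package marker
    with open(os.path.join(d, "utilities.py"), "w") as f:
        f.write("def function(arg): return '%s:' + arg\n" % name)

sys.path.append(root)                       # One entry: the common root
u1 = importlib.import_module("system1.utilities")
u2 = importlib.import_module("system2.utilities")
print(u1.function('spam'))                  # system1:spam
print(u2.function('eggs'))                  # system2:eggs
```

The enclosing directory names keep the two utilities modules distinct, so both are importable in one program.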



[1] The dot path syntax was chosen partly for platform neutrality, but also because paths in import statements become real nested object paths. This syntax also means that you get odd error messages if you accidentally include the .py in your import statements: import mod.py is assumed to be a directory path import—it loads mod.py, then tries to load a mod\py.py, and ultimately issues a potentially confusing error message.
