So far, when we’ve imported a module, we’ve been loading files. This is typical module usage, and probably what you will use for most imports early in your Python career. The module import story, however, is richer than we have implied thus far.
Imports can name a directory path, in addition to a module name. When they do, they are known as package imports—a directory of Python code is said to be a package. This is a somewhat advanced feature, but turns out to be handy for organizing the files in a large system, and tends to simplify module search path settings. As we’ll see, package imports are also sometimes required in order to resolve ambiguities when multiple programs are installed on a single machine.
Here’s how package imports work. In the place where we have been naming a simple file in import statements, we can instead list a path of names separated by periods:
import dir1.dir2.mod
The same goes for from statements:
from dir1.dir2.mod import x
The “dotted” path in these statements is assumed to correspond to a path through the directory hierarchy on your machine, leading to the file mod.py (or another file type). That is, there is a directory dir1, which has a subdirectory dir2, which contains a module file mod.py (or other suffix). Furthermore, these imports imply that dir1 resides within some container directory dir0, which is accessible on the Python module search path. In other words, the two import statements imply a directory structure that looks something like this (shown with DOS backslash separators):

dir0\dir1\dir2\mod.py                # Or mod.pyc, mod.so, ...
The container directory dir0 still needs to be added to your module search path (unless it’s the home directory of the top-level file), exactly as if dir1 were a simple module file. From there down, the import statements in your script give the directory path leading to the module explicitly.
If you use this feature, keep in mind that the directory paths in your import statements can only be variables separated by periods. You cannot use any platform-specific path syntax in your import statements; things like C:\dir1, My Documents.dir2, and ../dir1 do not work syntactically. Instead, use platform-specific syntax in your module search path settings to name the container directory.
For instance, in the prior example, dir0 (the directory name you add to your module search path) can be an arbitrarily long and platform-specific directory path leading up to dir1. Instead of using an invalid statement like this:

import C:\mycode\dir1\dir2\mod       # Error: illegal syntax

add C:\mycode to your PYTHONPATH variable or a .pth file (unless it is the program’s home directory), and say this:

import dir1.dir2.mod
In effect, entries on the module search path provide platform-specific directory path prefixes, which lead to the leftmost names in import statements. Import statements, in turn, provide directory path tails in a platform-neutral fashion.[1]
If you choose to use package imports, there is one more constraint you must follow: each directory named within the path of a package import statement must also contain a file named __init__.py, or else your package imports will fail. In the example we’ve been using, both dir1 and dir2 must contain a file called __init__.py; the container directory dir0 does not require such a file, because it’s not listed in the import statement itself. More formally, for a directory structure such as:

dir0\dir1\dir2\mod.py

and an import statement of the form:

import dir1.dir2.mod

the following rules apply:
- dir1 and dir2 both must contain an __init__.py file.
- dir0, the container, does not require an __init__.py file; it will simply be ignored if present.
- dir0 must be listed on the module search path (home directory, PYTHONPATH, etc.), not dir0\dir1.
The net effect is that this example’s directory structure should be as follows, with indentation designating directory nesting:
dir0\                        # Container on module search path
    dir1\
        __init__.py
        dir2\
            __init__.py
            mod.py
These __init__.py files contain Python code, just like normal module files. They are partly present as a declaration to Python, and can be completely empty. As a declaration, these files serve to prevent directories with a common name from unintentionally hiding true modules that occur later on the module search path. Otherwise, Python may pick a directory that has nothing to do with your code, just because it appears in an earlier directory on the search path.
More generally, this file serves as a hook for package initialization-time actions, serves to generate a module namespace for a directory, and implements the behavior of from* (i.e., from ... import *) statements when used with directory imports:
The first time Python imports through a directory, it automatically runs all the code in the directory’s __init__.py file. Because of that, these files are a natural place to put code to initialize the state required by files in the package. For instance, a package might use its initialization file to create required data files, open connections to databases, and so on. Typically, __init__.py files are not meant to be useful if executed directly; they are run automatically during imports, the first time Python goes through a directory.
In the package import model, the directory paths in your script become real nested object paths after the import. For instance, in the example above, the expression dir1.dir2 works, and returns a module object whose namespace contains all the names assigned by dir2’s __init__.py file. Such files provide a namespace for directories, which otherwise have no module file of their own.
As an advanced feature, you can use __all__ lists in __init__.py files to define what is exported when a directory is imported with the from* statement form. (We’ll meet __all__ in Chapter 18.) In an __init__.py file, the __all__ list is taken to be the list of submodule names that should be imported when from* is used on the package (directory) name. If __all__ is not set, the from* statement does not automatically load submodules nested in the directory; instead, it loads just the names defined by assignments in the directory’s __init__.py file, including any submodules explicitly imported by code in this file. For instance, a statement from submodule import X in a directory’s __init__.py makes the name X available in that directory’s namespace.
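To make this behavior concrete, here is a small self-contained sketch (the package name pkg and its submodule names are invented for illustration). It writes a throwaway package to a temporary directory, lists only mod1 in __all__, and then confirms that a from* import loads mod1 but not mod2:

```python
import os
import sys
import tempfile

# Build a throwaway package to experiment with __all__.
root = tempfile.mkdtemp()
pkgdir = os.path.join(root, 'pkg')
os.mkdir(pkgdir)

# __all__ names the submodules that "from pkg import *" should load.
with open(os.path.join(pkgdir, '__init__.py'), 'w') as f:
    f.write("__all__ = ['mod1']\n")
with open(os.path.join(pkgdir, 'mod1.py'), 'w') as f:
    f.write("a = 1\n")
with open(os.path.join(pkgdir, 'mod2.py'), 'w') as f:
    f.write("b = 2\n")

sys.path.insert(0, root)            # Make the container importable

ns = {}
exec("from pkg import *", ns)       # Run the star import in a scratch namespace
print('mod1' in ns, 'mod2' in ns)   # mod1 was loaded; mod2 was not
```

Only mod1 shows up in the target namespace; dropping the __all__ line would leave both submodules unloaded until something imported them explicitly.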
You can also simply leave these files empty, if their roles are beyond your needs. They must really exist, though, for your directory imports to work at all.
Let’s actually code the example we’ve been talking about, to show how initialization files and paths come into play. The following three files are coded in a directory dir1 and its subdirectory dir2:
# File: dir1\__init__.py
print 'dir1 init'
x = 1

# File: dir1\dir2\__init__.py
print 'dir2 init'
y = 2

# File: dir1\dir2\mod.py
print 'in mod.py'
z = 3
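If you’d rather not create these files by hand, the following sketch builds the same tree in a temporary directory and imports through it (the temporary location stands in for dir0, the container; the print calls are written with parentheses, which behaves the same here):

```python
import os
import sys
import tempfile

base = tempfile.mkdtemp()          # Stands in for dir0, the container
d1 = os.path.join(base, 'dir1')
d2 = os.path.join(d1, 'dir2')
os.makedirs(d2)

# The same three files as the listing above
with open(os.path.join(d1, '__init__.py'), 'w') as f:
    f.write("print('dir1 init')\nx = 1\n")
with open(os.path.join(d2, '__init__.py'), 'w') as f:
    f.write("print('dir2 init')\ny = 2\n")
with open(os.path.join(d2, 'mod.py'), 'w') as f:
    f.write("print('in mod.py')\nz = 3\n")

sys.path.insert(0, base)           # Add the container, not dir1 itself
import dir1.dir2.mod               # Triggers all three trace prints, once
print(dir1.x, dir1.dir2.y, dir1.dir2.mod.z)
```

Running this prints the three trace lines, followed by the values assigned in each file.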
Here, dir1 will either be a subdirectory of the one we’re working in (i.e., the home directory), or a subdirectory of a directory that is listed on the module search path (technically, on sys.path). Either way, dir1’s container does not need an __init__.py file.
As for simple module files, import statements run each directory’s initialization file as Python descends the path, the first time a directory is traversed; we’ve added print statements to trace their execution. Also like module files, already-imported directories may be passed to reload to force re-execution of that single item; reload accepts a dotted path name to reload nested directories and files:
% python
>>> import dir1.dir2.mod         # First imports run init files.
dir1 init
dir2 init
in mod.py
>>>
>>> import dir1.dir2.mod         # Later imports do not.
>>>
>>> reload(dir1)
dir1 init
<module 'dir1' from 'dir1\__init__.pyc'>
>>>
>>> reload(dir1.dir2)
dir2 init
<module 'dir1.dir2' from 'dir1\dir2\__init__.pyc'>
Once imported, the path in your import statement becomes a nested object path in your script; mod is an object nested in object dir2, which is itself nested in object dir1:
>>> dir1
<module 'dir1' from 'dir1\__init__.pyc'>
>>> dir1.dir2
<module 'dir1.dir2' from 'dir1\dir2\__init__.pyc'>
>>> dir1.dir2.mod
<module 'dir1.dir2.mod' from 'dir1\dir2\mod.pyc'>
In fact, each directory name in the path becomes a variable, assigned to a module object whose namespace is initialized by all the assignments in that directory’s __init__.py file. dir1.x refers to the variable x assigned in dir1\__init__.py, much as mod.z refers to the z assigned in mod.py:
>>> dir1.x
1
>>> dir1.dir2.y
2
>>> dir1.dir2.mod.z
3
import statements can be somewhat inconvenient to use with packages, because you must retype paths frequently in your program. In the prior section’s example, you must retype and rerun the full path from dir1 each time you want to reach z. In fact, we get errors here if we try to access dir2 or mod directly at this point:
>>> dir2.mod
NameError: name 'dir2' is not defined
>>> mod.z
NameError: name 'mod' is not defined
Because of that, it’s often more convenient to use the from statement with packages, to avoid retyping paths at each access. Perhaps more importantly, if you ever restructure your directory tree, the from statement requires just one path update in your code, whereas the import may require many. The import as extension, discussed in the next chapter, can also help here, by providing a shorter synonym for the full path:
% python
>>> from dir1.dir2 import mod    # Code the path here only.
dir1 init
dir2 init
in mod.py
>>> mod.z                        # Don't repeat path.
3
>>> from dir1.dir2.mod import z
>>> z
3
>>> import dir1.dir2.mod as mod  # Use shorter name.
>>> mod.z
3
If you’re new to Python, make sure that you’ve mastered simple modules before stepping up to packages, as they are a somewhat advanced feature of Python. They do serve useful roles, especially in larger programs: they make imports more informative, serve as an organizational tool, simplify your module search path, and can resolve ambiguities.
First of all, because package imports give some directory information in program files, they both make it easier to locate your files and serve as an organizational tool. Without package paths, you must more often resort to consulting the module search path to find files. Moreover, if you organize your files into subdirectories for functional areas, package imports make it more obvious what role a module plays, and so make your code more readable. For example, a normal import of a file in a directory somewhere on the module search path:
import utilities
bears much less information than an import that includes path information:
import database.client.utilities
Package imports can also greatly simplify your PYTHONPATH or .pth file search path settings. In fact, if you use package imports for all your cross-directory imports, and you make those package imports relative to a common root directory where all your Python code is stored, you really only need a single entry on your search path: the common root.
The only time package imports are actually required, though, is in order to resolve ambiguities that may arise when multiple programs are installed on a single machine. This is something of an install issue, but can also become a concern in general practice. Let’s turn to a hypothetical scenario to illustrate.
Suppose that a programmer develops a Python program that contains a file called utilities.py for common utility code, and a top-level file named main.py that users launch to start the program. All over this program, its files say import utilities to load and use the common code. When this program is shipped, it arrives as a single tar or zip file containing all the program’s files; when it is installed, it unpacks all its files into a single directory named system1 on the target machine:
system1\
    utilities.py         # Common utility functions, classes
    main.py              # Launch this to start the program.
    other.py             # Import utilities to load my tools
Now, suppose that a second programmer does the same thing: he or she develops a different program with files utilities.py and main.py, and uses import utilities to load the common code file again. When this second system is fetched and installed, its files unpack into a new directory called system2 somewhere on the receiving machine, such that its files do not overwrite same-named files from the first system. Eventually, both systems become so popular that they wind up commonly installed on the same computer:
system2\
    utilities.py         # Common utilities
    main.py              # Launch this to run.
    other.py             # Imports utilities
So far, there’s no problem: both systems can coexist and run on the same machine. In fact, we don’t even need to configure the module search path to use these programs: because Python always searches the home directory first (that is, the directory containing the top-level file), imports in either system’s files will automatically see all the files in that system’s directory. For instance, if you click on system1\main.py, all imports will search system1 first. Similarly, if you launch system2\main.py, then system2 is searched first instead. Remember, module search path settings are only needed to import across directory boundaries.
But now, suppose that after you’ve installed these two programs on your machine, you decide that you’d like to use code in the utilities.py files of either of the two in a system of your own. It’s common utility code, after all, and Python code by nature wants to be reused. You want to be able to say the following from code that you’re writing in a third directory:
import utilities
utilities.func('spam')
to load one of the two files. And now the problem starts to materialize. To make this work at all, you’ll have to set the module search path to include the directories containing the utilities.py files. But which directory do you put first in the path: system1 or system2?
The problem is the linear nature of the search path: it is always scanned from left to right. No matter how long you ponder this dilemma, you will always get the utilities.py from the directory listed first (leftmost) on the search path. As is, you’ll never be able to import it from the other directory at all. You could try changing sys.path within your script before each import operation, but that’s both extra work and highly error-prone. By default, you’re stuck.
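You can watch this collision happen in a short, self-contained sketch (the directory names and the version variable are invented for illustration): two directories each get their own utilities.py, both land on sys.path, and only the leftmost is ever imported:

```python
import os
import sys
import tempfile

base = tempfile.mkdtemp()
for name, ver in [('system1', 1), ('system2', 2)]:
    d = os.path.join(base, name)
    os.mkdir(d)
    with open(os.path.join(d, 'utilities.py'), 'w') as f:
        f.write('version = %d\n' % ver)

# Both directories are on the path, but the scan is left to right.
sys.path.insert(0, os.path.join(base, 'system2'))
sys.path.insert(0, os.path.join(base, 'system1'))

import utilities
print(utilities.version)      # Always 1: system2's copy is unreachable
```

Swapping the two insert calls would make system2’s copy win instead; either way, only one of the two files can ever be imported under the name utilities.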
And this is the issue that packages actually fix. Rather than installing programs as a flat list of files in standalone directories, package and install them as subdirectories, under a common root. For instance, you might organize all the code in this example as an install hierarchy that looks like this:
root\
    system1\
        __init__.py
        utilities.py
        main.py
        other.py
    system2\
        __init__.py
        utilities.py
        main.py
        other.py
    system3\             # Here or elsewhere
        __init__.py      # Your new code here
        myfile.py
Now, add just the common root directory to your search path. If your code’s imports are relative to this common root, you can import either system’s utility file with a package import; the enclosing directory name makes the path (and hence the module reference) unique. In fact, you can import both utility files in the same module, as long as you use the import statement and repeat the full path each time you reference the utility modules:
import system1.utilities
import system2.utilities

system1.utilities.function('spam')
system2.utilities.function('eggs')
Notice that __init__.py files were added to the system1 and system2 directories to make this work, but not to the root: only directories listed within import statements require these files.
Technically, your system3 directory doesn’t have to be under root; only the packages of code from which you will import must be. However, because you never know when your own modules might be useful in other programs, you might as well place them under the common root to avoid similar name-collision problems in the future.
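For completeness, here is the package-based fix as a runnable sketch (names again invented): both utilities.py files live under one common root with __init__.py files, a single path entry covers everything, and the enclosing directory names keep the two modules distinct:

```python
import os
import sys
import tempfile

root = tempfile.mkdtemp()                 # The common container directory
for name, ver in [('system1', 1), ('system2', 2)]:
    d = os.path.join(root, name)
    os.mkdir(d)
    open(os.path.join(d, '__init__.py'), 'w').close()   # Mark as a package
    with open(os.path.join(d, 'utilities.py'), 'w') as f:
        f.write('version = %d\n' % ver)

sys.path.insert(0, root)                  # One search path entry: the root

import system1.utilities
import system2.utilities
print(system1.utilities.version, system2.utilities.version)   # Both reachable
```

Unlike the flat layout, both copies can now be imported in the same program, because each lives under a distinct dotted path.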
Also, notice that both of the original systems’ imports will keep working as is, unchanged: because their home directories are searched first, the addition of the common root to the search path is irrelevant to code in system1 and system2. They can keep saying just import utilities and expect to find their own files. Moreover, if you’re careful to unpack all your Python systems under a common root like this, path configuration becomes simple: you’ll only need to add the common root, once.
[1] The dot path syntax was chosen partly for platform neutrality, but also because paths in import statements become real nested object paths. This syntax also means that you get odd error messages if you forget to omit the .py in your import statements: import mod.py is assumed to be a directory path import; it loads mod.py, then tries to load mod\py.py, and ultimately issues a potentially confusing error message.