When discussing the architecture of a software library, it helps to relate the conceptual overview to concrete software components. Doing so grounds the concepts in definite context and provides a foundation for faster learning in future explorations. Let's examine the modules in the matplotlib Python package.
Start by obtaining the IPython Notebook for this chapter, installing the dependencies, starting up the IPython server, and loading the notebook in your browser in the following way:
$ git clone https://github.com/masteringmatplotlib/architecture.git
$ cd architecture
$ make
Once the notebook is loaded, go ahead and run the initial setup commands:
In [1]: import matplotlib
        matplotlib.use('nbagg')
        %matplotlib inline
Now let's create two sets of imports, one for our dependencies and the other for modules that we've created specifically for this notebook:
In [2]: from glob import glob
        from modulefinder import Module
        from modulefinder import ModuleFinder
        from os.path import dirname
        from pprint import pprint
        import sys
        import trace
        import urllib.request

        import matplotlib.pyplot as plt
        from IPython.core.display import Image
        from pycallgraph import Config
        from pycallgraph import GlobbingFilter
        from pycallgraph import PyCallGraph
        from pycallgraph.output import GraphvizOutput

In [3]: sys.path.append("../lib")
        from modarch import matplotlib_groupings
        import modfind
        import modgraph
        from modutil import ls, rm
Next, let's take a look at matplotlib's top-level Python modules (output elided for compactness):
In [4]: libdir = "../.venv/lib/python3.4/site-packages/matplotlib"
        ls(libdir)
['matplotlib/__init__.py',
 'matplotlib/_cm.py',
 'matplotlib/_mathtext_data.py',
 'matplotlib/_pylab_helpers.py',
 'matplotlib/afm.py',
 'matplotlib/animation.py',
 'matplotlib/artist.py',
 ]
There are about 60 top-level modules in the resulting listing, as the following commands show:
In [5]: toplevel = glob(libdir + "/*.py")
        modules = ["matplotlib" + x.split(libdir)[1] for x in toplevel]
        len(modules)
Out[5]: 59
Some of these modules should be pretty familiar to you now:
artist.py
backend_bases.py
figure.py
lines.py
pyplot.py
text.py
You can get a nicer display of these modules with the following:
In [6]: pprint(modules)
To see matplotlib's subpackages, run the following code:
In [7]: from os.path import dirname
        modfile = "/__init__.py"
        subs = [dirname(x) for x in glob(libdir + "/*" + modfile)]
        pprint(["matplotlib" + x.split(libdir)[1] for x in subs])
['matplotlib/axes',
 'matplotlib/backends',
 'matplotlib/compat',
 'matplotlib/delaunay',
 'matplotlib/projections',
 'matplotlib/sphinxext',
 'matplotlib/style',
 'matplotlib/testing',
 'matplotlib/tests',
 'matplotlib/tri']
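The same survey of modules and subpackages can be sketched with the standard library's pkgutil module instead of globbing the filesystem. The example below uses the standard library's email package as a stand-in so that it runs without the chapter's virtual environment. The helper name list_children is hypothetical and not part of the notebook's utility modules; passing an imported matplotlib package instead should work the same way.

```python
import email  # stand-in package from the standard library
import pkgutil


def list_children(package):
    """Return (modules, subpackages) found directly inside a package."""
    modules, subpackages = [], []
    for info in pkgutil.iter_modules(package.__path__):
        full_name = package.__name__ + "." + info.name
        # ModuleInfo.ispkg is True for directories that are packages.
        if info.ispkg:
            subpackages.append(full_name)
        else:
            modules.append(full_name)
    return sorted(modules), sorted(subpackages)


modules, subpackages = list_children(email)
print(len(modules), "modules")
print(subpackages)
```

Unlike the glob approach, this reports dotted module names rather than file paths, and it leaves out the package's own __init__.py.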
The backends directory contains all the modules that support the user interface and hardcopy backends. The axes and projections directories form a crucial part of the artist layer. This brings up a point worth clarifying: there is no correlation in matplotlib code between the software (modules, subpackages, classes, and so on) and the architectural layers that we discussed. One is focused on the nuts and bolts of a plotting library and the other is concerned with helping us conceptually organize functional areas of the library.
That being said, there's no reason why we can't create a mapping. In fact, we did just that in the utility module for this notebook. If you execute the next set of commands in the IPython Notebook, you can see how we classified the matplotlib modules and subpackages (again, the output has been elided for compactness):
In [9]: pprint(matplotlib_groupings)
{'artist layer': ['matplotlib.afm',
                  'matplotlib.animation',
                  'matplotlib.artist',
                  ...],
 'backend layer': ['matplotlib.backend',
                   'matplotlib.blocking',
                   'matplotlib.dviread',
                   ...],
 'configuration': ['matplotlib.rcsetup',
                   'matplotlib.style'],
 'scripting layer': ['matplotlib.mlab',
                     'matplotlib.pylab',
                     'matplotlib.pyplot'],
 'utilities': ['matplotlib.bezier',
               'matplotlib.cbook',
               'mpl_tool']}
Note that not all strings in the key/list pairs exactly match matplotlib's modules or subpackages. This is because the strings in the preceding data structure are used to match the beginnings of module and subpackage names. Their intended use is in a call such as x.startswith(mod_name_part).
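The prefix matching described above can be sketched as follows. This is a minimal illustration, not the code in the notebook's modarch module: the function name find_group and the "unclassified" fallback are assumptions, and the dictionary holds only the entries visible in the elided output.

```python
# Only the prefixes visible in the (elided) output above.
groupings = {
    "artist layer": ["matplotlib.afm", "matplotlib.animation",
                     "matplotlib.artist"],
    "backend layer": ["matplotlib.backend", "matplotlib.blocking",
                      "matplotlib.dviread"],
    "configuration": ["matplotlib.rcsetup", "matplotlib.style"],
    "scripting layer": ["matplotlib.mlab", "matplotlib.pylab",
                        "matplotlib.pyplot"],
    "utilities": ["matplotlib.bezier", "matplotlib.cbook", "mpl_tool"],
}


def find_group(module_name, groupings, default="unclassified"):
    """Return the first group with a prefix that module_name starts with."""
    for group, prefixes in groupings.items():
        if any(module_name.startswith(prefix) for prefix in prefixes):
            return group
    return default  # hypothetical fallback for names outside every group


print(find_group("matplotlib.backends.backend_agg", groupings))
print(find_group("mpl_toolkits.mplot3d", groupings))
```

Because 'matplotlib.backend' is a prefix rather than a full name, it covers both the matplotlib.backends subpackage and the matplotlib.backend_bases module, which is why the strings do not need to match real module names exactly.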
We will use this data structure later in this section when building organized graphs of matplotlib imports. For now, it offers additional insight into how one can view the Python modules that comprise matplotlib.
The previous section showed us what the modules look like on the filesystem (as interpreted by Python, of course). Next we're going to see what happens when we import these modules and how this relates to the architecture of matplotlib.
Continuing with the same notebook session in your browser, execute the following command lines:
In [10]: #! /usr/bin/env python3.4
         import matplotlib.pyplot as plt

         def main() -> None:
             plt.plot([1,2,3,4])
             plt.ylabel('some numbers')
             plt.savefig('simple-line.png')

         if __name__ == '__main__':
             main()
This code is taken from the script saved in the repository at scripts/simple-line.py. As its name suggests (and as you will see when entering the preceding code into the IPython Notebook), this bit of matplotlib code draws a simple line on an axis. The idea here is to load a very simple matplotlib script so that we can examine matplotlib internals without distraction.
The first thing this script does is import the matplotlib scripting layer, and it's the import that we are interested in. So let's start digging.
The Python standard library provides an excellent tool to examine imports: the modulefinder module. Let's take the default finder for a spin in the same notebook session:
In [11]: finder = ModuleFinder()
         finder.run_script('../scripts/simple-line.py')

In [12]: len(finder.modules)
Out[12]: 1068
Running the script for the first time and examining all the imports will take a few seconds. If you take a look at the data in finder.modules, you will see modules not only from matplotlib and NumPy, but also from IPython, ZeroMQ, setuptools, Tornado, and the Python standard library.
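One way to get a feel for where the discovered modules come from is to count them by top-level package. The sketch below is a stand-in for the notebook session: it writes a throwaway script that imports only json (an assumption made so that the example runs without matplotlib), so the counts will differ from the 1068 modules reported above.

```python
import os
import tempfile
from collections import Counter
from modulefinder import ModuleFinder

# A tiny stand-in script; in the notebook this would be simple-line.py.
with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "tiny.py")
    with open(script, "w") as fh:
        fh.write("import json\n")
    finder = ModuleFinder()
    finder.run_script(script)

# finder.modules maps dotted module names to Module objects.
by_package = Counter(name.split(".")[0] for name in finder.modules)
print(by_package.most_common(5))
```

Even a one-line script pulls in modules from several packages, which illustrates why the unfiltered result for a matplotlib script is so large.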
We're only interested in matplotlib, so we need to create a custom finder that gives us just what we're looking for. Of course, we did just that and saved it in the modfind module.
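The notebook's CustomFinder is not reproduced here, but the general idea of a filtering finder can be sketched by subclassing ModuleFinder and overriding its import_hook method. Everything named below (PrefixFinder, its package argument, the _wanted helper) is hypothetical, and the json package again stands in for matplotlib so that the example is self-contained.

```python
import os
import tempfile
from modulefinder import ModuleFinder


class PrefixFinder(ModuleFinder):
    """Hypothetical finder that only follows imports under one package."""

    def __init__(self, package, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.package = package

    def _wanted(self, name):
        return name == self.package or name.startswith(self.package + ".")

    def import_hook(self, name, caller=None, fromlist=None, level=-1):
        # Relative imports (level > 0) carry a short name such as
        # "decoder", so judge them by the importing module instead.
        relative = (level > 0 and caller is not None
                    and self._wanted(caller.__name__))
        if relative or self._wanted(name):
            return super().import_hook(name, caller, fromlist, level)
        return None  # skip imports outside the package


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "tiny.py")
    with open(script, "w") as fh:
        fh.write("import json\nimport csv\n")
    finder = PrefixFinder("json")
    finder.run_script(script)

print(sorted(finder.modules))
```

Skipping an import at the hook means the skipped module's own imports are never scanned either, which is what keeps the result small.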
Skipping ahead a bit in the notebook, we will use our custom finder in exactly the same way as the one in the standard library:
In [16]: finder = modfind.CustomFinder()
         finder.run_script('../scripts/simple-line.py')
         len(finder.modules)
Out[16]: 62
That's much more manageable. One of the key things that the ModuleFinder does is keep track of which modules import which other modules. As such, once finder has run the given script, it has data on all the relationships between the modules that import other modules (and each other). This type of data is perfectly suited for graph data structures. It just so happens that this is something that matplotlib is able to work with as well, thanks to the NetworkX library and its matplotlib integration.
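To make the idea of import relationships as graph data concrete, here is a rough sketch that collects weighted (importer, imported) pairs while a finder runs. It is an assumption about how such data could be gathered, not the notebook's implementation: the EdgeFinder class and its edges attribute are hypothetical, and it relies on ModuleFinder's import_hook and determine_parent methods. The json package stands in for matplotlib.

```python
import os
import tempfile
from collections import Counter
from modulefinder import ModuleFinder


class EdgeFinder(ModuleFinder):
    """Hypothetical finder that records (importer, imported) pairs."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.edges = Counter()

    def import_hook(self, name, caller=None, fromlist=None, level=-1):
        if caller is not None:
            # Resolve relative names such as ".decoder" against the
            # package that contains the importing module.
            parent = self.determine_parent(caller, level=level)
            target = name if parent is None else parent.__name__ + "." + name
            self.edges[(caller.__name__, target)] += 1
        return super().import_hook(name, caller, fromlist, level)


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "tiny.py")
    with open(script, "w") as fh:
        fh.write("import json\n")
    finder = EdgeFinder()
    finder.run_script(script)

# Keep only edges that stay inside the json package.
json_edges = {edge: weight for edge, weight in finder.edges.items()
              if all(name.startswith("json") for name in edge)}
for (importer, imported), weight in sorted(json_edges.items()):
    print(importer, "->", imported, weight)
```

Each key is a directed edge and each value a weight (here, the number of times the hook saw that pair), which is the shape of data a graph library such as NetworkX can consume as weighted edges.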
In addition to CustomFinder, this notebook also has a class called ModGrapher. Among other things, this class creates an instance of CustomFinder and runs it. ModGrapher also provides visualization for the usage and the extent to which one module is imported by another module.
Let's use ModGrapher to generate the import data (by using CustomFinder behind the scenes) and then display a graph of the import relationships:
In [17]: grapher = modgraph.ModGrapher(
             source='../scripts/simple-line.py',
             layout='neato')
         grapher.render()
The following is the graph of the import relationships:
As you can see, the result looks somewhat chaotic. Even so, we are provided with useful meta information. A bit of a heads-up: when you start digging into the matplotlib code in earnest, you can expect the code in any given module to use classes and functions from across the entire matplotlib code base.
However, it would be nice to see more structure in the relationships. This is where our use of the previously mentioned modarch.matplotlib_groupings comes in. We have at our disposal a data structure that maps the matplotlib modules to the various layers of the matplotlib architecture. There is a convenient function in modarch that does this, and the ModGrapher class uses this function in several of its methods to group imports according to the matplotlib architecture that we defined.
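Grouping imports by architectural layer amounts to replacing each module name with its layer and summing the weights of edges that collapse onto the same pair. The sketch below assumes that reading; the function names, the three-entry layers dictionary, and the import edges are all made up for illustration and are not taken from modarch or from a real finder run.

```python
from collections import Counter

# Illustrative prefixes only; a small subset of the groupings shown earlier.
layers = {
    "artist layer": ["matplotlib.artist"],
    "backend layer": ["matplotlib.backend"],
    "scripting layer": ["matplotlib.pyplot"],
}


def layer_of(name, default="other"):
    """Map a module name to its layer by prefix (hypothetical helper)."""
    for layer, prefixes in layers.items():
        if any(name.startswith(prefix) for prefix in prefixes):
            return layer
    return default


def collapse(edges):
    """Merge module-level edges into layer-level edges, summing weights."""
    collapsed = Counter()
    for (importer, imported), weight in edges.items():
        collapsed[(layer_of(importer), layer_of(imported))] += weight
    return collapsed


# Made-up (importer, imported) -> weight data for illustration.
edges = {
    ("matplotlib.pyplot", "matplotlib.artist"): 1,
    ("matplotlib.pyplot", "matplotlib.backends"): 2,
    ("matplotlib.backends", "matplotlib.backend_bases"): 1,
    ("matplotlib.backend_bases", "matplotlib.artist"): 1,
}
layer_edges = collapse(edges)
for (importer, imported), weight in sorted(layer_edges.items()):
    print(importer, "->", imported, weight)
```

The collapsed graph has one node per layer, so it stays readable no matter how many modules were imported.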
Let's try the simplest method first, re-rendering the graph with a different mode:
In [21]: grapher.render(mode="reduced-structure")
The following figure is the result of the preceding command:
The chaos is gone, but so are the interesting features. What we need is a combination of the two: something that shows the various modules that are imported as well as the overall usage of the architectural elements. The only requirement is that imports leaving one area of matplotlib's architecture terminate inside their own group rather than crossing into the other groups (otherwise, we'd end up with the same graph that we started with).
This too has been coded in our module, and we just need to use the appropriate mode to render it:
In [22]: grapher.render(layout="neato", labels=True, mode="simple-structure")
The following figure is the result of the preceding command:
The code behind this graph does some additional simplification: it only goes two levels deep in the matplotlib namespace. For instance, matplotlib.a.b.c will be rolled up (with its weights contributing) into matplotlib.a. There is an additional mode, full-structure, which you can use to see all the imported matplotlib modules, as mapped to the architectural areas.
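The two-level roll-up can be sketched in a few lines. The function names and the sample weights are hypothetical; the sketch only illustrates truncating dotted names to a fixed depth and adding up the weights that land on the same truncated name.

```python
from collections import Counter


def roll_up(name, depth=2):
    """Truncate a dotted module name to its first `depth` components."""
    return ".".join(name.split(".")[:depth])


def roll_up_weights(weights, depth=2):
    """Sum weights of all names that share the same truncated name."""
    rolled = Counter()
    for name, weight in weights.items():
        rolled[roll_up(name, depth)] += weight
    return rolled


# Placeholder names in the style of the text; the weights are made up.
weights = {
    "matplotlib.a": 1,
    "matplotlib.a.b": 2,
    "matplotlib.a.b.c": 3,
    "matplotlib.d": 4,
}
rolled = roll_up_weights(weights)
print(dict(rolled))
```

With a depth of 2, the three matplotlib.a entries merge into a single node whose weight is their sum, while matplotlib.d is left unchanged.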
This brings us to the end of our exploration of matplotlib's modules and module imports. Next, we will take a look at the architecture as reflected in the running code.