matplotlib modules

When discussing the architecture of a software library, it helps to relate the conceptual overview to concrete software components. Doing so grounds what you have just learned in a more definite context and provides a foundation for faster learning during future explorations. Let's examine the modules in the matplotlib Python package.

Exploring the filesystem

Start by obtaining the IPython Notebook for this chapter, installing the dependencies, starting up the IPython server, and loading the notebook in your browser in the following way:

$ git clone https://github.com/masteringmatplotlib/architecture.git
$ cd architecture
$ make

Once the notebook is loaded, go ahead and run the initial setup commands:

In [1]: import matplotlib
        matplotlib.use('nbagg')
        %matplotlib inline

Now let's create two sets of imports, one for our dependencies and the other for modules that we've created specifically for this notebook:

In [2]: from glob import glob
        from modulefinder import Module
        from modulefinder import ModuleFinder
        from os.path import dirname
        from pprint import pprint
        import sys
        import trace
        import urllib.request

        import matplotlib.pyplot as plt
        from IPython.core.display import Image

        from pycallgraph import Config
        from pycallgraph import GlobbingFilter
        from pycallgraph import PyCallGraph
        from pycallgraph.output import GraphvizOutput

In [3]: sys.path.append("../lib")
        from modarch import matplotlib_groupings
        import modfind
        import modgraph
        from modutil import ls, rm

Next, let's take a look at matplotlib's top-level Python modules (output elided for compactness):

In [4]: libdir = "../.venv/lib/python3.4/site-packages/matplotlib"

        ls(libdir)
        ['matplotlib/__init__.py',
         'matplotlib/_cm.py',
         'matplotlib/_mathtext_data.py',
         'matplotlib/_pylab_helpers.py',
         'matplotlib/afm.py',
         'matplotlib/animation.py',
         'matplotlib/artist.py',
         ...]

The full listing contains about 60 top-level modules, which you can confirm with the following code:

In [5]: toplevel = glob(libdir + "/*.py")
        modules = ["matplotlib" + x.split(libdir)[1]
                   for x in toplevel]
        len(modules)
Out[5]: 59
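
The hardcoded libdir above depends on the layout of the virtual environment and on the Python version. As a hedged alternative, the directory can be derived from a package's __file__ attribute. The following sketch uses the standard library's email package as a stand-in so that it runs without matplotlib installed; the helper name top_level_modules is our own and is not part of the notebook:

```python
from pathlib import Path
import email  # stand-in package; swap in matplotlib if it is installed


def top_level_modules(package):
    """Return 'pkg/module.py' strings for the package's top-level .py files."""
    pkg_dir = Path(package.__file__).parent
    return sorted(f"{package.__name__}/{path.name}"
                  for path in pkg_dir.glob("*.py"))


modules = top_level_modules(email)
print(len(modules))
print(modules[:3])
```

Passing matplotlib instead of email should produce a listing equivalent to the one above, without the path to site-packages being spelled out.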

Some of these modules should be pretty familiar to you now:

  • artist.py
  • backend_bases.py
  • figure.py
  • lines.py
  • pyplot.py
  • text.py

You can get a nicer display of these modules with the following:

In [6]: pprint(modules)

To see matplotlib's subpackages, run the following code:

In [7]: from os.path import dirname

        modfile = "/__init__.py"
        subs = [dirname(x) for x in glob(libdir + "/*" + modfile)]
        pprint(["matplotlib" + x.split(libdir)[1] for x in subs])

        ['matplotlib/axes',
         'matplotlib/backends',
         'matplotlib/compat',
         'matplotlib/delaunay',
         'matplotlib/projections',
         'matplotlib/sphinxext',
         'matplotlib/style',
         'matplotlib/testing',
         'matplotlib/tests',
         'matplotlib/tri']
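
Subpackages can also be discovered without globbing for __init__.py files. The following is a minimal sketch that uses pkgutil.iter_modules from the standard library, again with the email package standing in for matplotlib. The helper name list_subpackages is hypothetical, and the result uses dotted names rather than the path-style names shown above:

```python
import pkgutil
import email  # stand-in package; swap in matplotlib if it is installed


def list_subpackages(package):
    """Return dotted names of the package's immediate subpackages."""
    return sorted(f"{package.__name__}.{info.name}"
                  for info in pkgutil.iter_modules(package.__path__)
                  if info.ispkg)


subpackages = list_subpackages(email)
print(subpackages)
```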

The backends directory contains all the modules that support the user interface and hardcopy backends. The axes and projections directories form a crucial part of the artist layer. This brings up a point worth clarifying—there is no correlation in matplotlib code between the software (modules, subpackages, classes, and so on) and the architectural layers that we discussed. One is focused on the nuts and bolts of a plotting library and the other is concerned with helping us conceptually organize functional areas of the library.

That being said, there's no reason why we can't create a mapping. In fact, we did just that in the utility module for this notebook. If you execute the next set of commands in the IPython Notebook, you can see how we classified the matplotlib modules and subpackages (again, the output has been elided for compactness):

In [9]: pprint(matplotlib_groupings)

        {'artist layer': ['matplotlib.afm',
                          'matplotlib.animation',
                          'matplotlib.artist',
                          ...],
         'backend layer': ['matplotlib.backend',
                           'matplotlib.blocking',
                           'matplotlib.dviread',
                           ...],
         'configuration': ['matplotlib.rcsetup',
                           'matplotlib.style'],
         'scripting layer': ['matplotlib.mlab',
                             'matplotlib.pylab',
                             'matplotlib.pyplot'],
         'utilities': ['matplotlib.bezier',
                       'matplotlib.cbook',
                       'mpl_tool']}

Note that not all of the strings in these lists exactly match the names of matplotlib's modules or subpackages. This is because the strings are used to match the beginnings of module and subpackage names; they are intended for use in a call such as x.startswith(mod_name_part).
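
To illustrate the prefix matching, here is a minimal sketch of how such a lookup might work. The function find_group is a hypothetical helper (the actual function in modarch may differ), and the dictionary is an abbreviated copy of the structure shown above:

```python
# Abbreviated, illustrative copy of the grouping data shown above.
groupings = {
    "artist layer": ["matplotlib.afm", "matplotlib.animation",
                     "matplotlib.artist"],
    "backend layer": ["matplotlib.backend", "matplotlib.blocking",
                      "matplotlib.dviread"],
    "configuration": ["matplotlib.rcsetup", "matplotlib.style"],
    "scripting layer": ["matplotlib.mlab", "matplotlib.pylab",
                        "matplotlib.pyplot"],
    "utilities": ["matplotlib.bezier", "matplotlib.cbook", "mpl_tool"],
}


def find_group(module_name, groupings):
    """Return the first group with a prefix matching module_name, else None."""
    for group, prefixes in groupings.items():
        if any(module_name.startswith(prefix) for prefix in prefixes):
            return group
    return None


print(find_group("matplotlib.backends.backend_agg", groupings))
print(find_group("matplotlib.backend_bases", groupings))
print(find_group("numpy.linalg", groupings))
```

The first two calls both resolve to 'backend layer' because each name begins with 'matplotlib.backend', which is why that string does not need to name a real module. The third call returns None because no prefix matches.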

We will use this data structure later in this section when building organized graphs of matplotlib imports. For now, it offers additional insight into how one can view the Python modules that comprise matplotlib.

Exploring imports visually

The previous section showed us what the modules look like on the filesystem (as interpreted by Python, of course). Next we're going to see what happens when we import these modules and how this relates to the architecture of matplotlib.

Continuing with the same notebook session in your browser, execute the following command lines:

In [10]: #! /usr/bin/env python3.4
         import matplotlib.pyplot as plt

         def main() -> None:
             plt.plot([1, 2, 3, 4])
             plt.ylabel('some numbers')
             plt.savefig('simple-line.png')

         if __name__ == '__main__':
             main()

This code is taken from the script saved in the repository as scripts/simple-line.py. As its name suggests (and as you will see when entering the preceding code into the IPython Notebook), this bit of matplotlib code draws a simple line on an axis. The idea here is to load a very simple matplotlib script so that we can examine matplotlib internals without distraction.

The first thing this script does is import the matplotlib scripting layer, and it's the import that we are interested in. So let's start digging.

ModuleFinder

The Python standard library provides an excellent tool to examine imports—the modulefinder module. Let's take the default finder for a spin in the same notebook session:

In [11]: finder = ModuleFinder()
         finder.run_script('../scripts/simple-line.py')

In [12]: len(finder.modules)
Out[12]: 1068

Running the script for the first time and examining all the imports will take a few seconds. If you take a look at the data in finder.modules, you will see modules not only from matplotlib and NumPy, but also from IPython, ZeroMQ, setuptools, Tornado, and the Python standard library.

We're only interested in matplotlib. So we need to create a custom finder that gives us just what we're looking for. Of course we did just that and saved it in the modfind module.
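
The CustomFinder class itself lives in modfind and is not reproduced here. As a rough illustration of the idea, one simple way to get a similar effect is to run the standard ModuleFinder and then keep only the module names under a given prefix. In this sketch, the function find_modules is hypothetical, and the json package stands in for matplotlib so that only the standard library is required:

```python
import os
import tempfile
from modulefinder import ModuleFinder


def find_modules(script_source, prefix):
    """Run ModuleFinder on script_source and keep module names under prefix."""
    with tempfile.TemporaryDirectory() as tmpdir:
        script_path = os.path.join(tmpdir, "script.py")
        with open(script_path, "w") as handle:
            handle.write(script_source)
        finder = ModuleFinder()
        finder.run_script(script_path)
    # Keep the package itself and anything below it in the namespace.
    return sorted(name for name in finder.modules
                  if name == prefix or name.startswith(prefix + "."))


# json stands in for matplotlib so the sketch needs only the standard library.
json_modules = find_modules("import json\n", "json")
print(json_modules)
```

Filtering after the fact still pays the cost of scanning every import. A custom finder can avoid that by declining to follow uninteresting modules in the first place, which is an assumption about the design rather than a description of modfind.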

Skipping ahead a bit in the notebook, we will use our custom finder in exactly the same way as the one in the standard library:

In [16]: finder = modfind.CustomFinder()
         finder.run_script('../scripts/simple-line.py')
         len(finder.modules)
Out[16]: 62

That's much more manageable. One of the key things that the ModuleFinder does is keep track of which modules import which other modules. As such, once finder has run the given script, it has data on all the relationships between the modules that import other modules (and each other). This type of data is perfectly suited for graph data structures. It just so happens that this is something that matplotlib is able to work with as well, thanks to the NetworkX library and its matplotlib integration.
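
To make the idea of import relationships as graph data concrete, here is a minimal sketch that extracts (importer, imported) edges from source text. Note that it uses the standard library's ast module rather than ModuleFinder, and that it only handles absolute imports. The sample sources and the function name import_edges are hypothetical:

```python
import ast


def import_edges(module_name, source):
    """Return (importer, imported) pairs for the absolute imports in source."""
    edges = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            edges.extend((module_name, alias.name) for alias in node.names)
        elif (isinstance(node, ast.ImportFrom)
              and node.level == 0 and node.module):
            edges.append((module_name, node.module))
    return edges


# Hypothetical sources keyed by module name, for illustration only.
sources = {
    "app": "import pkg.lines\nfrom pkg import artist\n",
    "pkg.lines": "from pkg import artist\nimport math\n",
}

# Adjacency mapping: importer -> set of imported module names.
graph = {}
for name, text in sources.items():
    for importer, imported in import_edges(name, text):
        graph.setdefault(importer, set()).add(imported)

print(graph)
```

An adjacency mapping like this one is the kind of structure that a graph library such as NetworkX can consume as nodes and edges.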

ModGrapher

In addition to CustomFinder, this notebook also has a class called ModGrapher. This class does the following:

  • Creates an instance of CustomFinder and runs it
  • Builds weight values for nodes based on the number of times a module is imported
  • Colors nodes based on the similarity of names (more or less)
  • Provides several ways to refine the relationships between imported modules
  • Draws configured graphs using NetworkX and matplotlib

Because of the second bullet point, ModGrapher can visualize not only which modules are imported, but also how heavily each one is imported by the others.
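
The weighting described in the second bullet point can be sketched with collections.Counter. This is only an illustration of the idea, not the ModGrapher implementation, and the edge list below is hypothetical data rather than output from the finder:

```python
from collections import Counter

# Hypothetical (importer, imported) pairs, for illustration only.
edges = [
    ("matplotlib.pyplot", "matplotlib.figure"),
    ("matplotlib.pyplot", "matplotlib.artist"),
    ("matplotlib.figure", "matplotlib.artist"),
    ("matplotlib.lines", "matplotlib.artist"),
]

# A node's weight is the number of times it appears as an import target.
weights = Counter(imported for _, imported in edges)

for module, weight in weights.most_common():
    print(module, weight)
```

A graph renderer could then scale each node's size by its weight, so that heavily imported modules stand out.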

Let's use ModGrapher to generate the import data (by using CustomFinder behind the scenes) and then display a graph of the import relationships:

In [17]: grapher = modgraph.ModGrapher(
             source='../scripts/simple-line.py',
             layout='neato')
         grapher.render()

The following is the graph of the import relationships:

[Figure: ModGrapher graph of the import relationships]

As you can see, the result looks somewhat chaotic. Even so, we are provided with useful meta information. A bit of a heads-up—when you start digging into the matplotlib code earnestly, you can expect the code in any given module to use classes and functions across the entire matplotlib code base.

However, it would be nice to see more structure in the relationships. This is where our use of the previously mentioned modarch.matplotlib_groupings comes in. We have at our disposal a data structure that maps the matplotlib modules to the various layers of the matplotlib architecture. There is a convenient function in modarch that does this, and the ModGrapher class uses this function in several of its methods to group imports according to the matplotlib architecture that we defined.

Let's try the simplest method first, re-rendering the graph with a different mode:

In [21]: grapher.render(mode="reduced-structure")

The following figure is the result of the preceding command:

[Figure: ModGrapher graph rendered in reduced-structure mode]

The chaos is gone, but so are the interesting features. What we need is a combination of the two: something that shows the individual modules that are imported as well as the overall usage of the architectural elements. All that is required is to ensure that imports leaving any one area of matplotlib's architecture terminate inside their own group instead of crossing into the other groups (otherwise, we'd end up with the same graph that we started with).

This too has been coded in our module, and we just need to use the appropriate mode to render it:

In [22]: grapher.render(layout="neato", labels=True,
                        mode="simple-structure")

The following figure is the result of the preceding command:

[Figure: ModGrapher graph rendered in simple-structure mode]

The code behind this graph does some additional simplification—it only goes two levels deep in the matplotlib namespace. For instance, matplotlib.a.b.c will be rolled up (with its weights contributing) into matplotlib.a. There is an additional mode, full-structure, which you can use to see all the imported matplotlib modules, as mapped to the architectural areas.
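
The roll-up described above can be sketched as follows. The function roll_up and the weights are hypothetical and serve only to show how deeper names might be truncated to two levels, with their weights summed:

```python
from collections import Counter


def roll_up(module_name, depth=2):
    """Truncate a dotted module name to its first `depth` components."""
    return ".".join(module_name.split(".")[:depth])


# Hypothetical per-module weights, for illustration only.
weights = Counter({
    "matplotlib.a.b.c": 2,
    "matplotlib.a.d": 1,
    "matplotlib.e": 4,
})

# Sum the weights of all modules that share a two-level prefix.
rolled = Counter()
for name, weight in weights.items():
    rolled[roll_up(name)] += weight

print(dict(rolled))
```

Here matplotlib.a.b.c and matplotlib.a.d both collapse into matplotlib.a with a combined weight of 3, while matplotlib.e keeps its weight of 4.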

This brings us to the end of our exploration of matplotlib's modules and module imports. Next, we will take a look at the architecture as reflected in the running code.
