Chapter 6. Customization and Configuration

This chapter marks a conceptual dividing line for the book. So far, we've focused on topics such as matplotlib internals and APIs, plot interaction, high-level plotting, and the use of third-party libraries. We will continue in that vein in the first part of this chapter as we discuss advanced customization techniques for matplotlib. We will finish the chapter by discussing some of the advanced and lesser-known aspects of matplotlib configuration. The configuration theme will continue into the next chapter and then go beyond that into the realm of deployment. As such, this chapter marks a transition to our exploration of matplotlib in the real world and its usage in computationally intensive tasks.

This chapter will provide an overview of the following, giving you enough confidence to tackle these in more depth at your own pace:

  • Customization
    • matplotlib styles
    • Subplots
    • Further exploration
  • Configuration
    • The matplotlib run control
    • Options in IPython

To follow along with this chapter's code, clone the notebook's repository and start up IPython in the following way:

$ git clone https://github.com/masteringmatplotlib/custom-and-config.git
$ cd custom-and-config
$ make

Customization

On the journey through the lands of matplotlib, one of the signposts for intermediate territories is an increased need for fine-grained control over the libraries in the ecosystem. In our case, this means being able to tweak matplotlib for particular use cases such as specialty scales or projections, complex layouts, or a custom look and feel.

Creating a custom style

The first customization topic that we will cover is that of the new style support introduced in matplotlib 1.4. In the previous notebook, we saw how to get a list of the available styles:

In [2]: print(plt.style.available)
        ['bmh', 'ggplot', 'fivethirtyeight', 'dark_background',
        'grayscale']

Now, we're going to see how we can create and use one of our own custom styles.

You can create custom styles and use them by calling style.use with the path or URL to the style sheet. Alternatively, if you save the <style-name>.mplstyle file to the ~/.matplotlib/stylelib directory (you may need to create it), you can reuse your custom style sheet with a call to style.use(<style-name>). Note that a custom style sheet in ~/.matplotlib/stylelib will override a style sheet defined by matplotlib if the styles have the same name.
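As a minimal sketch of the first approach, the following writes a tiny style sheet to a temporary directory and loads it by path. The file name demo.mplstyle and its two settings are made up for illustration; they are not the style sheets that ship with this chapter's repository.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# Hypothetical two-line style sheet; the keys are ordinary rc parameter names.
style_text = "axes.facecolor: 2b3e50\nlegend.framealpha: 0.6\n"

style_path = os.path.join(tempfile.mkdtemp(), "demo.mplstyle")
with open(style_path, "w") as handle:
    handle.write(style_text)

# Load the style sheet by path; this updates the global rc parameters.
plt.style.use(style_path)

frame_alpha = plt.rcParams["legend.framealpha"]
face_hex = to_hex(plt.rcParams["axes.facecolor"])
print(frame_alpha, face_hex)
```

Copying the same file into the stylelib directory mentioned above should allow it to be loaded by name instead of by path.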

There is a custom matplotlib style sheet included in this chapter's IPython Notebook git repository, but before we go further, let's create a function that will generate a demo plot for us. We'll then render it by using the default style in the following way, thus having a baseline to compare our work to:

In [3]: def make_plot():
            x = np.random.randn(5000, 6)
            (figure, axes) = plt.subplots(figsize=(16,10))
            # Note: normed=1 was removed in matplotlib 3.1; on newer
            # versions, pass density=True instead
            (n, bins, patches) = axes.hist(
                x, 12, normed=1, histtype='bar',
                label=['Color 1', 'Color 2', 'Color 3',
                       'Color 4', 'Color 5', 'Color 6'])
            axes.set_title(
                "Histogram\nfor a\nNormal Distribution", fontsize=24)
            axes.set_xlabel("Data Points", fontsize=16)
            axes.set_ylabel("Counts", fontsize=16)
            axes.legend()
            plt.show()
In [4]: make_plot()

The following is the sample plot obtained as a result of the preceding code:

[Figure: the demo histogram rendered with the default matplotlib style]

The preceding plot is the default style for matplotlib plots. Let's do something fun by copying the style of Thomas Park's Superhero Bootstrap theme. It's a darker theme with muted blues and desaturated accent colors. There is a screenshot of a demo website in the IPython Notebook for this chapter.

There are two styles provided, which differ only in the coloring of the text:

In [6]: ls -l ../styles
total 16
-rw-r--r--  1 u  g  473 Feb  4 14:54 superheroine-1.mplstyle
-rw-r--r--  1 u  g  473 Feb  4 14:53 superheroine-2.mplstyle

Let's take a look at the second one's contents, which show the hexadecimal colors that we copied from the Bootstrap theme:

In [7]: cat ../styles/superheroine-2.mplstyle
lines.color: 4e5d6c
patch.edgecolor: 4e5d6c

text.color: df691b

axes.facecolor: 2b3e50
axes.edgecolor: 4e5d6c
axes.labelcolor: df691b
axes.color_cycle: df691b, 5cb85c, 5bc0de, f0ad4e, d9534f, 4e5d6c
axes.axisbelow: True

xtick.color: 8c949d
ytick.color: 8c949d

grid.color: 4e5d6c

figure.facecolor: 2b3e50
figure.edgecolor: 2b3e50

savefig.facecolor: 2b3e50
savefig.edgecolor: 2b3e50

legend.fancybox: True
legend.shadow: True
legend.frameon: True
legend.framealpha: 0.6

The idea behind matplotlib styles is wonderfully simple: don't reinvent anything, just offer an easy way to organize existing settings. If the preceding contents look familiar, it's because the same options are available in the matplotlib run control configuration file, matplotlibrc, which will be discussed at the end of the chapter. Let's see how our custom style overrides the default color definitions:

In [8]: plt.style.use("../styles/superheroine-2.mplstyle")
In [9]: make_plot()

The following is the plot obtained as a result of the preceding code:

[Figure: the demo histogram rendered with the superheroine-2 style]

For a tiny bit of effort, we have a significantly different visual impact. We'll continue using this style for the remainder of the chapter. In particular, we'll see what it looks like in the following section, when we assemble a collection of subplots.
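Because the keys in a style sheet are ordinary rc parameters, the same kind of override can also be applied temporarily from code. The following is a small sketch using matplotlib.rc_context; the three settings are borrowed from the style sheet above, and the variable names are only illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.colors import to_hex

# A few of the style sheet's settings, expressed as a plain dictionary.
overrides = {
    "axes.facecolor": "#2b3e50",
    "grid.color": "#4e5d6c",
    "legend.shadow": True,
}

shadow_before = plt.rcParams["legend.shadow"]

# The overrides apply only inside the with block.
with matplotlib.rc_context(rc=overrides):
    figure, axes = plt.subplots()
    shadow_inside = plt.rcParams["legend.shadow"]
    face_hex = to_hex(axes.get_facecolor())
    plt.close(figure)

shadow_after = plt.rcParams["legend.shadow"]
print(shadow_before, shadow_inside, shadow_after, face_hex)
```

This can be handy for trying out a handful of settings before committing them to a .mplstyle file.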

Subplots

In this section, we'll create a sophisticated subplot to give you a sense of matplotlib's plot layout capabilities. The system is flexible enough to accommodate everything from simple adjustments to the creation of dashboards in a single plot.

For this section, we have chosen to ingest data from the well-known UCI Machine Learning Repository. In particular, we'll use the 1985 Automobile Data Set. It serves as an example of data that can be used to assess the insurance risks for different vehicles. We will use it to compare 21 automobile manufacturers along the following dimensions:

  • Mean price
  • Mean city MPG
  • Mean highway MPG
  • Mean horsepower
  • Mean curb weight
  • Mean relative average loss payment
  • Mean insurance riskiness

We will limit ourselves to automobile manufacturers that have data for losses, as well as six or more rows of data. Our subplot will consist of the following sections:

  • An overall title
  • Line plots for maximum, mean, and minimum prices
  • A stacked bar chart for combined riskiness or losses
  • A stacked bar chart for riskiness
  • A stacked bar chart for losses
  • Radar charts for each automobile manufacturer
  • A combined scatterplot for the city and highway MPG

These will be composed as subplots in the following manner:

--------------------------------------------
|               overall title              |
--------------------------------------------
|               price ranges               |
--------------------------------------------
| combined loss/risk |                     |
|                    |        radar        |
----------------------        plots        |
|  risk   |   loss   |                     |
--------------------------------------------
|                   mpg                    |
--------------------------------------------

Revisiting Pandas

We're going to use a set of demonstration libraries that we included with this notebook to extract and manipulate the automobile maker data. As we did before, we will take advantage of the power provided by the Pandas statistical analysis library. Let's load our modules by using the following code:

In [10]: import sys
         sys.path.append("../lib")
         import demodata, demoplot

As you can see in the IPython Notebook, there's more data there than what we need for the subplotting tasks. Let's create a limited set by using the following code:

In [11]: limited_data = demodata.get_limited_data()
         limited_data.head()
Out[11]:

The following table is obtained as a result of the preceding command:

   make  price  city mpg  highway mpg  horsepower  weight  riskiness  losses
0  audi  13950        24           30         102    2337          2     164
1  audi  17450        18           22         115    2824          2     164
2  audi  17710        19           25         110    2844          1     158
3  audi  23875        17           20         140    3086          1     158
4   bmw  16430        23           29         101    2395          2     192

This has provided us with the full set of data minus the columns that we don't care about right now. However, we want to apply an additional constraint—we want to exclude auto manufacturers that have fewer than six rows in our dataset. We will do so with the help of the following command:

In [16]: data = demodata.get_limited_data(lower_bound=6)

We've got the data that we want, but we still have some preparations left to do. In particular, how are we going to compare data of different scales and relationships? Normalization seems like the obvious answer, but we want to make sure that the normalized values compare appropriately. High losses and a high riskiness factor are less favorable, while a higher number of miles per gallon is more favorable. All this is taken care of by the following code:

In [19]: normed_data = data.copy()
         normed_data.rename(
             columns={"horsepower": "power"}, inplace=True)
In [20]: demodata.norm_columns(
             ["city mpg", "highway mpg", "power"], normed_data)
In [21]: demodata.invert_norm_columns(
             ["price", "weight", "riskiness", "losses"],
             normed_data)

In the preceding code, we made a copy of the limited data that we established as our starting point and then updated the copy by calling two functions. The first function normalized the given columns, whose values are more favorable when higher. The second function inverted the normalized values of the given columns, whose original values are more favorable when lower, so that they match the first group. We now have a normalized dataset in which all the values are more favorable when higher.

If you would like to have more exposure to Pandas in action, be sure to view the functions in the demodata module. There are several useful tricks that are employed there to manipulate data.
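The demodata helpers are not reproduced here. As a rough sketch of what such functions might do, the following assumes min-max scaling to the range 0 to 1, which the chapter does not confirm; the function names with the _sketch suffix are hypothetical stand-ins, not the library's code. The sample values are the first five rows shown earlier.

```python
import pandas as pd


def norm_columns_sketch(columns, frame):
    """Scale each named column to [0, 1] in place (assumed min-max scaling)."""
    for col in columns:
        low, high = frame[col].min(), frame[col].max()
        frame[col] = (frame[col] - low) / (high - low)


def invert_norm_columns_sketch(columns, frame):
    """Scale each named column to [0, 1], then flip it so lower inputs score higher."""
    norm_columns_sketch(columns, frame)
    for col in columns:
        frame[col] = 1.0 - frame[col]


demo = pd.DataFrame({
    "city mpg": [24, 18, 19, 17, 23],
    "price": [13950, 17450, 17710, 23875, 16430],
})
norm_columns_sketch(["city mpg"], demo)       # higher mpg is better
invert_norm_columns_sketch(["price"], demo)   # lower price is better
print(demo)
```

After both calls, the cheapest car and the most fuel-efficient car each score 1.0 in their respective columns, so every column reads as "higher is better".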

Individual plots

Before jumping into subplots, let's take a look at a few individual plots for our dataset that will be included as subplots. The first one that we will generate is for the automobile price ranges:

In [22]: figure = plt.figure(figsize=(15, 5))
         prices_gs = mpl.gridspec.GridSpec(1, 1)
         prices_axes = demoplot.make_autos_price_plot(
             figure, prices_gs, data)
         plt.show()

Note that we didn't use the usual approach that we had taken, in which we get the figure and axes objects from a call to plt.subplots. Instead, we opted to use the GridSpec class to generate our axes (in the make_autos_price_plot function). We've done this because later, we wish to use GridSpec to create our subplots.
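The plotting functions in demoplot are not shown in this chapter. A minimal sketch of the pattern described here, with a hypothetical helper that receives a figure and a grid specification and creates its own axes, might look like the following:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec


def make_line_plot_sketch(figure, gridspec, values):
    """Hypothetical helper: draw values on an axes carved out of gridspec."""
    axes = figure.add_subplot(gridspec[0, 0])
    axes.plot(values)
    return axes


figure = plt.figure(figsize=(15, 5))
grid = GridSpec(1, 1)  # one cell that fills the whole figure
axes = make_line_plot_sketch(figure, grid, [3, 1, 2])
```

Since the helper only needs a cell of some grid, the same function can later be handed a cell from a larger layout.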

Here is the output that is generated from the call to plt.show():

[Figure: line plot of maximum, mean, and minimum prices per manufacturer]

Keep in mind that the preceding plot is a bit contrived (there's no inherent meaning in connecting manufacturer maximum, mean, and minimum values). Its sole purpose is to simply provide some eye candy for the subplot that we will be creating. As you can see from the instantiation of GridSpec, this plot has one set of axes that takes up the entire plot. Most of our individual plots will have the same geometry. The one exception to this is the radar plot that we will be creating.

Radar plots are useful when you wish to compare normalized data across multiple variables and populations, since the shape of each polygon gives an immediate visual cue. For example, consider the following figure:

[Figure: radar plot for the 1985 Volvo models]

The preceding figure shows the data that was consolidated from several 1985 Volvo models across the dimensions of price, inverse losses to insurers, inverse riskiness, weight, horsepower, and the highway and city miles per gallon. Since the data has been normalized so that the highest values are the most favorable, the best scenario would be for a manufacturer to have colored polygons at the limits of the axes. The conclusion that we can draw is this: relative to the other manufacturers in the dataset, the 1985 Volvos are heavy, expensive, and have fairly good horsepower. However, where they really shine is in safety from the insurer's point of view: low losses and very low risk (again, larger values are better). Even Volvo's minimum values are high in these categories. That's one manufacturer. Let's look at the whole group:

In [27]: figure = plt.figure(figsize=(15, 5))
         radar_gs = mpl.gridspec.GridSpec(
             3, 7, height_ratios=[1, 10, 10], wspace=0.50,
             hspace=0.60, top=0.95, bottom=0.25)
         radar_axes = demoplot.make_autos_radar_plot(
             figure, radar_gs, normed_data)
         plt.show()

The following plot is obtained as a result of the preceding code:

[Figure: radar plots for each manufacturer, with a shared legend]

There are interesting conclusions to draw from this view of the data, but we will focus on the code that generated it. In particular, note the geometry of the grid: three by seven. What does this mean and how are we going to use it? We have two rows of six manufacturers. However, we added an extra row at the top for an empty (and hidden) axes, which is used for the overall title. We then added an extra column for the legend, which spans two rows. This brings us from a grid of two by six to a grid of three by seven. The 12 axes that remain in the grid are populated with a highly customized polar plot, giving us the radar plots for each of the manufacturers.
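To make the three-by-seven geometry concrete, here is a sketch of just the layout, using matplotlib's stock polar projection in place of the chapter's custom radar axes. Placing the legend in the last column is an assumption; the text only says that an extra column is used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

figure = plt.figure(figsize=(15, 5))
grid = GridSpec(3, 7, height_ratios=[1, 10, 10], wspace=0.5, hspace=0.6)

# Row 0: a hidden axes spanning every column, reserved for the title.
title_axes = figure.add_subplot(grid[0, :])
title_axes.set_axis_off()

# Last column (assumed position): a hidden axes spanning both plot rows.
legend_axes = figure.add_subplot(grid[1:, 6])
legend_axes.set_axis_off()

# The remaining two rows of six cells each hold one polar axes.
radar_axes = [
    figure.add_subplot(grid[row, col], projection="polar")
    for row in (1, 2)
    for col in range(6)
]
print(len(radar_axes))
```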

This example was included not only because it's visually compelling, but also because, once we combine it with the other plots, it will show how flexible matplotlib's grid specification system is. We have the ability to place plots within plots.
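The radar axes used in this chapter come from a custom projection in the notebook's library. As a rough approximation only, a single radar-style chart can be drawn on the stock polar projection by spacing the spokes evenly and closing the polygon. The scores below are made-up values, not figures from the dataset.

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

labels = ["price", "losses", "riskiness", "weight",
          "power", "highway mpg", "city mpg"]
values = [0.3, 0.9, 0.95, 0.4, 0.5, 0.35, 0.3]  # made-up normalized scores

# One evenly spaced angle per spoke; repeat the first point to close the polygon.
angles = [2 * math.pi * i / len(labels) for i in range(len(labels))]
closed_angles = angles + angles[:1]
closed_values = values + values[:1]

figure = plt.figure(figsize=(5, 5))
axes = figure.add_subplot(1, 1, 1, projection="polar")
axes.plot(closed_angles, closed_values)
axes.fill(closed_angles, closed_values, alpha=0.4)
axes.set_xticks(angles)
axes.set_xticklabels(labels)
axes.set_ylim(0, 1)
```

Unlike the chapter's radar plots, this version keeps the circular grid of the polar axes.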

Bringing everything together

We've seen a small part of what GridSpec can do. This has been a tiny warm-up exercise compared to what's coming! Let's refresh our memory with the ASCII sketch of the subplots that we wanted to create. Flip back to that page and look at the layout. We have three axes that stretch all the way across the figure: the title, the price ranges, and the MPG data at the bottom. The three riskiness and losses plots will be placed on the left-hand side in the middle of the page, and the radar plots will take the other half of that row on the right-hand side.

We can plot what this will look like before adding any of the data, just by creating the grid and subplot specification objects. The following may look a bit hairy, but keep in mind that when slicing the subplot specs, you're using the same technique that is used to slice NumPy arrays:

In [28]: figure = plt.figure(figsize=(10, 8))
         gs_master = mpl.gridspec.GridSpec(
             4, 2, height_ratios=[1, 2, 8, 2])
         # Layer 1 - Title
         gs_1 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[0, :])
         title_axes = figure.add_subplot(gs_1[0])
         # Layer 2 - Price
         gs_2 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[1, :])
         price_axes = figure.add_subplot(gs_2[0])
         # Layer 3 - Risks & Radar
         gs_31 = mpl.gridspec.GridSpecFromSubplotSpec(
             2, 2, height_ratios=[2, 1],
             subplot_spec=gs_master[2, :1])
         risk_and_loss_axes = figure.add_subplot(gs_31[0, :])
         risk_axes = figure.add_subplot(gs_31[1, :1])
         loss_axes = figure.add_subplot(gs_31[1:, 1])
         gs_32 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[2, 1])
         radar_axes = figure.add_subplot(gs_32[0])
         # Layer 4 - MPG
         gs_4 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[3, :])
         mpg_axes = figure.add_subplot(gs_4[0])
         # Tidy up
         gs_master.tight_layout(figure)
         plt.show()

In the preceding code, when we instantiated GridSpec, we provided a geometry of four rows and two columns. We then passed the data for the height ratios so that each row will have an appropriate size that is relative to the others. In the section at the middle, for the risk and radar plots, we gave a geometry of two rows and two columns, and again passed the height ratios that provide the proportions we desire. This code results in the following plot:

[Figure: skeleton layout of empty subplots]

That's exactly what we were aiming for. Now, we're ready to start adding individual plots. The code that generated the preceding skeleton plot differs from the final result in the following three key ways:

  • The axes that are created will now get passed to the plot functions
  • The plot functions will update the axes with their results (and thus no longer be empty)
  • The skeleton radar plot had a one-by-one geometry; the real version will instead have a five-by-three geometry in the same area
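The indexing used on the grid specifications in the skeleton code follows NumPy's slicing rules. The following sketch compares the two side by side on a four-by-two grid; it assumes a reasonably recent matplotlib, where a subplot spec exposes rowspan and colspan.

```python
import numpy as np
from matplotlib.gridspec import GridSpec

cells = np.arange(8).reshape(4, 2)  # cell numbers of a 4x2 grid, row by row
grid = GridSpec(4, 2)

# First row, all columns: the title strip.
top_cells = cells[0, :]
top_spec = grid[0, :]

# Third row, first column only: the left half of the middle layer.
left_cells = cells[2, :1]
left_spec = grid[2, :1]

print(top_cells, list(top_spec.rowspan), list(top_spec.colspan))
print(left_cells, list(left_spec.rowspan), list(left_spec.colspan))
```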

Here is the code that inserts all the individual plots into their own subplots:

In [29]: figure = plt.figure(figsize=(15, 15))
         gs_master = mpl.gridspec.GridSpec(
             4, 2, height_ratios=[1, 24, 128, 32], hspace=0,
             wspace=0)

         # Layer 1 - Title
         gs_1 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[0, :])
         title_axes = figure.add_subplot(gs_1[0])
         title_axes.set_title(
             "Demo Plots for 1985 Auto Maker Data",
             fontsize=30, color="#cdced1")
         demoplot.hide_axes(title_axes)

         # Layer 2 - Price
         gs_2 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[1, :])
         price_axes = figure.add_subplot(gs_2[0])
         demoplot.make_autos_price_plot(
             figure, pddata=data, axes=price_axes)

         # Layer 3, Part I - Risks
         gs_31 = mpl.gridspec.GridSpecFromSubplotSpec(
             2, 2, height_ratios=[2, 1], hspace=0.4,
             subplot_spec=gs_master[2, :1])
         risk_and_loss_axes = figure.add_subplot(gs_31[0, :])
         demoplot.make_autos_loss_and_risk_plot(
            figure, pddata=normed_data,
            axes=risk_and_loss_axes, x_label=False,
            rotate_ticks=True)
         risk_axes = figure.add_subplot(gs_31[1, :1])
         demoplot.make_autos_riskiness_plot(
             figure, pddata=normed_data, axes=risk_axes,
             legend=False, labels=False)
         loss_axes = figure.add_subplot(gs_31[1:, 1])
         demoplot.make_autos_losses_plot(
             figure, pddata=normed_data, axes=loss_axes,
             legend=False, labels=False)

         # Layer 3, Part II - Radar
         gs_32 = mpl.gridspec.GridSpecFromSubplotSpec(
            5, 3, height_ratios=[1, 20, 20, 20, 20],
            hspace=0.6, wspace=0,
            subplot_spec=gs_master[2, 1])
         (rows, cols) = geometry = gs_32.get_geometry()
         title_axes = figure.add_subplot(gs_32[0, :])
         # radar is assumed to be imported already (the import is
         # not shown here)
         projection = radar.RadarAxes(spoke_count=len(
             normed_data.groupby("make").mean().columns))
         # Skip the first row (the title) and add one radar axes
         # for each remaining cell of the grid
         inner_axes = [
             figure.add_subplot(m, projection=projection)
             for m in list(gs_32)[cols:]]
         demoplot.make_autos_radar_plot(
             figure, pddata=normed_data,
             title_axes=title_axes, inner_axes=inner_axes,
             legend_axes=False, geometry=geometry)

         # Layer 4 - MPG
         gs_4 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[3, :])
         mpg_axes = figure.add_subplot(gs_4[0])
         demoplot.make_autos_mpg_plot(
             figure, pddata=data, axes=mpg_axes)

         # Tidy up
         gs_master.tight_layout(figure)
         plt.show()

Though there is a lot of code here, keep in mind that it's essentially the same as the skeleton of subplots that we created. For most of the plots, all we had to do was make a call to the function that creates the desired plot, passing the axes that we created by slicing a part of the spec and adding a subplot for that slice to the figure. The one that wasn't so straightforward was the radar plot collection. This is due to the fact that we not only needed to define the projection for each radar plot, but also needed to create the 12 axes, one for each manufacturer. Despite this complication, the use of GridSpec and GridSpecFromSubplotSpec clearly demonstrates the ease with which complicated visual data can be assembled to provide all the power and convenience of a typical dashboard view.

The following plot is the result of the preceding code:

[Figure: the completed dashboard of subplots for the 1985 auto maker data]

The creation of complex subplots in matplotlib can be perceived as a daunting task. However, the following basic practices can help you make it a painless process of creating visual goodness:

  1. Write down an explicit plan for what you want to present, which data you want to combine, where you will use the stacked data and means, and so on.
  2. Sketch out on paper or in an ASCII diagram the desired layout. This will often reveal something that you hadn't considered.
  3. With the layout decided upon, create a GridSpec- and GridSpecFromSubplotSpec-based collection of subplots with empty axes. Don't add any plot data. Your grid-tweaking should happen at this point.
  4. With your grids ironed out, update your axes with the desired plots.
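One way to carry out the third step is to label each empty axes with its intended role, so that the layout can be checked at a glance before any data is plotted. The helper below is a hypothetical convenience, and the layout is a simplified version of the one used in this chapter.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec


def label_empty_axes(named_axes):
    """Hypothetical helper: write each axes' name in its center and hide ticks."""
    for name, axes in named_axes.items():
        axes.text(0.5, 0.5, name, ha="center", va="center",
                  transform=axes.transAxes)
        axes.set_xticks([])
        axes.set_yticks([])


figure = plt.figure(figsize=(10, 8))
grid = GridSpec(4, 2, height_ratios=[1, 2, 8, 2])
named_axes = {
    "title": figure.add_subplot(grid[0, :]),
    "price": figure.add_subplot(grid[1, :]),
    "risk/loss": figure.add_subplot(grid[2, 0]),
    "radar": figure.add_subplot(grid[2, 1]),
    "mpg": figure.add_subplot(grid[3, :]),
}
label_empty_axes(named_axes)
```

Adjusting height_ratios and re-rendering this skeleton is much quicker than re-running the full set of plots.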

Further explorations in customization

We have covered two areas of customization that come up frequently in various online forums. The other topics in advanced matplotlib customization include the creation of axes, scales, projections, and backends for some particular data or project requirements. Each of these has tutorials or examples that are provided by the matplotlib project, and given your newly attained comfort level with reading the matplotlib sources directly, these are now within your reach.

Several of these are worth mentioning specifically:

  • The API example code for custom_projection_example.py provides a highly detailed look into the means by which you can create custom projections. Another example of this is the radar plot that we created earlier in this chapter. If you view the library files for this chapter, you will see that we based the work on the polar projection that comes with matplotlib.
  • The API example code for custom_scale_example.py shows how to create a new scale for the y axis, which uses the same system as that of the Mercator map projection. This is a smaller amount of code, which is more easily digestible than the preceding projection example.
  • The matplotlib Transformations Tutorial will teach you how to create data transforms between coordinate systems, use axes transforms to keep the text bubbles in fixed positions while zooming, and blend transformations for highlighting portions of the plotted data.
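Two of the ideas from the transformations tutorial mentioned in the last item can be sketched briefly. This is an independent illustration rather than the tutorial's own code: a text note is pinned in axes coordinates so that it stays put while zooming, and a band is highlighted with a blended transform that uses data coordinates for x and axes coordinates for y.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import matplotlib.transforms as transforms

figure, axes = plt.subplots()
axes.plot(range(10), [v ** 2 for v in range(10)])

# Axes coordinates: (0, 0) is the lower-left corner of the axes, (1, 1) the
# upper-right, so the note stays in place regardless of the data limits.
note = axes.text(0.05, 0.95, "fixed note", transform=axes.transAxes, va="top")

# Blended transform: x is read in data units, y in axes units. The band covers
# x from 3 to 5 and the full height of the axes.
blended = transforms.blended_transform_factory(axes.transData, axes.transAxes)
band = patches.Rectangle((3, 0), width=2, height=1,
                         transform=blended, alpha=0.3)
axes.add_patch(band)
```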

Finally, Joe Kington, a geophysicist, created an open source project for equal-angle Stereonets in matplotlib. Stereonets, or Wulff nets, are used in geological studies and research, and Dr. Kington's code provides excellent examples of custom transforms and projections. All of this has been documented very well. This is an excellent project to examine in detail after working on the matplotlib.org tutorials and examples on creating custom projections, scales, and transformations.
