This chapter marks a conceptual dividing line for the book. We've focused on topics such as matplotlib internals and APIs, plot interaction, high-level plotting, and the use of third-party libraries. We will continue in that vein in the first part of this chapter as we discuss advanced customization techniques for matplotlib. We will finish the chapter by discussing the elements of the advanced and lesser-known matplotlib configuration. The configuration theme will continue into the next chapter and then go beyond that into the realm of deployment. As such, this chapter will mark a transition to our exploration of matplotlib in the real world and its usage in computationally intensive tasks.
This chapter will provide an overview of the following, giving you enough confidence to tackle these in more depth at your own pace:
run control
To follow along with this chapter's code, clone the notebook's repository and start up IPython in the following way:
$ git clone https://github.com/masteringmatplotlib/custom-and-config.git
$ cd custom-and-config
$ make
On the journey through the lands of matplotlib, one of the signposts for intermediate territories is an increased need for fine-grained control over the libraries in the ecosystem. In our case, this means being able to tweak matplotlib for particular use cases such as specialty scales or projections, complex layouts, or a custom look and feel.
The first customization topic that we will cover is that of the new style support introduced in matplotlib 1.4. In the previous notebook, we saw how to get a list of the available styles:
In [2]: print(plt.style.available)
['bmh', 'ggplot', 'fivethirtyeight', 'dark_background', 'grayscale']
Now, we're going to see how we can create and use one of our own custom styles.
You can create custom styles and use them by calling style.use with the path or URL to the style sheet. Alternatively, if you save the <style-name>.mplstyle file to the ~/.matplotlib/stylelib directory (you may need to create it), you can reuse your custom style sheet with a call to style.use(<style-name>). Note that a custom style sheet in ~/.matplotlib/stylelib will override a style sheet defined by matplotlib if the styles have the same name.
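As a minimal sketch of the path-based approach described above, the following writes a small, made-up style sheet to a temporary directory and applies it with plt.style.context, which restores the previous settings afterwards. The file name demo.mplstyle and its three entries are assumptions for illustration only; they are not the style sheet shipped with this chapter.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# A tiny, hypothetical style sheet; each key is an ordinary rc parameter.
style_text = """\
axes.facecolor: 2b3e50
text.color: df691b
legend.frameon: True
"""

# Save the sheet under an illustrative name in a temporary directory.
style_dir = tempfile.mkdtemp()
style_path = os.path.join(style_dir, "demo.mplstyle")
with open(style_path, "w") as handle:
    handle.write(style_text)

default_facecolor = plt.rcParams["axes.facecolor"]

# Apply the style by path; the previous rc settings return when the block exits.
with plt.style.context(style_path):
    styled_facecolor = plt.rcParams["axes.facecolor"]

restored_facecolor = plt.rcParams["axes.facecolor"]
```

Calling plt.style.use(style_path) instead would apply the same settings for the rest of the session rather than for a single block.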
There is a custom matplotlib style sheet included in this chapter's IPython Notebook git repository, but before we go further, let's create a function that will generate a demo plot for us. We'll then render it by using the default style in the following way, thus having a baseline to compare our work to:
In [3]: def make_plot():
            # Six series of 5,000 normally distributed samples each
            x = np.random.randn(5000, 6)
            (figure, axes) = plt.subplots(figsize=(16, 10))
            # Newer matplotlib releases use density=True in place of normed=1
            (n, bins, patches) = axes.hist(
                x, 12, normed=1, histtype='bar',
                label=['Color 1', 'Color 2', 'Color 3',
                       'Color 4', 'Color 5', 'Color 6'])
            axes.set_title(
                "Histogram for a Normal Distribution", fontsize=24)
            axes.set_xlabel("Data Points", fontsize=16)
            axes.set_ylabel("Counts", fontsize=16)
            axes.legend()
            plt.show()
In [4]: make_plot()
The following is the sample plot obtained as a result of the preceding code:
The preceding plot is the default style for matplotlib plots. Let's do something fun by copying the style of Thomas Park's Superhero Bootstrap theme. It's a darker theme with muted blues and desaturated accent colors. There is a screenshot of a demo website in the IPython Notebook for this chapter.
There are two styles provided, which differ only in the coloring of the text:
In [6]: ls -l ../styles
total 16
-rw-r--r-- 1 u g 473 Feb 4 14:54 superheroine-1.mplstyle
-rw-r--r-- 1 u g 473 Feb 4 14:53 superheroine-2.mplstyle
Let's take a look at the second one's contents, which show the hexadecimal colors that we copied from the Bootstrap theme:
In [7]: cat ../styles/superheroine-2.mplstyle
lines.color: 4e5d6c
patch.edgecolor: 4e5d6c
text.color: df691b
axes.facecolor: 2b3e50
axes.edgecolor: 4e5d6c
axes.labelcolor: df691b
axes.color_cycle: df691b, 5cb85c, 5bc0de, f0ad4e, d9534f, 4e5d6c
axes.axisbelow: True
xtick.color: 8c949d
ytick.color: 8c949d
grid.color: 4e5d6c
figure.facecolor: 2b3e50
figure.edgecolor: 2b3e50
savefig.facecolor: 2b3e50
savefig.edgecolor: 2b3e50
legend.fancybox: True
legend.shadow: True
legend.frameon: True
legend.framealpha: 0.6
The idea behind matplotlib styles is wonderfully simple: don't reinvent anything, just offer an easy way to organize existing settings. If the preceding file contents look familiar, it's because the same options are also available in the matplotlib run control configuration file, matplotlibrc, which will be discussed at the end of the chapter. Let's see how our custom style overrides the default color definitions:
In [8]: plt.style.use("../styles/superheroine-2.mplstyle")
In [9]: make_plot()
The following is the plot obtained as a result of the preceding code:
For a tiny bit of effort, we have a significantly different visual impact. We'll continue using this style for the remainder of the chapter. In particular, we'll see what it looks like in the following section, when we assemble a collection of subplots.
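Because the entries in a style sheet are ordinary rc parameters, the same overrides can also be applied from Python code. The following is a sketch using matplotlib.rc_context with a few of the values from the style sheet above; it is an alternative route to the same settings, not how the chapter's notebook applies the style.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

# A subset of the style sheet's entries, written as an rc dictionary.
overrides = {
    "axes.facecolor": "#2b3e50",
    "axes.edgecolor": "#4e5d6c",
    "legend.framealpha": 0.6,
}

before = plt.rcParams["axes.facecolor"]

# The overrides apply only inside the with-block.
with matplotlib.rc_context(rc=overrides):
    figure, axes = plt.subplots()
    inside = plt.rcParams["axes.facecolor"]
    axes_color = axes.get_facecolor()  # RGBA tuple picked up from the rc value
    plt.close(figure)

after = plt.rcParams["axes.facecolor"]
```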
In this section, we'll create a sophisticated subplot to give you a sense of matplotlib's plot layout capabilities. The system is flexible enough to accommodate everything from simple adjustments to the creation of dashboards in a single plot.
For this section, we have chosen to ingest data from the well-known UCI Machine Learning Repository. In particular, we'll use the 1985 Automobile Data Set. It serves as an example of data that can be used to assess the insurance risks for different vehicles. We will use it in an effort to compare 21 automobile manufacturers (using the 1985 data) along the following dimensions:
We will limit ourselves to automobile manufacturers that have data for losses, as well as six or more rows of data. Our subplot will comprise the following sections:
These will be composed as subplots in the following manner:
--------------------------------------------
|               overall title              |
--------------------------------------------
|               price ranges               |
--------------------------------------------
| combined loss/risk |                     |
|                    |        radar        |
----------------------        plots        |
|  risk   |   loss   |                     |
--------------------------------------------
|                   mpg                    |
--------------------------------------------
We're going to use a set of demonstration libraries that we included with this notebook to extract and manipulate the automobile maker data. As we did before, we will take advantage of the power provided by the Pandas statistical analysis library. Let's load our modules by using the following code:
In [10]: import sys
         sys.path.append("../lib")
         import demodata, demoplot
As you can see in the IPython Notebook, there's more data there than what we need for the subplotting tasks. Let's create a limited set by using the following code:
In [11]: limited_data = demodata.get_limited_data()
         limited_data.head()
Out[11]:
The following table is obtained as a result of the preceding command:
| | make | price | city mpg | highway mpg | horsepower | weight | riskiness | losses |
|---|---|---|---|---|---|---|---|---|
| 0 | audi | 13950 | 24 | 30 | 102 | 2337 | 2 | 164 |
| 1 | audi | 17450 | 18 | 22 | 115 | 2824 | 2 | 164 |
| 2 | audi | 17710 | 19 | 25 | 110 | 2844 | 1 | 158 |
| 3 | audi | 23875 | 17 | 20 | 140 | 3086 | 1 | 158 |
| 4 | bmw | 16430 | 23 | 29 | 101 | 2395 | 2 | 192 |
This has provided us with the full set of data minus the columns that we don't care about right now. However, we want to apply an additional constraint—we want to exclude auto manufacturers that have fewer than six rows in our dataset. We will do so with the help of the following command:
In [16]: data = demodata.get_limited_data(lower_bound=6)
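The implementation of get_limited_data is not shown in this chapter, so the following is only a sketch of how such a row-count constraint might be expressed in Pandas. The helper name keep_makes_with_min_rows is hypothetical, and the sample rows are the five shown in the preceding table.

```python
import pandas as pd

# The five rows from the table above, reduced to two columns.
rows = pd.DataFrame({
    "make": ["audi", "audi", "audi", "audi", "bmw"],
    "price": [13950, 17450, 17710, 23875, 16430],
})


def keep_makes_with_min_rows(data, lower_bound):
    """Hypothetical helper: keep makes with at least lower_bound rows."""
    return data.groupby("make").filter(
        lambda group: len(group) >= lower_bound)


# With only five sample rows, use a lower bound of 2 instead of 6.
filtered = keep_makes_with_min_rows(rows, lower_bound=2)
remaining_makes = sorted(filtered["make"].unique())
```

In this small sample, bmw has a single row and is dropped, while the four audi rows are kept.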
We've got the data that we want, but we still have some preparations left to do. In particular, how are we going to compare data of different scales and relationships? Normalization seems like the obvious answer, but we want to make sure that the normalized values compare appropriately. High losses and a high riskiness factor are less favorable, while a higher number of miles per gallon is more favorable. All this is taken care of by the following code:
In [19]: normed_data = data.copy()
         normed_data.rename(
             columns={"horsepower": "power"}, inplace=True)
In [20]: demodata.norm_columns(
             ["city mpg", "highway mpg", "power"], normed_data)
In [21]: demodata.invert_norm_columns(
             ["price", "weight", "riskiness", "losses"], normed_data)
In the preceding code, we made a copy of the limited data that we established as our starting point, and then updated the copy by calling two functions. The first function normalized the given columns, whose values are more favorable when higher. The second function normalized the remaining columns and then inverted the results, as their original values are more favorable when lower. We now have a normalized dataset in which all the values are more favorable when higher.
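The demodata functions themselves are not reproduced in this chapter. As a rough sketch of the idea, and assuming simple min-max scaling (which may differ from what demodata actually does), normalization and inverted normalization could look like the following. The function names minmax_norm and inverted_minmax_norm are hypothetical, and the sample values come from the table shown earlier.

```python
import pandas as pd


def minmax_norm(columns, data):
    """Scale each given column in place to the range [0, 1]."""
    for column in columns:
        values = data[column].astype(float)
        low, high = values.min(), values.max()
        data[column] = (values - low) / (high - low)


def inverted_minmax_norm(columns, data):
    """Scale to [0, 1], then flip so that originally low values score high."""
    minmax_norm(columns, data)
    for column in columns:
        data[column] = 1.0 - data[column]


sample = pd.DataFrame({
    "highway mpg": [30, 22, 25, 20, 29],
    "losses": [164, 164, 158, 158, 192],
})

minmax_norm(["highway mpg"], sample)      # higher mpg is better
inverted_minmax_norm(["losses"], sample)  # lower losses are better
```

After both calls, the row with the best highway mileage and the rows with the lowest losses all score 1.0 in their respective columns, so every column can be read the same way.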
If you would like to have more exposure to Pandas in action, be sure to view the functions in the demodata module. There are several useful tricks that are employed there to manipulate data.
Before jumping into subplots, let's take a look at a few individual plots for our dataset that will be included as subplots. The first one that we will generate is for the automobile price ranges:
In [22]: figure = plt.figure(figsize=(15, 5))
         prices_gs = mpl.gridspec.GridSpec(1, 1)
         prices_axes = demoplot.make_autos_price_plot(
             figure, prices_gs, data)
         plt.show()
Note that we didn't use our usual approach, in which we get the figure and axes objects from a call to plt.subplots. Instead, we opted to use the GridSpec class to generate our axes (in the make_autos_price_plot function). We've done this because later, we wish to use GridSpec to create our subplots.
Here is the output that is generated from the call to plt.show():
Keep in mind that the preceding plot is a bit contrived (there's no inherent meaning in connecting manufacturer maximum, mean, and minimum values). Its sole purpose is to provide some eye candy for the subplot that we will be creating. As you can see from the instantiation of GridSpec, this plot has one set of axes that takes up the entire plot. Most of our individual plots will have the same geometry. The one exception to this is the radar plot that we will be creating.
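To make the single-cell geometry concrete, here is a small sketch of a plotting helper that builds its axes from a grid specification instead of receiving them from plt.subplots. The helper make_line_plot is hypothetical and only mirrors the calling pattern of make_autos_price_plot; it is not taken from the demoplot module.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec


def make_line_plot(figure, grid_spec, values):
    """Hypothetical helper: draw values on axes built from a grid spec."""
    axes = figure.add_subplot(grid_spec[0])
    axes.plot(values)
    return axes


figure = plt.figure(figsize=(15, 5))
single_cell = GridSpec(1, 1)  # one row, one column: a single set of axes
axes = make_line_plot(figure, single_cell, [1, 3, 2])
```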
Radar plots are useful when you wish to compare normalized data across multiple variables and populations. They provide visual cues that make such comparisons easy to read at a glance. For example, consider the following figure:
The preceding figure shows the data that was consolidated from several 1985 Volvo models across the dimensions of price, inverse losses to insurers, inverse riskiness, weight, horsepower, and the highway and city miles per gallon. Since the data has been normalized so that the highest values are the most favorable, the best scenario would be for a manufacturer to have colored polygons at the limits of the axes. The conclusion that we can draw from this is as follows: relative to the other manufacturers in the dataset, the 1985 Volvos are heavy, expensive, and have pretty good horsepower. However, where they really shine is in safety for insurance companies, with low losses and a very low risk (again, larger values are better). Even Volvo's minimum values are high in these categories. That's one manufacturer. Let's look at the whole group:
In [27]: figure = plt.figure(figsize=(15, 5))
         radar_gs = mpl.gridspec.GridSpec(
             3, 7, height_ratios=[1, 10, 10], wspace=0.50,
             hspace=0.60, top=0.95, bottom=0.25)
         radar_axes = demoplot.make_autos_radar_plot(
             figure, radar_gs, normed_data)
         plt.show()
The following plot is obtained as a result of the preceding code:
There are interesting conclusions to draw from this view of the data, but we will focus on the code that generated it. In particular, note the geometry of the grid: three by seven. What does this mean and how are we going to use it? We have two rows of six manufacturers. However, we added an extra row at the top for an empty (and hidden) axis, which is used for the overall title. We then added an extra column for the legend, which spans two rows. This brings us from a grid of two by six to a grid of three by seven. The remaining 12 axes in the grid are populated with a highly customized polar plot, giving us the radar plots for each of the manufacturers.
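The three-by-seven geometry can be sketched without any data. The following uses matplotlib's built-in polar projection as a stand-in for the chapter's custom radar axes, and it assumes that the legend occupies the last column; the text above does not say which column is used.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec

figure = plt.figure(figsize=(15, 5))
grid = GridSpec(3, 7, height_ratios=[1, 10, 10])

# Row 0: a hidden axis spanning every column, used only for the title.
title_axes = figure.add_subplot(grid[0, :])
title_axes.set_axis_off()

# Assumed placement: the legend takes the last column across both plot rows.
legend_axes = figure.add_subplot(grid[1:, 6])
legend_axes.set_axis_off()

# The remaining 2 x 6 cells each get a polar axis.
polar_axes = [
    figure.add_subplot(grid[row, col], projection="polar")
    for row in (1, 2)
    for col in range(6)
]
```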
This example was included not only because it's visually compelling, but also because it will show how flexible the grid specification system for matplotlib is when we put them together. We have the ability to place plots within plots.
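Placing plots within plots comes down to nesting one grid inside a cell of another. A minimal sketch, using an arbitrary two-level layout chosen only for illustration:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt
from matplotlib.gridspec import GridSpec, GridSpecFromSubplotSpec

figure = plt.figure(figsize=(8, 4))

# Outer grid: one row, two columns.
outer = GridSpec(1, 2)

# Inner grid: two stacked rows that live inside the outer grid's left cell.
inner = GridSpecFromSubplotSpec(2, 1, subplot_spec=outer[0, 0])

top_left = figure.add_subplot(inner[0])
bottom_left = figure.add_subplot(inner[1])
right = figure.add_subplot(outer[0, 1])
```

The inner grid knows nothing about the figure as a whole; it simply divides the space of the subplot spec that it was given.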
We've seen a small aspect of the GridSpec usage. This has been a tiny warm-up exercise compared to what's coming! Let's refresh our memory with the ASCII sketch of the subplots that we wanted to create. Flip back to that page and look at the layout. We have three axes that stretch all the way across: the title, the price ranges, and the MPG data at the bottom. The three riskiness and losses plots will be placed on the left-hand side in the middle of the page, and the radar plots will take the other half of that part of the plot on the right-hand side.
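Indexing a GridSpec selects cells in the same way that indexing a two-dimensional NumPy array does. The following sketch checks this for a four-by-two grid like the one that this layout calls for. The cell_numbers helper is hypothetical, and it relies on the rowspan and colspan attributes of SubplotSpec, which are available in recent matplotlib releases.

```python
import numpy as np
from matplotlib.gridspec import GridSpec

# Number the cells of a 4x2 grid row by row.
cells = np.arange(8).reshape(4, 2)
grid = GridSpec(4, 2, height_ratios=[1, 2, 8, 2])


def cell_numbers(subplot_spec):
    """Hypothetical helper: flat cell numbers covered by a SubplotSpec."""
    return [row * 2 + col
            for row in subplot_spec.rowspan
            for col in subplot_spec.colspan]


# The same index expressions work on the grid and on the array.
title_cells = cell_numbers(grid[0, :])          # full-width top row
left_middle_cells = cell_numbers(grid[2, :1])   # left half of the third row
right_middle_cells = cell_numbers(grid[2, 1])   # right half of the third row
```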
We can plot what this will look like before adding any of the data, just by creating the grid and subplot specification objects. The following may look a bit hairy, but keep in mind that when slicing the subplot specs, you're using the same technique that is used when slicing NumPy arrays:
In [28]: figure = plt.figure(figsize=(10, 8))
         gs_master = mpl.gridspec.GridSpec(
             4, 2, height_ratios=[1, 2, 8, 2])
         # Layer 1 - Title
         gs_1 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[0, :])
         title_axes = figure.add_subplot(gs_1[0])
         # Layer 2 - Price
         gs_2 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[1, :])
         price_axes = figure.add_subplot(gs_2[0])
         # Layer 3 - Risks & Radar
         gs_31 = mpl.gridspec.GridSpecFromSubplotSpec(
             2, 2, height_ratios=[2, 1],
             subplot_spec=gs_master[2, :1])
         risk_and_loss_axes = figure.add_subplot(gs_31[0, :])
         risk_axes = figure.add_subplot(gs_31[1, :1])
         loss_axes = figure.add_subplot(gs_31[1:, 1])
         gs_32 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[2, 1])
         radar_axes = figure.add_subplot(gs_32[0])
         # Layer 4 - MPG
         gs_4 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[3, :])
         mpg_axes = figure.add_subplot(gs_4[0])
         # Tidy up
         gs_master.tight_layout(figure)
         plt.show()
In the preceding code, when we instantiated GridSpec, we provided a geometry of four rows and two columns. We then passed the data for the height ratios so that each row will have an appropriate size that is relative to the others. In the middle section, which holds the risk and radar plots, we gave the left-hand grid a geometry of two rows and two columns, and again passed the height ratios that provide the proportions we desire. This code results in the following plot:
That's exactly what we were aiming for. Now, we're ready to start adding individual plots. The code that generated the preceding skeleton plot differs from the final result in the following three key ways:
Here is the code that inserts all the individual plots into their own subplots:
In [29]: figure = plt.figure(figsize=(15, 15))
         gs_master = mpl.gridspec.GridSpec(
             4, 2, height_ratios=[1, 24, 128, 32],
             hspace=0, wspace=0)
         # Layer 1 - Title
         gs_1 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[0, :])
         title_axes = figure.add_subplot(gs_1[0])
         title_axes.set_title(
             "Demo Plots for 1985 Auto Maker Data",
             fontsize=30, color="#cdced1")
         demoplot.hide_axes(title_axes)
         # Layer 2 - Price
         gs_2 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[1, :])
         price_axes = figure.add_subplot(gs_2[0])
         demoplot.make_autos_price_plot(
             figure, pddata=data, axes=price_axes)
         # Layer 3, Part I - Risks
         gs_31 = mpl.gridspec.GridSpecFromSubplotSpec(
             2, 2, height_ratios=[2, 1], hspace=0.4,
             subplot_spec=gs_master[2, :1])
         risk_and_loss_axes = figure.add_subplot(gs_31[0, :])
         demoplot.make_autos_loss_and_risk_plot(
             figure, pddata=normed_data,
             axes=risk_and_loss_axes, x_label=False,
             rotate_ticks=True)
         risk_axes = figure.add_subplot(gs_31[1, :1])
         demoplot.make_autos_riskiness_plot(
             figure, pddata=normed_data, axes=risk_axes,
             legend=False, labels=False)
         loss_axes = figure.add_subplot(gs_31[1:, 1])
         demoplot.make_autos_losses_plot(
             figure, pddata=normed_data, axes=loss_axes,
             legend=False, labels=False)
         # Layer 3, Part II - Radar
         gs_32 = mpl.gridspec.GridSpecFromSubplotSpec(
             5, 3, height_ratios=[1, 20, 20, 20, 20],
             hspace=0.6, wspace=0,
             subplot_spec=gs_master[2, 1])
         (rows, cols) = geometry = gs_32.get_geometry()
         title_axes = figure.add_subplot(gs_32[0, :])
         projection = radar.RadarAxes(spoke_count=len(
             normed_data.groupby("make").mean().columns))
         # One radar axis per cell, skipping the title row
         inner_axes = [
             figure.add_subplot(m, projection=projection)
             for m in [n for n in gs_32][cols:]]
         demoplot.make_autos_radar_plot(
             figure, pddata=normed_data,
             title_axes=title_axes, inner_axes=inner_axes,
             legend_axes=False, geometry=geometry)
         # Layer 4 - MPG
         gs_4 = mpl.gridspec.GridSpecFromSubplotSpec(
             1, 1, subplot_spec=gs_master[3, :])
         mpg_axes = figure.add_subplot(gs_4[0])
         demoplot.make_autos_mpg_plot(
             figure, pddata=data, axes=mpg_axes)
         # Tidy up
         gs_master.tight_layout(figure)
         plt.show()
Though there is a lot of code here, keep in mind that it's essentially the same as the skeleton of subplots that we created. For most of the plots, all we had to do was make a call to the function that creates the desired plot, passing the axes that we created by slicing a part of the spec and adding a subplot for that slice to the figure. The one that wasn't so straightforward was the radar plot collection. This is due to the fact that we not only needed to define the projection for each radar plot, but also needed to create the 12 axes, one for each manufacturer. Despite this complication, the use of GridSpec and GridSpecFromSubplotSpec clearly demonstrates the ease with which complicated visual data can be assembled to provide all the power and convenience of a typical dashboard view.
The following plot is the result of the preceding code:
The creation of complex subplots in matplotlib can be perceived as a daunting task. However, the following basic practices can help you make it a painless process of creating visual goodness:
Create a GridSpec- and GridSpecFromSubplotSpec-based collection of subplots with empty axes. Don't add any plot data. Your grid-tweaking should happen at this point.
We have covered two areas of customization that come up frequently in various online forums. The other topics in advanced matplotlib customization include the creation of axes, scales, projections, and backends for some particular data or project requirements. Each of these has tutorials or examples that are provided by the matplotlib project, and given your newly attained comfort level with reading the matplotlib sources directly, these are now within your reach.
Several of these are worth mentioning specifically:
custom_projection_example.py provides a highly detailed look into the means by which you can create custom projections. Another example of this is the radar plot that we created earlier in this chapter. If you view the library files for this chapter, you will see that we based the work on the polar projection that comes with matplotlib.
custom_scale_example.py shows how to create a new scale for the y axis, which uses the same system as that of the Mercator map projection. This is a smaller amount of code, which is more easily digestible than the preceding projection example.
Finally, Joe Kington, a geophysicist, created an open source project for equal-angle Stereonets in matplotlib. Stereonets, or Wulff nets, are used in geological studies and research, and Dr. Kington's code provides excellent examples of custom transforms and projections. All of this has been documented very well. This is an excellent project to examine in detail after working on the matplotlib.org tutorials and examples on creating custom projections, scales, and transformations.
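As a starting point for experimenting with projections, the following sketch draws a radar-style polygon on matplotlib's built-in polar axes. It is not the RadarAxes class from this chapter's library files, and the values are arbitrary placeholders chosen for illustration; they are not taken from the automobile dataset.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs anywhere
import matplotlib.pyplot as plt

labels = ["price", "losses", "riskiness", "weight",
          "power", "highway mpg", "city mpg"]
# Arbitrary placeholder values in [0, 1]; not real data.
values = [0.9, 0.8, 0.95, 0.85, 0.6, 0.4, 0.35]

# One spoke per label, evenly spaced around the circle.
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)

# Repeat the first point at the end so that the outline closes.
closed_angles = np.append(angles, angles[0])
closed_values = np.append(values, values[0])

figure, axes = plt.subplots(subplot_kw={"projection": "polar"})
axes.plot(closed_angles, closed_values)
axes.fill(closed_angles, closed_values, alpha=0.25)
axes.set_xticks(angles)
axes.set_xticklabels(labels)
axes.set_ylim(0, 1)
```

A custom projection such as the chapter's radar axes goes further by replacing the circular grid with straight-edged spokes and frames, which is the kind of work that custom_projection_example.py walks through.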