To start, let me pique your interest by showing you some of the analytical and graphical capabilities of Python. I explain the code only briefly in this section; you will learn more about Python programming in the rest of this chapter. The following code imports the libraries required for this demonstration:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
Then we need some data. I am using the same data from the AdventureWorksDW2014 demo database and the dbo.vTargetMail view as in the R chapters, Chapter 13, Supporting R in SQL Server, and Chapter 14, Data Exploration and Predictive Modeling with R, of this book. The following code reads this data from the CSV file:
TM = pd.read_csv(r"C:\SQL2017DevGuide\Chapter15_TM.csv")
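If you do not have the CSV file at hand, the following minimal sketch runs the same read_csv() call against a small in-memory sample. The three data rows and the names sample_csv and TM_sample are invented for illustration; only the column names come from this section.

```python
import io
import pandas as pd

# Hypothetical sample: the values are invented, only the column names
# (NumberCarsOwned, TotalChildren, Age) appear in this section.
sample_csv = io.StringIO(
    "NumberCarsOwned,TotalChildren,Age\n"
    "0,2,35\n"
    "1,0,42\n"
    "2,3,51\n"
)

# read_csv() accepts a file-like object as well as a file path.
TM_sample = pd.read_csv(sample_csv)

print(TM_sample.shape)    # (rows, columns)
print(TM_sample.dtypes)   # column types inferred by pandas
print(TM_sample.head())   # first rows of the data frame
```

With a real file, you pass the path instead of the StringIO object. On Windows, a raw string literal (the r prefix) keeps the backslashes in the path intact.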
Now I can do a quick cross tabulation of the NumberCarsOwned variable by the TotalChildren variable, with the help of the following code:
obb = pd.crosstab(TM.NumberCarsOwned, TM.TotalChildren)
obb
And here are the first results, a pivot table of the previously mentioned variables:
TotalChildren       0     1     2     3     4    5
NumberCarsOwned
0                 990  1668   602   419   449  110
1                1747  1523   967   290   286   70
2                1752   162  1876  1047  1064  556
3                 384   130   182   157   339  453
4                 292   136   152   281   165  235
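To see what the cross tabulation computes, here is a minimal sketch on a six-row toy data frame. The data and the names toy, counts, and shares are invented for illustration. Each cell of the result counts the rows that have that combination of the two variables, and the optional normalize argument of pd.crosstab() converts the counts into proportions.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration.
toy = pd.DataFrame({
    "NumberCarsOwned": [0, 0, 1, 1, 1, 2],
    "TotalChildren":   [0, 1, 1, 1, 2, 2],
})

# Each cell counts the rows with that (cars, children) combination.
counts = pd.crosstab(toy.NumberCarsOwned, toy.TotalChildren)
print(counts)

# normalize="index" turns each row into proportions that sum to 1.
shares = pd.crosstab(toy.NumberCarsOwned, toy.TotalChildren,
                     normalize="index")
print(shares)
```

Row proportions are often easier to compare than raw counts when the groups differ a lot in size.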
Now, let me show you the results of the pivot table in a graph. I need just the following two lines:
obb.plot(kind = 'bar')
plt.show()
You can see the graph in the following figure:
It is quite simple to create even more complex graphs. The following code shows the distribution of the Age variable as a histogram overlaid with a kernel density plot:
# Histogram scaled to a density, so that it shares the y axis with the KDE curve
(TM['Age'] - 20).hist(bins = 25, density = True, color = 'lightblue')
# Kernel density estimate drawn as a red dashed line
(TM['Age'] - 20).plot(kind='kde', style='r--', xlim = [0, 80])
plt.show()
You can see the results in the following figure. Note that in the code, I subtracted 20 from the actual age, to get a slightly younger population than exists in the demo database:
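As an aside, the histogram in the preceding code is normalized (the argument is called density in current Matplotlib versions and normed in older ones) because a kernel density curve integrates to 1. Raw counts would dwarf the curve on the same axis. The following minimal sketch illustrates the normalization with NumPy's np.histogram(), which uses the same definition of density. The ages are randomly generated and hypothetical, not the demo data.

```python
import numpy as np

# Hypothetical ages drawn from a normal distribution; not the demo data.
rng = np.random.default_rng(0)
ages = rng.normal(loc=45, scale=11, size=1000)

# density=True rescales the bar heights so that the total bar area is 1.
heights, edges = np.histogram(ages, bins=25, density=True)
widths = np.diff(edges)
area = float(np.sum(heights * widths))

# Without it, the bar heights are raw counts that sum to the sample size.
counts, _ = np.histogram(ages, bins=25)

print(round(area, 6))
print(int(counts.sum()))
```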
I hope that you are interested in learning Python after this brief introduction of its capabilities.