Discovering patterns by parallel coordinates

The scatterplot matrix can inform you about the joint distributions of your features. It helps you locate groups in data and verify whether they are distinguishable. Parallel coordinates are another kind of plot, one that gives you a hint about which variables in your data discriminate best between the groups.

By plotting every observation as a line that crosses all the variables (arranged in arbitrary order along the abscissa), parallel coordinates will help you spot whether there are streams of observations that group together like your classes, and understand which variables best separate the streams (the most useful predictor variables). Naturally, for the chart to be meaningful, the features in the plot should share the same scale, as they do in the Iris dataset (otherwise, normalize them first):

In: from pandas.plotting import parallel_coordinates
# iris_df holds the features plus a 'groups' column with the class labels
pll = parallel_coordinates(iris_df, 'groups')

The previous code will output the parallel coordinates:

parallel_coordinates is a pandas function that needs only two parameters to work properly: the DataFrame holding the data and the string name of the variable containing the groups whose separability you want to test. For this reason, the group variable must be available in your dataset. However, don't forget to remove it after you finish exploring by using the DataFrame.drop('variable name', axis=1, inplace=True) method.
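When your features are not on the same scale, you have to normalize them before plotting. The following is a minimal sketch of one way to do it, assuming a simple min-max rescaling to the [0, 1] range. The toy DataFrame, its column names, and the helpers min_max_scale and plot_scaled are hypothetical and serve only as illustration; they are not part of pandas.

```python
import pandas as pd
from pandas.plotting import parallel_coordinates

def min_max_scale(df, group_col):
    """Rescale every feature column to [0, 1]; keep the group column as it is."""
    features = df.drop(columns=[group_col])
    # A constant column would produce NaN here (division by zero range)
    scaled = (features - features.min()) / (features.max() - features.min())
    scaled[group_col] = df[group_col]
    return scaled

# Hypothetical toy data: two features on very different scales
toy_df = pd.DataFrame({
    'height_cm': [150.0, 160.0, 170.0, 180.0],
    'weight_t': [0.05, 0.06, 0.08, 0.09],
    'groups': ['a', 'a', 'b', 'b'],
})

scaled_df = min_max_scale(toy_df, 'groups')

def plot_scaled(df, group_col='groups'):
    """Draw the parallel coordinates chart (needs matplotlib installed)."""
    return parallel_coordinates(df, group_col)

# After exploring, drop the group variable; this variant returns a new
# DataFrame instead of modifying scaled_df in place
features_only = scaled_df.drop('groups', axis=1)
```

Calling plot_scaled(scaled_df) draws the chart on the rescaled features. Min-max scaling is only one possible choice; standardizing each feature to zero mean and unit variance would work as well, as long as all the features end up on a comparable scale.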
