Discovering patterns by parallel coordinates

The scatterplot matrix can inform you about the joint distributions of your features. It helps you locate groups in the data and verify whether they are distinguishable. Parallel coordinates are another kind of plot, one that hints at which variables in your data best discriminate between groups.

Parallel coordinates draw every observation as a line crossing a set of parallel vertical axes, one per variable, arranged in an arbitrary order along the abscissa. The chart helps you spot whether there are streams of observations grouped as your classes, and understand which variables best separate the streams (the most useful predictor variables). Naturally, for the chart to be meaningful, the features in the plot should share the same scale (otherwise, normalize them), as they do in the Iris dataset:

In: from pandas.plotting import parallel_coordinates
    # iris_df holds the features plus a 'groups' column with the class labels
    pll = parallel_coordinates(iris_df, 'groups')

The previous code will output the parallel coordinates plot.

parallel_coordinates is a pandas function that needs only two parameters to work properly: the DataFrame holding the data and the string name of the variable containing the groups whose separability you want to test. For this reason, the group variable must be available in your dataset. However, don't forget to remove it after you finish exploring by using the DataFrame.drop('variable name', axis=1, inplace=True) method.
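
When the features do not share a scale, the whole workflow (normalize, plot, then drop the group column) can be sketched as follows. This is only a minimal illustration under stated assumptions: toy_df, its column names, and its values are hypothetical stand-ins for your own data, min-max scaling is just one of several possible normalizations, and the Agg backend is selected only so that the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display available)
import pandas as pd
from pandas.plotting import parallel_coordinates

# Hypothetical toy data: two features on very different scales,
# plus a 'groups' column holding the class labels.
toy_df = pd.DataFrame({
    "length_mm": [50.0, 52.0, 49.0, 70.0, 68.0, 72.0],
    "width_cm": [0.2, 0.3, 0.2, 1.4, 1.5, 1.3],
    "groups": ["a", "a", "a", "b", "b", "b"],
})

# Min-max scale every feature to [0, 1] so that no single axis dominates.
features = toy_df.columns.drop("groups")
scaled_df = toy_df.copy()
scaled_df[features] = (toy_df[features] - toy_df[features].min()) / (
    toy_df[features].max() - toy_df[features].min()
)

# One line per observation, colored by the value found in 'groups'.
ax = parallel_coordinates(scaled_df, "groups")

# Remove the group column once the exploration is over.
scaled_df.drop("groups", axis=1, inplace=True)
```

Scaling a copy leaves the original toy_df untouched, so the raw values remain available for later modeling steps.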
