The scatter matrix

A scatter matrix is another common analysis tool: it presents several pairwise scatter plots of variables in a matrix format. It is used to check whether variables are correlated and whether the correlation is positive or negative.

The following code can be used to experiment with this type of visualization:

import matplotlib.pyplot as plt
import pandas as pd

# Features to compare pairwise, and the class label used to color the points
feature_names = ['GCR', 'NPHI', 'PE', 'ILD', 'ILM']
X = df_data_1[feature_names]
y = df_data_1['lito_ID']
cmap = plt.get_cmap('gnuplot')
# Off-diagonal cells are scatter plots; diagonal cells are 15-bin histograms
scatter = pd.plotting.scatter_matrix(X, c=y, marker='o', s=40, hist_kwds={'bins': 15}, figsize=(9, 9), cmap=cmap)
plt.suptitle('Scatter-matrix for each input variable')
plt.savefig('lithofacies_scatter_matrix')

This gives you the following output:

A scatter plot attempts to reveal relationships or associations between variables (called correlations). Refer to the following link to learn more about scatter plots:

https://mste.illinois.edu/courses/ci330ms/youtsey/scatterinfo.html

Looking at the scatter matrix generated from our log data (shown in the preceding screenshot), I really don't see any specific or direct correlations between the variables.
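
A numeric check can complement the visual impression. The following is a minimal sketch that computes pairwise Pearson correlation coefficients with the pandas DataFrame.corr() method. It runs on a small synthetic DataFrame: the name df_demo, its columns, and its values are invented for illustration and are not the log data used in this exercise. To apply it to the real features, you would call corr() on X from the code above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data (hypothetical; NOT the well-log data set)
rng = np.random.default_rng(0)
a = rng.normal(size=200)
df_demo = pd.DataFrame({
    'a': a,
    'b': 2.0 * a + rng.normal(scale=0.1, size=200),  # built to rise with a
    'c': -a + rng.normal(scale=0.1, size=200),       # built to fall as a rises
    'd': rng.normal(size=200),                       # generated independently of a
})

# Pairwise Pearson correlation coefficients (the default method), each in [-1, 1]
corr = df_demo.corr()
print(corr.round(2))
```

Values close to +1 or -1 indicate a strong positive or negative linear relationship, while values near 0 indicate little linear relationship. Pearson correlation only measures linear association, so it supplements the plots rather than replacing them.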

At this point, you may continue with a deeper dive into the data, perform some reshaping or aggregation, or perhaps even go back to the original source of the data and request additional or new data.
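
As one possible illustration of such an aggregation, the sketch below averages each feature per class label using groupby. The DataFrame df_small and its values are hypothetical; only the column names (lito_ID, NPHI, PE) are borrowed from the code above, and the choice of the mean as the summary statistic is an assumption.

```python
import pandas as pd

# Hypothetical miniature data set; the values are invented for illustration
df_small = pd.DataFrame({
    'lito_ID': [1, 1, 2, 2, 2],
    'NPHI': [0.10, 0.20, 0.30, 0.40, 0.50],
    'PE': [2.0, 4.0, 3.0, 3.0, 6.0],
})

# Mean of each feature within each lito_ID class
summary = df_small.groupby('lito_ID')[['NPHI', 'PE']].mean()
print(summary)
```

Comparing per-class summaries like this is one way to see whether a feature separates the classes at all before investing in a model.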

In the interest of time, for this exercise we will use the data we have and move on to creating and testing various modeling algorithms.
