Loading the data

Again, since we assume that you already have an IBM Watson Studio project, we can go ahead and add a new notebook to it. From your project, click on Add to Project | Notebook, just as we did in prior chapters, and be sure to specify Python as the language. Then take the following steps:

  1. Load and open the file, then print its first five records. Recall that this requires no coding.
  2. Simply click on Insert to code and then Insert pandas DataFrame for our file in the Files | Data Asset pane:

This automatically generates the following code in our notebook's first cell, which will load our data file into a pandas DataFrame object (df_data_1) and then print the first five records of the file:
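The generated cell is not reproduced here, and it reads the file from the project's storage. As a rough stand-in, assuming you have a local CSV copy of the data file (the path and the helper name load_and_preview are hypothetical), the equivalent pandas calls look like this:

```python
import pandas as pd


def load_and_preview(csv_path, n=5):
    """Load a CSV file into a DataFrame and print its first n records.

    csv_path is a hypothetical local path; the Watson Studio-generated
    cell loads the same file from the project's storage instead.
    """
    df_data_1 = pd.read_csv(csv_path)  # parse the CSV into a DataFrame
    print(df_data_1.head(n))           # show the first n rows, like the generated cell
    return df_data_1
```

For example, `df_data_1 = load_and_preview("lithofacies.csv")`, where the file name is only a placeholder.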

The preceding code generates the following output for us:

From this output, we can see that each row of the dataset represents one lithofacies, described by several features held in the table's columns (as shown in the preceding screenshot).

Using Python's print function together with the DataFrame's .shape attribute, we see that we have 180 lithofacies (the number of records in the file) and 8 features in the dataset:
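A minimal sketch of that check, written as a small helper (report_shape is a hypothetical name) that works on any DataFrame, such as df_data_1. Note that .shape is accessed without parentheses because it is an attribute returning a (rows, columns) tuple:

```python
import pandas as pd


def report_shape(df: pd.DataFrame):
    """Print and return (number of records, number of columns) for a DataFrame."""
    n_rows, n_cols = df.shape  # attribute, not a method call
    print(f"{n_rows} records, {n_cols} columns")
    return n_rows, n_cols
```

Calling `report_shape(df_data_1)` is equivalent to `print(df_data_1.shape)`, with the two numbers unpacked for readability.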

  1. We can also call the .unique() method on the lithofacies column to show that there are eight different types of lithofacies in our dataset:

  2. Next, we can group the records by lithofacies and call the .size() method to see how many records each lithofacies has in the file. The data is fairly balanced: most lithofacies have between 22 and 25 records, while Mdst/Mdst-Wkst is the least represented, with 16:
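Both steps can be sketched as one helper. The function name summarize_classes and its label_col parameter are hypothetical; label_col stands for whatever the lithofacies column is called in your file. Series.unique() returns the distinct labels in order of first appearance, and groupby(...).size() returns a Series of record counts indexed by label:

```python
import pandas as pd


def summarize_classes(df: pd.DataFrame, label_col: str):
    """Show the distinct class labels and how many records each one has.

    label_col is a hypothetical parameter: the name of the column
    that holds the lithofacies label.
    """
    labels = df[label_col].unique()        # distinct labels, first-appearance order
    print(f"{len(labels)} distinct classes:", list(labels))
    counts = df.groupby(label_col).size()  # number of records per label
    print(counts.sort_values())            # least-represented classes first
    return labels, counts
```

`df[label_col].value_counts()` gives the same counts in a single call, sorted from most to least frequent.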
