How it works...

The CSV file has 14 columns. Each row represents one customer record, as shown in the following screenshot:

Our dataset is a CSV file containing 10,000 customer records, each labeled according to whether the customer left the business. Columns 0 to 12 are the input features. The 14th column (index 13), Exited, is the label, that is, the prediction outcome. This is a supervised learning problem: each record is labeled 0 or 1, where 0 indicates a happy customer and 1 indicates an unhappy customer who has left the business.

The first row in the dataset contains only the column names, which we don't need while processing the data, so we skipped it when we created the record reader instance in step 1. There, the argument 1 is the number of rows to skip from the start of the file. We also specified a comma (,) as the delimiter because we are using a CSV file. In step 2, we used FileSplit to specify the customer churn dataset file. We can also work with multiple dataset files using other InputSplit implementations, such as CollectionInputSplit, NumberedFileInputSplit, and so on.
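To make the effect of the two record reader settings concrete (the number of rows to skip and the delimiter), here is a minimal plain-Java sketch. It is not how DataVec implements its record reader. The class ChurnCsvSketch, its parse method, and the shortened sample rows are hypothetical names and data introduced only for illustration, and the sketch assumes the label is the last column of each row.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Illustration only: a hand-rolled stand-in for "skip N rows, split on a delimiter".
// ChurnCsvSketch and Row are hypothetical names, not part of DL4J or DataVec.
final class ChurnCsvSketch {

    /** One parsed row: every column except the last, plus the last column as the 0/1 label. */
    static final class Row {
        final String[] inputs;
        final int label;

        Row(String[] inputs, int label) {
            this.inputs = inputs;
            this.label = label;
        }
    }

    /**
     * Skips the first skipNumLines lines (for example, a header row), splits each
     * remaining line on the delimiter, and treats the last column as the label.
     */
    static List<Row> parse(List<String> lines, int skipNumLines, char delimiter) {
        List<Row> rows = new ArrayList<>();
        // Quote the delimiter so characters such as '|' are not read as regex syntax.
        String regex = Pattern.quote(String.valueOf(delimiter));
        for (int i = skipNumLines; i < lines.size(); i++) {
            String[] cols = lines.get(i).split(regex, -1);
            int last = cols.length - 1;
            int label = Integer.parseInt(cols[last].trim());
            rows.add(new Row(Arrays.copyOfRange(cols, 0, last), label));
        }
        return rows;
    }

    public static void main(String[] args) {
        // Made-up, shortened sample; the real file has 14 columns.
        List<String> lines = Arrays.asList(
                "CreditScore,Age,Exited",
                "600,40,1",
                "700,35,0");
        for (Row row : parse(lines, 1, ',')) {
            System.out.println(Arrays.toString(row.inputs) + " -> " + row.label);
        }
    }
}
```

Passing 1 as skipNumLines drops the header row, so only data rows reach the model. Passing 0 here would make the sketch fail while parsing the header's last column as a number, which is one way to see why the header has to be skipped.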
