How it works...

The CSV data from the dataset has 14 features. Each row represents a customer/record, as shown in the following screenshot:

Our dataset is a CSV file containing 10,000 customer records, where each record is labeled as to whether the customer left the business or not. Columns 0 to 13 represent input features. The 14th column, Exited, indicates the label or prediction outcome. We're dealing with a supervised model, and each record is labeled with 0 or 1, where 0 indicates a happy customer and 1 indicates an unhappy customer who has left the business.

The first row in the dataset contains only the feature labels, and we don't need them while processing the data, so we skipped the first line when we created the record reader instance in step 1. In step 1, the argument 1 is the number of rows to skip at the start of the dataset. We also specified a comma delimiter (,) because we are using a CSV file. In step 2, we used FileSplit to specify the customer churn dataset file. We can also deal with multiple dataset files using other InputSplit implementations, such as CollectionInputSplit, NumberedFileInputSplit, and so on.
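To make the effect of the skip count and the delimiter concrete, the following is a minimal plain-Java sketch of what reading the file this way amounts to: drop the header line, split each remaining line on the comma, and treat the last column as the label. It is not the record reader's actual implementation. The class name CsvSkipSketch, the helper methods, the featureA/featureB column names, and the sample rows are all hypothetical and only serve the illustration; only the Exited label column comes from the dataset description above.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of any library.
class CsvSkipSketch {

    // Reads delimited text, discarding the first skipNumLines lines
    // (for example, a header row) and splitting the rest on the delimiter.
    // Assumption: fields contain no quoted or escaped delimiters.
    static List<String[]> readRecords(Reader source, int skipNumLines, char delimiter)
            throws IOException {
        List<String[]> records = new ArrayList<>();
        String splitRegex = Pattern.quote(String.valueOf(delimiter));
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                boolean isSkipped = lineNumber < skipNumLines;
                lineNumber++;
                if (isSkipped || line.isEmpty()) {
                    continue;
                }
                // The -1 limit keeps trailing empty fields instead of dropping them.
                records.add(line.split(splitRegex, -1));
            }
        }
        return records;
    }

    // Treats the last column as the 0/1 label, as with the Exited column.
    static int labelOf(String[] record) {
        return Integer.parseInt(record[record.length - 1].trim());
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample data: one header row followed by two records.
        String csv = "featureA,featureB,Exited\n"
                + "619,42,1\n"
                + "608,41,0\n";
        List<String[]> records = readRecords(new StringReader(csv), 1, ',');
        for (String[] record : records) {
            System.out.println(String.join(" | ", record) + " -> label " + labelOf(record));
        }
    }
}
```

With a skip count of 1, the header row never reaches the parsing step, so every record that comes back holds data values only. Passing 0 instead would return the header as the first record, and parsing its last field as a label would fail.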
