How it works...

Time series data is three-dimensional: samples, features, and time steps. Each sample is represented by its own file. Within a file, the columns hold the feature values and the rows correspond to the time steps at which they were measured. For instance, in step 1, we saw the following snapshot, where time series data is displayed:

Each file represents a different sequence. When you open a file, you will see the observations (features) recorded at different time steps, as shown here:

The label for each sample is contained in a single CSV file, which holds a value of 0, indicating death, or a value of 1, indicating survival. For example, for the features in 1.csv, the output label is in 1.csv under the mortality directory. Note that we have a total of 4,000 samples. We divide the entire dataset into train/test sets so that the training data has 3,200 examples and the testing data has 800 examples.

In step 3, we used NumberedFileInputSplit to read and combine all the files (features/labels) that follow a numbered naming format.
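To make the numbered format concrete, the following plain-Java sketch mimics how a pattern containing a %d placeholder expands into one file path per index, with both bounds inclusive. It does not use DataVec. The class name NumberedPaths, the features directory, and the 0-based index ranges are assumptions made for illustration only; adjust them to match how your files are actually numbered.

```java
import java.util.ArrayList;
import java.util.List;

class NumberedPaths {
    // Expands a pattern such as "features/%d.csv" into one path per index.
    // Both minInclusive and maxInclusive are part of the result.
    static List<String> expand(String pattern, int minInclusive, int maxInclusive) {
        List<String> paths = new ArrayList<>();
        for (int i = minInclusive; i <= maxInclusive; i++) {
            paths.add(String.format(pattern, i));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical layout: 3,200 training files followed by 800 test files.
        List<String> train = expand("features/%d.csv", 0, 3199);
        List<String> test = expand("features/%d.csv", 3200, 3999);
        System.out.println(train.size() + " training files, " + test.size() + " test files");
    }
}
```

Using the same index ranges for the features directory and the labels directory is what keeps each feature file paired with its label file.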

CSVSequenceRecordReader is used to read sequences of data in CSV format, where each sequence is defined in its own file.

As you can see in the preceding screenshots, the first row contains only the feature names (the column headers) and needs to be skipped.

Hence, we created the following CSV sequence reader, which skips the first line and uses a comma as the delimiter:

SequenceRecordReader trainFeaturesReader = new CSVSequenceRecordReader(1, ",");
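The effect of the two constructor arguments can be sketched in plain Java without DataVec: skip a given number of leading lines, then split every remaining line on the delimiter. The class SequenceCsvSketch, its parse method, and the sample column names are hypothetical and exist only to illustrate the idea; this is not how the library is implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

class SequenceCsvSketch {
    // Parses one sequence file's content. The first skipLines lines are ignored
    // (the header row), and each remaining line becomes one time step.
    // The result is indexed as [timeStep][feature].
    static double[][] parse(String csv, int skipLines, String delimiter) {
        String[] lines = csv.split("\\R"); // split on any line break
        List<double[]> rows = new ArrayList<>();
        for (int lineNo = skipLines; lineNo < lines.length; lineNo++) {
            if (lines[lineNo].trim().isEmpty()) {
                continue; // ignore blank lines
            }
            String[] cells = lines[lineNo].split(Pattern.quote(delimiter));
            double[] values = new double[cells.length];
            for (int i = 0; i < cells.length; i++) {
                values[i] = Double.parseDouble(cells[i].trim());
            }
            rows.add(values);
        }
        return rows.toArray(new double[0][]);
    }

    public static void main(String[] args) {
        // Made-up sample content: one header row, then two time steps.
        String csv = "time,featureA,featureB\n0,1.5,2.0\n1,3.0,4.0\n";
        double[][] sequence = parse(csv, 1, ",");
        System.out.println(sequence.length + " time steps, "
                + sequence[0].length + " features each");
    }
}
```

Stacking one such [timeStep][feature] array per file is what gives the dataset its third dimension.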