How it works...

We used NumberedFileInputSplit in step 1. NumberedFileInputSplit lets us load data from multiple files that follow a numbered file naming convention, such as 0.csv, 1.csv, and so on. Refer to step 1 in this recipe:

SequenceRecordReader trainFeaturesSequenceReader = new CSVSequenceRecordReader();
trainFeaturesSequenceReader.initialize(new NumberedFileInputSplit(new File(trainfeatureDir).getAbsolutePath()+"/%d.csv",0,449));

In the previous recipe, we stored the data as a sequence of numbered files. There are 450 training files (0.csv to 449.csv, because both index bounds passed to NumberedFileInputSplit are inclusive), and each file represents one sequence. We also stored 150 files for testing, as demonstrated in step 3.
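To make the naming convention concrete, here is a minimal sketch of how a %d pattern with inclusive bounds expands into one path per sequence file. The class NumberedPaths, its expand method, and the /data/train/features directory are hypothetical and used only for illustration; this is not how NumberedFileInputSplit is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: illustrates the numbered-file convention that
// NumberedFileInputSplit relies on. It is NOT DL4J/DataVec code.
class NumberedPaths {

    // Expands a pattern containing %d into one path per index,
    // from minIdx to maxIdx inclusive (0..449 -> 450 paths).
    static List<String> expand(String pattern, int minIdx, int maxIdx) {
        if (!pattern.contains("%d")) {
            throw new IllegalArgumentException("Pattern must contain %d: " + pattern);
        }
        List<String> paths = new ArrayList<>();
        for (int i = minIdx; i <= maxIdx; i++) {
            // Locale.ROOT keeps plain ASCII digits regardless of the default locale
            paths.add(String.format(Locale.ROOT, pattern, i));
        }
        return paths;
    }

    public static void main(String[] args) {
        // Hypothetical directory; the recipe builds the prefix from trainfeatureDir.
        List<String> train = expand("/data/train/features/%d.csv", 0, 449);
        System.out.println(train.size());   // 450
        System.out.println(train.get(0));   // /data/train/features/0.csv
        System.out.println(train.get(449)); // /data/train/features/449.csv
    }
}
```

Because both bounds are inclusive, the range 0 to 449 yields exactly 450 files, which matches the number of training sequences.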

In step 5, numOfClasses specifies the number of categories the neural network predicts; in our example, it is 6. When creating the iterator, we passed AlignmentMode.ALIGN_END. The alignment mode determines how inputs and labels of different lengths are aligned. For example, each of our time series has 60 time steps, but there is only one label, at the end of the 60th time step. That's why we use AlignmentMode.ALIGN_END in the iterator definition. The fifth argument, false, sets regression to false, so the labels are treated as class indices for classification:

DataSetIterator trainIterator = new SequenceRecordReaderDataSetIterator(trainFeaturesSequenceReader,trainLabelsSequenceReader,batchSize,numOfClasses,false, SequenceRecordReaderDataSetIterator.AlignmentMode.ALIGN_END);
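The following sketch shows what aligning a single label to the end means for one many-to-one example: the class is one-hot encoded only at the last of the 60 time steps, and a label mask marks that step as the only one carrying a label. The class AlignEndLabels and its methods are hypothetical, and the [timeStep][class] layout is a simplification; DL4J itself stores recurrent labels as [miniBatchSize, numOfClasses, timeSteps].

```java
import java.util.Arrays;

// Hypothetical sketch of the ALIGN_END idea for one many-to-one example.
// It mirrors the concept only; it is not DL4J's internal code.
class AlignEndLabels {

    // Returns labels[timeStep][class]: all zeros except the last time step.
    static double[][] labels(int classIndex, int numOfClasses, int timeSteps) {
        if (classIndex < 0 || classIndex >= numOfClasses) {
            throw new IllegalArgumentException("classIndex out of range: " + classIndex);
        }
        double[][] out = new double[timeSteps][numOfClasses];
        out[timeSteps - 1][classIndex] = 1.0; // label sits on the final step
        return out;
    }

    // Returns mask[timeStep]: 1.0 only where a label is present.
    static double[] labelMask(int timeSteps) {
        double[] mask = new double[timeSteps];
        mask[timeSteps - 1] = 1.0;
        return mask;
    }

    public static void main(String[] args) {
        int numOfClasses = 6; // as in the recipe
        int timeSteps = 60;   // each sequence has 60 time steps
        double[][] y = labels(2, numOfClasses, timeSteps); // class 2 chosen arbitrarily
        double[] mask = labelMask(timeSteps);
        System.out.println(Arrays.toString(y[timeSteps - 1])); // [0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
        System.out.println(mask[0] + " " + mask[timeSteps - 1]); // 0.0 1.0
    }
}
```

Only the masked-in final step contributes to the loss, which is why a single end-of-sequence label is enough for classifying the whole series.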

Time series data can also have a label at every time step. Such cases are referred to as many-to-many input/label connections.
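For contrast with a single end-of-sequence label, here is a minimal sketch of many-to-many labels, where every time step carries its own class and therefore its own one-hot row. The class PerStepLabels, the oneHotPerStep method, and the toy labels are hypothetical and only illustrate the label layout.

```java
import java.util.Arrays;

// Hypothetical sketch of many-to-many labels: each time step has its own
// class index, so each step gets its own one-hot row. Illustrative only.
class PerStepLabels {

    // stepLabels[t] is the class index observed at time step t.
    static double[][] oneHotPerStep(int[] stepLabels, int numOfClasses) {
        double[][] out = new double[stepLabels.length][numOfClasses];
        for (int t = 0; t < stepLabels.length; t++) {
            int c = stepLabels[t];
            if (c < 0 || c >= numOfClasses) {
                throw new IllegalArgumentException("Bad class " + c + " at step " + t);
            }
            out[t][c] = 1.0;
        }
        return out;
    }

    public static void main(String[] args) {
        int[] stepLabels = {0, 0, 1, 5}; // made-up labels for a 4-step toy sequence
        double[][] y = oneHotPerStep(stepLabels, 6);
        System.out.println(y.length);              // 4
        System.out.println(Arrays.toString(y[3])); // [0.0, 0.0, 0.0, 0.0, 0.0, 1.0]
    }
}
```

Here every step contributes to the loss, whereas with one label at the end only the final step does.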

In step 4, we started with the regular way of creating iterators, as follows:

DataSetIterator trainIterator = new SequenceRecordReaderDataSetIterator(trainFeaturesSequenceReader,trainLabelsSequenceReader,batchSize,numOfClasses);

DataSetIterator testIterator = new SequenceRecordReaderDataSetIterator(testFeaturesSequenceReader,testLabelsSequenceReader,batchSize,numOfClasses);

Note that this is not the only way to create sequence reader iterators; several implementations and constructor overloads built on DataVec readers support different configurations. Here, we also align the input and label at the last time step of each sample, which is why we added AlignmentMode.ALIGN_END to the iterator definition. If samples have different numbers of time steps, the shorter time series are padded to the length of the longest one. So, if some sequences have fewer than 60 recorded time steps, zero values are padded into their time series data. DL4J also attaches mask arrays to the returned DataSet so that the network can tell real time steps apart from padding.
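The padding behavior can be sketched in plain Java. This sketch assumes that, for end alignment, the zeros go at the start of each shorter series so that every series ends on the final time step, and it builds a matching 0/1 mask. The class PadAlignEnd and its methods are hypothetical; this is an illustration of the concept, not DL4J's internal implementation.

```java
import java.util.Arrays;

// Hypothetical sketch of end-aligned zero padding plus a mask.
// Assumption: padding is placed at the START of shorter series.
class PadAlignEnd {

    // Length of the longest series in the batch.
    static int maxLength(double[][] series) {
        int max = 0;
        for (double[] s : series) {
            max = Math.max(max, s.length);
        }
        return max;
    }

    // Copies each series into a zero-filled row of the longest length,
    // shifted right so that its last value lands on the last time step.
    static double[][] pad(double[][] series) {
        int maxLen = maxLength(series);
        double[][] padded = new double[series.length][maxLen]; // Java zero-initialises arrays
        for (int i = 0; i < series.length; i++) {
            int offset = maxLen - series[i].length; // number of leading zeros
            System.arraycopy(series[i], 0, padded[i], offset, series[i].length);
        }
        return padded;
    }

    // 1.0 marks a real time step, 0.0 marks padding.
    static double[][] mask(double[][] series) {
        int maxLen = maxLength(series);
        double[][] m = new double[series.length][maxLen];
        for (int i = 0; i < series.length; i++) {
            for (int t = maxLen - series[i].length; t < maxLen; t++) {
                m[i][t] = 1.0;
            }
        }
        return m;
    }

    public static void main(String[] args) {
        // Two toy univariate series of different lengths (made-up values).
        double[][] batch = { {1, 2, 3, 4}, {5, 6} };
        double[][] padded = pad(batch);
        double[][] m = mask(batch);
        System.out.println(Arrays.toString(padded[1])); // [0.0, 0.0, 5.0, 6.0]
        System.out.println(Arrays.toString(m[1]));      // [0.0, 0.0, 1.0, 1.0]
    }
}
```

Without the mask, the padded zeros would be indistinguishable from genuine zero-valued readings, so the mask is what keeps padding from influencing training.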
