How to do it...

  1. Categorize the sequence data programmatically:
// Download the raw data from the URL and read it as a string
final String data = IOUtils.toString(new URL(url),"utf-8");
// Get sequences from the raw data: one sequence per line
final String[] sequences = data.split("\n");
final List<Pair<String,Integer>> contentAndLabels = new ArrayList<>();
int lineCount = 0;
for(String sequence : sequences) {
    // Record each time step on a new line
    sequence = sequence.replaceAll(" +","\n");
    // Labels: first 100 examples (lines) are label 0, second 100 examples are label 1, and so on
    contentAndLabels.add(new Pair<>(sequence, lineCount++ / 100));
}

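  2. Store the features and labels in their corresponding directories, naming each file by its running index (the first 450 examples go to the training set, the rest to the test set):
// The directory path strings (trainfeatureDir and so on) are assumed to be defined earlier in the recipe
int trainCount = 0;
int testCount = 0;
File featureFile;
File labelFile;
for(Pair<String,Integer> sequencePair : contentAndLabels) {
    if(trainCount<450) {
        featureFile = new File(trainfeatureDir+trainCount+".csv");
        labelFile = new File(trainlabelDir+trainCount+".csv");
        trainCount++;
    } else {
        featureFile = new File(testfeatureDir+testCount+".csv");
        labelFile = new File(testlabelDir+testCount+".csv");
        testCount++;
    }
    // Write featureFile and labelFile here, as shown in the next step
}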

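  3. Use FileUtils to write the data into files. These calls belong inside the loop from the previous step, after featureFile and labelFile have been selected:
FileUtils.writeStringToFile(featureFile,sequencePair.getFirst(),"utf-8");
FileUtils.writeStringToFile(labelFile,sequencePair.getSecond().toString(),"utf-8");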
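
The three steps above can be tried end to end on a toy input. The following is a minimal sketch rather than the recipe's own code. It swaps Commons IO and the Pair class for plain JDK classes (java.nio.file.Files and Map.Entry) so that it runs without extra dependencies. The class name SequencePrepSketch, the train/test directory layout, the examplesPerClass and nTrain parameters, and the sample data are all assumptions made for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class SequencePrepSketch {

    // Step 1: treat each input line as one sequence, put each time step on its
    // own line, and give every block of examplesPerClass lines the same label.
    static List<Map.Entry<String, Integer>> label(String data, int examplesPerClass) {
        List<Map.Entry<String, Integer>> contentAndLabels = new ArrayList<>();
        int lineCount = 0;
        for (String line : data.split("\n")) {
            // trim() is an addition of this sketch to avoid a leading blank time step
            String sequence = line.trim().replaceAll(" +", "\n");
            contentAndLabels.add(new SimpleEntry<>(sequence, lineCount++ / examplesPerClass));
        }
        return contentAndLabels;
    }

    // Steps 2 and 3: the first nTrain examples go to train/, the rest to test/.
    // Features and labels are written to <index>.csv in sibling directories
    // (hypothetical layout). Returns {trainCount, testCount}.
    static int[] write(Path baseDir, List<Map.Entry<String, Integer>> contentAndLabels, int nTrain) {
        int trainCount = 0;
        int testCount = 0;
        try {
            for (Map.Entry<String, Integer> pair : contentAndLabels) {
                String split;
                int index;
                if (trainCount < nTrain) {
                    split = "train";
                    index = trainCount++;
                } else {
                    split = "test";
                    index = testCount++;
                }
                Path featureFile = baseDir.resolve(split).resolve("features").resolve(index + ".csv");
                Path labelFile = baseDir.resolve(split).resolve("labels").resolve(index + ".csv");
                Files.createDirectories(featureFile.getParent());
                Files.createDirectories(labelFile.getParent());
                Files.write(featureFile, pair.getKey().getBytes(StandardCharsets.UTF_8));
                Files.write(labelFile, pair.getValue().toString().getBytes(StandardCharsets.UTF_8));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new int[] {trainCount, testCount};
    }

    public static void main(String[] args) throws IOException {
        // Toy data: 4 sequences of 3 time steps, 2 examples per class, 3 training examples
        String data = "1.0 2.0  3.0\n4.0 5.0 6.0\n7.0 8.0 9.0\n1.5 2.5 3.5";
        Path base = Files.createTempDirectory("sequence-prep-sketch");
        int[] counts = write(base, label(data, 2), 3);
        System.out.println("train=" + counts[0] + ", test=" + counts[1] + ", written under " + base);
    }
}
```

Keeping the feature file and the label file under the same index is what lets a sequence reader pair them up later, so the counters are incremented only once per example.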