How to do it...

  1. Add features and labels into the schema:
Schema.Builder schemaBuilder = new Schema.Builder();
schemaBuilder.addColumnString("RowNumber");
schemaBuilder.addColumnInteger("CustomerId");
schemaBuilder.addColumnString("Surname");
schemaBuilder.addColumnInteger("CreditScore");
  1. Identify and add categorical features to the schema:
schemaBuilder.addColumnCategorical("Geography", Arrays.asList("France","Germany","Spain"));
schemaBuilder.addColumnCategorical("Gender", Arrays.asList("Male","Female"));
  1. Remove noise features from the dataset:
Schema schema = schemaBuilder.build();
TransformProcess.Builder transformProcessBuilder = new TransformProcess.Builder(schema);
transformProcessBuilder.removeColumns("RowNumber","CustomerId","Surname");
  1. Transform categorical variables:
transformProcessBuilder.categoricalToInteger("Gender");

  1. Apply one-hot encoding by calling categoricalToOneHot():
transformProcessBuilder.categoricalToOneHot("Geography");
  1. Remove the correlation dependency on the Geography feature by calling removeColumns():
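transformProcessBuilder.removeColumns("Geography[France]");

Here, we dropped France as the reference category. A French customer is still identifiable because the remaining Geography[Germany] and Geography[Spain] columns are both 0, so no information is lost and the one-hot columns are no longer perfectly correlated with each other.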
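To make the effect of one-hot encoding followed by dropping one column concrete, here is a minimal plain-Java sketch. It does not use DataVec; the class `OneHotSketch` and the method `oneHotDroppingColumn` are hypothetical names introduced only for illustration, and the column order is assumed to follow the state list given in the schema.

```java
import java.util.Arrays;
import java.util.List;

// Plain-Java illustration only; this is not DataVec code.
final class OneHotSketch {

    // Encodes `value` as a one-hot vector over `states`, then omits the
    // column at `dropIndex` so the remaining columns are not redundant.
    static int[] oneHotDroppingColumn(List<String> states, String value, int dropIndex) {
        int hot = states.indexOf(value);
        if (hot < 0) {
            throw new IllegalArgumentException("Unknown category: " + value);
        }
        int[] encoded = new int[states.size() - 1];
        int out = 0;
        for (int i = 0; i < states.size(); i++) {
            if (i == dropIndex) {
                continue; // skip the reference category's column
            }
            encoded[out++] = (i == hot) ? 1 : 0;
        }
        return encoded;
    }

    public static void main(String[] args) {
        List<String> geography = Arrays.asList("France", "Germany", "Spain");
        // Column 0 (France) is dropped; the remaining columns are Germany, Spain.
        for (String country : geography) {
            System.out.println(country + " -> "
                    + Arrays.toString(oneHotDroppingColumn(geography, country, 0)));
        }
    }
}
```

Running `main` prints `[0, 0]` for France, `[1, 0]` for Germany, and `[0, 1]` for Spain: two columns are enough to distinguish three categories.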

  1. Extract the data and apply the transformation using TransformProcessRecordReader:
TransformProcess transformProcess = transformProcessBuilder.build();
TransformProcessRecordReader transformProcessRecordReader = new TransformProcessRecordReader(recordReader,transformProcess);
  1. Create a dataset iterator for training/testing:
DataSetIterator dataSetIterator = new RecordReaderDataSetIterator.Builder(transformProcessRecordReader,batchSize)
.classification(labelIndex,numClasses)
.build();
  1. Normalize the dataset:
DataNormalization dataNormalization = new NormalizerStandardize();
dataNormalization.fit(dataSetIterator);
dataSetIterator.setPreProcessor(dataNormalization);
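Standardization rescales each feature column to zero mean and unit variance. The following plain-Java sketch shows the arithmetic for a single column. It is not the library's implementation: `StandardizeSketch` and `standardize` are hypothetical names, the sample values are made up, and the population standard deviation is an assumption (the library's estimator may differ slightly).

```java
import java.util.Arrays;

// Plain-Java illustration of z-score standardization for one feature column.
final class StandardizeSketch {

    // Returns (x - mean) / std for every value in the column.
    // Assumption: population standard deviation (divide by n).
    static double[] standardize(double[] column) {
        double mean = 0.0;
        for (double v : column) {
            mean += v;
        }
        mean /= column.length;

        double variance = 0.0;
        for (double v : column) {
            variance += (v - mean) * (v - mean);
        }
        variance /= column.length;
        double std = Math.sqrt(variance);

        double[] result = new double[column.length];
        for (int i = 0; i < column.length; i++) {
            // A constant column has std == 0; map it to zeros instead of dividing by zero.
            result[i] = (std == 0.0) ? 0.0 : (column[i] - mean) / std;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] creditScores = {600, 650, 700}; // hypothetical sample values
        System.out.println(Arrays.toString(standardize(creditScores)));
    }
}
```

After this rescaling, features measured on very different scales contribute comparably during training.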
  1. Split the main dataset iterator into train and test iterators:
DataSetIteratorSplitter dataSetIteratorSplitter = new DataSetIteratorSplitter(dataSetIterator,totalNoOfBatches,ratio);
  1. Generate train/test iterators from DataSetIteratorSplitter:
DataSetIterator trainIterator = dataSetIteratorSplitter.getTrainIterator();
DataSetIterator testIterator = dataSetIteratorSplitter.getTestIterator();
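The splitter works on batches rather than individual records, so totalNoOfBatches and ratio decide how many batches each iterator receives. The plain-Java sketch below shows one way to reason about those two values. `SplitSketch` and its methods are hypothetical helpers, the record count and batch size in `main` are made-up figures, and rounding the training share down is an assumption about how the split is computed.

```java
// Plain-Java illustration of a ratio-based batch split; not DL4J code.
final class SplitSketch {

    // Number of batches needed to cover all records (the last batch may be partial).
    static long totalBatches(long records, int batchSize) {
        return (records + batchSize - 1) / batchSize;
    }

    // Returns {trainBatches, testBatches}.
    // Assumption: the training share is rounded down, the rest goes to testing.
    static long[] trainTestBatches(long totalBatches, double ratio) {
        if (ratio <= 0.0 || ratio >= 1.0) {
            throw new IllegalArgumentException("ratio must be strictly between 0 and 1");
        }
        long train = (long) (totalBatches * ratio);
        return new long[] {train, totalBatches - train};
    }

    public static void main(String[] args) {
        long batches = totalBatches(10_000, 8);        // hypothetical dataset size and batch size
        long[] split = trainTestBatches(batches, 0.8); // hypothetical 80/20 split
        System.out.println("total=" + batches + " train=" + split[0] + " test=" + split[1]);
    }
}
```

If totalNoOfBatches is larger than the number of batches the underlying iterator can actually supply, the test iterator may receive fewer batches than expected, so it is worth deriving the value from the record count and batch size rather than guessing it.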
