In this section, we will discuss how to implement some of the neural network structures with the deeplearning4j library. Let's start.
As we discussed in Chapter 2, Java Libraries and Platforms for Machine Learning, deeplearning4j is an open source, distributed deep learning project in Java and Scala. Deeplearning4j relies on Spark and Hadoop for MapReduce, trains models in parallel, and iteratively averages the parameters they produce in a central model. A detailed library summary is presented in Chapter 2, Java Libraries and Platforms for Machine Learning.
The most convenient way to get deeplearning4j is through the Maven repository. Open the pom.xml file and add the following dependencies under the <dependencies> section:
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-nlp</artifactId>
    <version>${dl4j.version}</version>
</dependency>
<dependency>
    <groupId>org.deeplearning4j</groupId>
    <artifactId>deeplearning4j-core</artifactId>
    <version>${dl4j.version}</version>
</dependency>
One of the most famous datasets is the MNIST dataset, which consists of handwritten digits, as shown in the following image. The dataset comprises 60,000 training and 10,000 testing images:
The dataset is commonly used in image recognition problems to benchmark algorithms. The worst recorded error rate is 12%, with no preprocessing and using an SVM in a one-layer neural network. Currently, as of 2016, the lowest error rate is only 0.21%, using the DropConnect neural network, followed by a deep convolutional network at 0.23%, and a deep feedforward network at 0.35%.
Now, let's see how to load the dataset.
Deeplearning4j provides the MNIST dataset loader out of the box. The loader is initialized as DataSetIterator. Let's first import the DataSetIterator class and all the supported datasets that are part of the impl package, for example, Iris, MNIST, and others:
import org.deeplearning4j.datasets.iterator.DataSetIterator;
import org.deeplearning4j.datasets.iterator.impl.*;
Next, we'll define some constants: the images consist of 28 x 28 pixels, and there are 10 target classes and 60,000 samples. We then initialize a new MnistDataSetIterator object that will download the dataset and its labels. The parameters are the iteration batch size, the total number of examples, and whether the dataset should be binarized or not:
int numRows = 28;        // image height in pixels
int numColumns = 28;     // image width in pixels
int outputNum = 10;      // number of target classes (digits 0 to 9)
int numSamples = 60000;  // number of training examples
int batchSize = 100;     // examples per iteration batch
DataSetIterator iter = new MnistDataSetIterator(batchSize, numSamples, true);
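To make the batch size and binarization parameters more concrete, here is a minimal sketch in plain Java. The class name, method names, and the threshold value are hypothetical and are not part of deeplearning4j; the library's own binarization rule may differ.

```java
// Hypothetical helper, not part of deeplearning4j.
class MnistBatchSketch {
    // Number of batches needed to cover numSamples, rounding up.
    static int numBatches(int numSamples, int batchSize) {
        return (numSamples + batchSize - 1) / batchSize;
    }

    // Maps grayscale values (0..255) to 0 or 1 using an assumed threshold.
    static int[] binarize(int[] pixels, int threshold) {
        int[] out = new int[pixels.length];
        for (int i = 0; i < pixels.length; i++) {
            out[i] = pixels[i] > threshold ? 1 : 0;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Batches per pass: " + numBatches(60000, 100));
        int[] bits = binarize(new int[]{0, 30, 31, 255}, 30);
        System.out.println(java.util.Arrays.toString(bits));
    }
}
```

With 60,000 samples and a batch size of 100, one pass over the data consists of 600 batches.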
Having an already-implemented data importer is really convenient, but it won't work on your own data. Let's take a quick look at how it is implemented and what needs to be modified to support your dataset. If you're eager to start implementing neural networks, you can safely skip the rest of this section and return to it when you need to import your own data.
To load custom data, you'll need to implement two classes: DataSetIterator, which holds all the information about the dataset, and BaseDataFetcher, which actually pulls the data from a file, a database, or the web. Sample implementations are available on GitHub at https://github.com/deeplearning4j/deeplearning4j/tree/master/deeplearning4j-core/src/main/java/org/deeplearning4j/datasets/iterator/impl.
Another option is to use the Canova library, which is developed by the same authors, at http://deeplearning4j.org/canovadoc/.
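The idea behind a custom data iterator can be sketched with the standard java.util.Iterator interface. This is only an illustration of serving data in batches: the class below is hypothetical, holds its data in memory rather than fetching it from a file or database, and does not implement the deeplearning4j DataSetIterator or BaseDataFetcher contracts.

```java
// Hypothetical in-memory batch iterator; not a deeplearning4j class.
class ArrayBatchIterator implements java.util.Iterator<double[][]> {
    private final double[][] rows;   // one feature vector per example
    private final int batchSize;
    private int cursor = 0;          // index of the next unread example

    ArrayBatchIterator(double[][] rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.rows = rows;
        this.batchSize = batchSize;
    }

    @Override
    public boolean hasNext() {
        return cursor < rows.length;
    }

    // Returns the next batch; the last batch may be smaller than batchSize.
    @Override
    public double[][] next() {
        if (!hasNext()) {
            throw new java.util.NoSuchElementException();
        }
        int end = Math.min(cursor + batchSize, rows.length);
        double[][] batch = java.util.Arrays.copyOfRange(rows, cursor, end);
        cursor = end;
        return batch;
    }

    public static void main(String[] args) {
        ArrayBatchIterator batches = new ArrayBatchIterator(new double[5][3], 2);
        while (batches.hasNext()) {
            System.out.println("Batch of " + batches.next().length + " examples");
        }
    }
}
```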
In this section, we'll discuss how to build an actual neural network model. We'll start with a basic single-layer neural network to establish a benchmark and discuss the basic operations. Later, we'll improve this initial result with a deep belief network (DBN) and a multilayer convolutional network.
Let's start by building a single-layer regression model based on the softmax activation function. As we have a single layer, the input to the neural network will be all the image pixels, that is, 28 x 28 = 784 neurons. The number of output neurons is 10, one for each digit. The network layers are fully connected, as shown in the following diagram:
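The computation performed by such a single softmax layer can be sketched in plain Java as follows. This is an illustrative forward pass only, with hypothetical class and method names; it does not show how deeplearning4j implements the layer internally.

```java
// Hypothetical sketch of a fully connected softmax layer; not deeplearning4j code.
class SoftmaxLayerSketch {
    // Numerically stable softmax: subtract the largest logit before exponentiating.
    static double[] softmax(double[] logits) {
        double max = Double.NEGATIVE_INFINITY;
        for (double z : logits) {
            max = Math.max(max, z);
        }
        double sum = 0.0;
        double[] out = new double[logits.length];
        for (int i = 0; i < logits.length; i++) {
            out[i] = Math.exp(logits[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < out.length; i++) {
            out[i] /= sum;
        }
        return out;
    }

    // weights[k] holds the weights of output neuron k, one per input pixel.
    static double[] forward(double[][] weights, double[] bias, double[] input) {
        double[] logits = new double[weights.length];
        for (int k = 0; k < weights.length; k++) {
            double z = bias[k];
            for (int j = 0; j < input.length; j++) {
                z += weights[k][j] * input[j];
            }
            logits[k] = z;
        }
        return softmax(logits);
    }

    public static void main(String[] args) {
        int numInputs = 28 * 28;  // 784 pixels
        int numOutputs = 10;      // one neuron per digit
        double[] p = forward(new double[numOutputs][numInputs],
                new double[numOutputs], new double[numInputs]);
        // With all-zero weights, every class receives the same probability.
        System.out.println(p[0]);
    }
}
```

The output is a probability distribution over the 10 digits, and the predicted digit is the index with the highest probability.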
A neural network is defined through a NeuralNetConfiguration.Builder object as follows:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
We will define the parameters for gradient search in order to perform iterations with the conjugate gradient optimization algorithm. The momentum parameter determines how fast the optimization algorithm converges to a local optimum: the higher the momentum, the faster the training, but higher speed can lower the model's accuracy. The parameters are set as follows:
    .seed(seed)
    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
    .gradientNormalizationThreshold(1.0)
    .iterations(iterations)
    .momentum(0.5)
    .momentumAfter(Collections.singletonMap(3, 0.9))
    .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
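The effect of the momentum parameter can be illustrated with a small plain Java sketch. Note the assumptions: the sketch uses plain gradient descent with classical momentum on a toy function, not the conjugate gradient algorithm configured above, and the exact iteration at which the momentum switches from 0.5 to 0.9 is a guess. All names are hypothetical.

```java
// Hypothetical sketch of gradient descent with momentum; not deeplearning4j code.
class MomentumSketch {
    // Assumed schedule: 0.5 for the first three iterations, 0.9 afterwards.
    static double momentumAt(int iteration) {
        return iteration < 3 ? 0.5 : 0.9;
    }

    // Minimizes f(w) = w^2, whose gradient is 2w, starting from w0.
    static double minimize(double w0, double learningRate, int iterations) {
        double w = w0;
        double velocity = 0.0;
        for (int t = 0; t < iterations; t++) {
            double gradient = 2.0 * w;
            // The velocity accumulates a decaying sum of past gradient steps.
            velocity = momentumAt(t) * velocity - learningRate * gradient;
            w += velocity;
        }
        return w;
    }

    public static void main(String[] args) {
        System.out.println(minimize(5.0, 0.1, 500));
    }
}
```

A larger momentum lets past steps carry the search further on each iteration, which is also why it can overshoot a minimum.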
Next, we will specify that the network will have one layer and define the error function (NEGATIVELOGLIKELIHOOD), the internal perceptron activation function (softmax), and the number of input and output neurons, which correspond to the total number of image pixels and the number of target classes:
    .list(1)
    .layer(0, new OutputLayer.Builder(LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation("softmax")
        .nIn(numRows * numColumns)
        .nOut(outputNum)
        .build())
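The negative log likelihood error function named in this configuration can be sketched as follows. This is the textbook definition written in plain Java with hypothetical names; the library's implementation may differ in details such as averaging or numerical safeguards.

```java
// Hypothetical sketch of the negative log likelihood loss; not deeplearning4j code.
class NegativeLogLikelihoodSketch {
    // Loss for one example: minus the log of the probability of the true class.
    static double loss(double[] predicted, int trueClass) {
        return -Math.log(predicted[trueClass]);
    }

    // Mean loss over a batch of predicted distributions.
    static double meanLoss(double[][] predicted, int[] trueClasses) {
        double total = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            total += loss(predicted[i], trueClasses[i]);
        }
        return total / predicted.length;
    }

    public static void main(String[] args) {
        double[] softmaxOutput = {0.1, 0.7, 0.2};
        System.out.println(loss(softmaxOutput, 1));
    }
}
```

The loss is close to zero when the network assigns a probability near 1 to the correct digit and grows as that probability shrinks.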
Finally, we will set the network to pretrain, disable backpropagation, and actually build the untrained network structure:
    .pretrain(true).backprop(false)
    .build();
Once the network structure is defined, we can use it to initialize a new MultiLayerNetwork, as follows:
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
Next, we will attach a listener that reports the training score at regular intervals by calling the setListeners method, as follows:
model.setListeners(Collections.singletonList((IterationListener) new ScoreIterationListener(listenerFreq)));
We will then pass the training data iterator to the fit(DataSetIterator) method to trigger end-to-end network training:
model.fit(iter);
To evaluate the model, we will initialize a new Evaluation object that will store batch results:
Evaluation eval = new Evaluation(outputNum);
We can then iterate over the test dataset in batches in order to keep the memory consumption at a reasonable level and store the results in the eval object:
DataSetIterator testIter = new MnistDataSetIterator(100, 10000);
while (testIter.hasNext()) {
    DataSet testMnist = testIter.next();
    // Predict class probabilities for the current batch
    INDArray predict2 = model.output(testMnist.getFeatureMatrix());
    // Compare the predictions with the true labels
    eval.eval(testMnist.getLabels(), predict2);
}
Finally, we can get the results by calling the stats() function:
log.info(eval.stats());
A basic one-layer model achieves the following accuracy:
Accuracy: 0.8945
Precision: 0.8985
Recall: 0.8922
F1 Score: 0.8953
Getting 89.45% accuracy, that is, a 10.55% error rate, on the MNIST dataset is quite bad. We'll improve that by going from a simple one-layer network to a moderately sophisticated deep belief network using restricted Boltzmann machines and to a multilayer convolutional network.
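For reference, the reported measures can be derived from a confusion matrix, as in the following plain Java sketch. The names are hypothetical, and the per-class definitions shown here are the standard ones; how the library's Evaluation class averages precision and recall across the 10 classes is not shown and may differ.

```java
// Hypothetical sketch of evaluation measures; not the deeplearning4j Evaluation class.
class EvaluationSketch {
    // confusion[actual][predicted] holds the number of examples.
    static double accuracy(int[][] confusion) {
        long correct = 0;
        long total = 0;
        for (int a = 0; a < confusion.length; a++) {
            for (int p = 0; p < confusion[a].length; p++) {
                total += confusion[a][p];
                if (a == p) {
                    correct += confusion[a][p];
                }
            }
        }
        return (double) correct / total;
    }

    static double errorRate(double accuracy) {
        return 1.0 - accuracy;
    }

    // Fraction of examples predicted as cls that really belong to cls.
    static double precision(int[][] confusion, int cls) {
        long predicted = 0;
        for (int[] row : confusion) {
            predicted += row[cls];
        }
        return (double) confusion[cls][cls] / predicted;
    }

    // Fraction of examples belonging to cls that were predicted as cls.
    static double recall(int[][] confusion, int cls) {
        long actual = 0;
        for (int count : confusion[cls]) {
            actual += count;
        }
        return (double) confusion[cls][cls] / actual;
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        return 2.0 * precision * recall / (precision + recall);
    }

    public static void main(String[] args) {
        int[][] confusion = {{8, 2}, {1, 9}};  // toy two-class example
        System.out.println(accuracy(confusion));
        System.out.println(errorRate(0.8945));
    }
}
```

An accuracy of 0.8945 corresponds to an error rate of 1 - 0.8945 = 0.1055.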
In this section, we'll build a deep belief network based on restricted Boltzmann machines, as shown in the following diagram. The network consists of four layers: the first layer reduces the 784 inputs to 500 neurons, then to 250, followed by 200, and finally to the last 10 target values:
As the rest of the code is the same as in the previous example, let's take a look at how to configure such a network:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
We define the gradient optimization algorithm, as shown in the following code:
    .seed(seed)
    .gradientNormalization(GradientNormalization.ClipElementWiseAbsoluteValue)
    .gradientNormalizationThreshold(1.0)
    .iterations(iterations)
    .momentum(0.5)
    .momentumAfter(Collections.singletonMap(3, 0.9))
    .optimizationAlgo(OptimizationAlgorithm.CONJUGATE_GRADIENT)
We will also specify that our network will have four layers:
.list(4)
The input to the first layer will be 784 neurons and the output will be 500 neurons. We'll use the root mean squared error cross entropy (RMSE_XENT) loss function and the Xavier algorithm to initialize the weights; the Xavier algorithm automatically determines the scale of initialization based on the number of input and output neurons, as follows:
    .layer(0, new RBM.Builder()
        .nIn(numRows * numColumns)
        .nOut(500)
        .weightInit(WeightInit.XAVIER)
        .lossFunction(LossFunction.RMSE_XENT)
        .visibleUnit(RBM.VisibleUnit.BINARY)
        .hiddenUnit(RBM.HiddenUnit.BINARY)
        .build())
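To illustrate how the scale of Xavier initialization depends on the layer size, here is a plain Java sketch of one common variant, which draws weights uniformly from a range bounded by sqrt(6 / (nIn + nOut)). This is an assumption for illustration; the exact formula used by WeightInit.XAVIER in the library may differ, and the names below are hypothetical.

```java
// Hypothetical sketch of uniform Xavier (Glorot) initialization; not deeplearning4j code.
class XavierInitSketch {
    // Bound of the uniform range; it shrinks as the layer gets wider.
    static double limit(int nIn, int nOut) {
        return Math.sqrt(6.0 / (nIn + nOut));
    }

    // Returns an nOut x nIn weight matrix with entries drawn from [-limit, limit).
    static double[][] init(int nIn, int nOut, long seed) {
        java.util.Random rng = new java.util.Random(seed);
        double bound = limit(nIn, nOut);
        double[][] weights = new double[nOut][nIn];
        for (int k = 0; k < nOut; k++) {
            for (int j = 0; j < nIn; j++) {
                weights[k][j] = (rng.nextDouble() * 2.0 - 1.0) * bound;
            }
        }
        return weights;
    }

    public static void main(String[] args) {
        System.out.println("Bound for a 784 -> 500 layer: " + limit(784, 500));
    }
}
```

Keeping the initial weights small for wide layers helps to prevent the neuron activations from saturating at the start of training.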
The next two layers will have the same parameters, except for the number of input and output neurons:
    .layer(1, new RBM.Builder()
        .nIn(500)
        .nOut(250)
        .weightInit(WeightInit.XAVIER)
        .lossFunction(LossFunction.RMSE_XENT)
        .visibleUnit(RBM.VisibleUnit.BINARY)
        .hiddenUnit(RBM.HiddenUnit.BINARY)
        .build())
    .layer(2, new RBM.Builder()
        .nIn(250)
        .nOut(200)
        .weightInit(WeightInit.XAVIER)
        .lossFunction(LossFunction.RMSE_XENT)
        .visibleUnit(RBM.VisibleUnit.BINARY)
        .hiddenUnit(RBM.HiddenUnit.BINARY)
        .build())
Now the last layer will map the neurons to outputs, where we'll use the softmax activation function, as follows:
    .layer(3, new OutputLayer.Builder()
        .nIn(200)
        .nOut(outputNum)
        .lossFunction(LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation("softmax")
        .build())
    .pretrain(true).backprop(false)
    .build();
The rest of the training and evaluation is the same as in the single-layer network example. Note that training a deep network might take significantly more time compared to a single-layer network. The accuracy should be around 93%.
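One rough way to see why the deeper network takes longer to train is to count its trainable connections. The following plain Java sketch counts one weight per connection plus one bias per output neuron; this is a simplification (for example, the visible biases of the RBM layers are ignored), and the names are hypothetical.

```java
// Hypothetical parameter counter for fully connected layers; not deeplearning4j code.
class LayerSizeSketch {
    // layerSizes lists the neuron counts from the input layer to the output layer.
    static long countParameters(int[] layerSizes) {
        long total = 0;
        for (int i = 0; i + 1 < layerSizes.length; i++) {
            long weights = (long) layerSizes[i] * layerSizes[i + 1];
            long biases = layerSizes[i + 1];
            total += weights + biases;
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println("Single layer: " + countParameters(new int[]{784, 10}));
        System.out.println("Deep network: " + countParameters(new int[]{784, 500, 250, 200, 10}));
    }
}
```

Under these assumptions, the single-layer network has 7,850 parameters, while the four-layer network has 569,960.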
Now let's take a look at another deep network.
In the final example, we'll discuss how to build a convolutional network, as shown in the following diagram. The network will consist of seven layers: first, we'll repeat two pairs of convolutional and subsampling layers with max pooling. The last subsampling layer is then connected to a densely connected feedforward neural network, comprising 120 neurons, 84 neurons, and 10 neurons in the last three layers, respectively. Such a network effectively forms the complete image recognition pipeline, where the first four layers correspond to feature extraction and the last three layers correspond to the learning model:
Network configuration is initialized as we did earlier:
MultiLayerConfiguration.Builder conf = new NeuralNetConfiguration.Builder()
We will specify the gradient descent algorithm and its parameters, as follows:
    .seed(seed)
    .iterations(iterations)
    .activation("sigmoid")
    .weightInit(WeightInit.DISTRIBUTION)
    .dist(new NormalDistribution(0.0, 0.01))
    .learningRate(1e-3)
    .learningRateScoreBasedDecayRate(1e-1)
    .optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT)
We will also specify the seven network layers, as follows:
.list(7)
The input to the first convolutional layer is the complete image, while the output is six feature maps. The convolutional layer will apply a 5 x 5 filter, moving across the image with a stride of 1 x 1:
    .layer(0, new ConvolutionLayer.Builder(new int[]{5, 5}, new int[]{1, 1})
        .name("cnn1")
        .nIn(numRows * numColumns)
        .nOut(6)
        .build())
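The operation performed by a single filter can be sketched in plain Java as follows. The sketch assumes no padding and a single input channel, computes only one feature map, and uses hypothetical names; it is not how the library implements the layer.

```java
// Hypothetical sketch of a 2D convolution without padding; not deeplearning4j code.
class ConvolutionSketch {
    // Slides the kernel over the image and sums the element-wise products.
    static double[][] convolve(double[][] image, double[][] kernel, int stride) {
        int kernelHeight = kernel.length;
        int kernelWidth = kernel[0].length;
        int outHeight = (image.length - kernelHeight) / stride + 1;
        int outWidth = (image[0].length - kernelWidth) / stride + 1;
        double[][] out = new double[outHeight][outWidth];
        for (int r = 0; r < outHeight; r++) {
            for (int c = 0; c < outWidth; c++) {
                double sum = 0.0;
                for (int i = 0; i < kernelHeight; i++) {
                    for (int j = 0; j < kernelWidth; j++) {
                        sum += image[r * stride + i][c * stride + j] * kernel[i][j];
                    }
                }
                out[r][c] = sum;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] featureMap = convolve(new double[28][28], new double[5][5], 1);
        System.out.println(featureMap.length + " x " + featureMap[0].length);
    }
}
```

Under these assumptions, a 5 x 5 filter with a stride of 1 turns a 28 x 28 image into a 24 x 24 feature map.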
The second layer is a subsampling layer that will take the maximum value over each 2 x 2 region, moving with a stride of 2 x 2 so that the regions do not overlap:
    .layer(1, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX,
            new int[]{2, 2}, new int[]{2, 2})
        .name("maxpool1")
        .build())
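Max pooling itself is a simple operation, as the following plain Java sketch with hypothetical names shows. It assumes a single feature map and no padding.

```java
// Hypothetical sketch of max pooling; not deeplearning4j code.
class MaxPoolSketch {
    // Keeps the largest value of each size x size window, moving by stride.
    static double[][] maxPool(double[][] input, int size, int stride) {
        int outHeight = (input.length - size) / stride + 1;
        int outWidth = (input[0].length - size) / stride + 1;
        double[][] out = new double[outHeight][outWidth];
        for (int r = 0; r < outHeight; r++) {
            for (int c = 0; c < outWidth; c++) {
                double max = Double.NEGATIVE_INFINITY;
                for (int i = 0; i < size; i++) {
                    for (int j = 0; j < size; j++) {
                        max = Math.max(max, input[r * stride + i][c * stride + j]);
                    }
                }
                out[r][c] = max;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        double[][] input = {
            {1, 2, 3, 4},
            {5, 6, 7, 8},
            {9, 10, 11, 12},
            {13, 14, 15, 16}
        };
        double[][] pooled = maxPool(input, 2, 2);
        System.out.println(java.util.Arrays.deepToString(pooled));
    }
}
```

With a 2 x 2 window and a stride of 2, the width and height of the feature map are halved.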
The next two layers will repeat the previous two layers:
    .layer(2, new ConvolutionLayer.Builder(new int[]{5, 5}, new int[]{1, 1})
        .name("cnn2")
        .nOut(16)
        .biasInit(1)
        .build())
    .layer(3, new SubsamplingLayer.Builder(SubsamplingLayer.PoolingType.MAX,
            new int[]{2, 2}, new int[]{2, 2})
        .name("maxpool2")
        .build())
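It can be useful to track how the feature map size changes through these four layers. The following plain Java sketch does the arithmetic, assuming no padding; the class and method names are hypothetical, and the library derives these sizes on its own.

```java
// Hypothetical shape calculator for the first four layers; not deeplearning4j code.
class CnnShapeSketch {
    // Output width (or height) of a convolution or pooling step without padding.
    static int outputSize(int inputSize, int kernelSize, int stride) {
        return (inputSize - kernelSize) / stride + 1;
    }

    // Number of values passed from the last pooling layer to the dense layer.
    static int flattenedFeatures(int imageSize, int featureMaps) {
        int size = outputSize(imageSize, 5, 1);  // first convolution
        size = outputSize(size, 2, 2);           // first max pooling
        size = outputSize(size, 5, 1);           // second convolution
        size = outputSize(size, 2, 2);           // second max pooling
        return featureMaps * size * size;
    }

    public static void main(String[] args) {
        System.out.println(flattenedFeatures(28, 16));
    }
}
```

Under these assumptions, the size shrinks from 28 to 24, 12, 8, and finally 4, so the 16 feature maps of the last pooling layer provide 16 x 4 x 4 = 256 values to the dense layers.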
Now we will wire the output of the subsampling layer into a dense feedforward network, consisting of 120 neurons, and then through another layer, into 84 neurons, as follows:
    .layer(4, new DenseLayer.Builder()
        .name("ffn1")
        .nOut(120)
        .build())
    .layer(5, new DenseLayer.Builder()
        .name("ffn2")
        .nOut(84)
        .build())
The final layer connects 84 neurons with 10 output neurons:
    .layer(6, new OutputLayer.Builder(LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .name("output")
        .nOut(outputNum)
        .activation("softmax") // radial basis function required
        .build())
    .backprop(true)
    .pretrain(false)
    .cnnInputSize(numRows, numColumns, 1);
To train this structure, we can reuse the code that we developed in the previous two examples. Again, the training might take some time. The network accuracy should be around 98%.
As model training relies heavily on linear algebra, training can be sped up by an order of magnitude by using a graphics processing unit (GPU). As the GPU backend is, at the time of writing, undergoing a rewrite, please check the latest documentation at http://deeplearning4j.org/documentation.
As we saw in the different examples, increasingly complex neural networks allow us to extract relevant features automatically, thus completely avoiding traditional image processing. However, the price we pay for this is increased processing time and the large number of training examples needed to make this approach effective.