In this chapter, we will focus on neural networks, often referred to as Deep Learning Networks (DLNs). This type of network is characterized as a multiple-layer neural network. Each of these layers is trained on the output of the previous layer, potentially identifying features and sub-features of the dataset. A feature hierarchy is created in this manner.
DLNs typically work with unstructured and unlabeled data, which constitutes the vast bulk of data found in the world today. A DLN takes this unstructured data, identifies features, and tries to reconstruct the original input. This approach is illustrated with Restricted Boltzmann Machines (RBMs) in the Restricted Boltzmann Machines section and with autoencoders in the Deep autoencoders section. An autoencoder takes a dataset and effectively compresses it. It then decompresses it to reconstruct the original dataset.
DLNs can also be used for predictive analysis. The last layer of a DLN uses an activation function to generate output represented by one of several categories. When used with new data, the trained model will attempt to classify the input.
An important DLN task is ensuring that the model is accurate and minimizes error. As with simple neural networks, weights and biases are used at each layer. As weight values are adjusted, errors can be introduced. One technique for adjusting weights is gradient descent. The gradient can be thought of as the slope of the error with respect to a weight. The idea is to modify each weight in the direction that reduces the error. It is an optimization technique that speeds up the learning process.
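To make the idea of following the slope concrete, here is a minimal sketch of gradient descent on a single weight, written in plain Java with no DL4J dependency. The class name, the squared-error function, and the sample values are all assumptions chosen for illustration. A real network updates many weights at once through backpropagation.

```java
// A minimal sketch of gradient descent on one weight (illustrative only).
class GradientDescentSketch {

    // Squared error of the prediction w * x against the target y.
    static double error(double w, double x, double y) {
        double diff = w * x - y;
        return diff * diff;
    }

    // Derivative of the squared error with respect to w: the "slope".
    static double gradient(double w, double x, double y) {
        return 2.0 * (w * x - y) * x;
    }

    // Repeatedly moves the weight against the slope, scaled by the learning rate.
    static double descend(double w, double x, double y,
                          double learningRate, int steps) {
        for (int i = 0; i < steps; i++) {
            w -= learningRate * gradient(w, x, y);
        }
        return w;
    }

    public static void main(String[] args) {
        // With x = 2 and y = 6, the error is smallest when w = 3.
        double w = descend(0.0, 2.0, 6.0, 0.1, 50);
        System.out.println("weight = " + w + ", error = " + error(w, 2.0, 6.0));
    }
}
```

The learning rate controls the step size: a value that is too large can overshoot the minimum, while a value that is too small slows convergence.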
Later in the chapter, we will examine Convolutional Neural Networks (CNNs) and briefly discuss Recurrent Neural Networks (RNNs). Convolutional networks mimic the visual cortex in that each neuron can interact with and make decisions based on a region of information. Recurrent networks process information based on not only the current input but also the calculations performed on earlier inputs in a sequence.
There are several libraries that support deep learning, including ND4J, DL4J, and Encog. ND4J is a lower-level library that is used in other projects, including DL4J. Encog is perhaps not as well supported as DL4J, but does provide support for deep learning.
The examples used in this chapter are all based on the Deep Learning for Java (DL4J) (http://deeplearning4j.org) API with support from ND4J. This library provides good support for many of the algorithms associated with deep learning. As a result, the next section explains the basic tasks found in common with many of the deep learning algorithms, such as loading data, training a model, and testing the model.
In this section, we will discuss the DL4J architecture and address several of the common tasks performed when using the API. A DLN typically starts with the creation of a MultiLayerConfiguration instance, which defines the network, or model. The network is composed of multiple layers. Hyperparameters are used to configure the network and are variables that affect such things as learning speed, the activation functions to use for a layer, and how weights are to be initialized.
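As with neural networks, the basic DLN process consists of acquiring and preparing the data, configuring and building a model, training the model, and testing the model.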
We will investigate each of these tasks in the next sections.
The DL4J API has a number of techniques for acquiring data. We will focus on the specific techniques that we use in our examples. The dataset used by a DL4J project is often modified using either binarization or normalization. Binarization converts data to ones and zeroes. Normalization converts data to a value between 0 and 1.
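The two transformations can be sketched in plain Java as follows. This is an illustration only and does not use the DL4J preprocessing classes. The class name, the threshold argument, and the choice of min-max scaling as the normalization method are assumptions.

```java
import java.util.Arrays;

// A minimal sketch of binarization and min-max normalization (illustrative only).
class ScalingSketch {

    // Binarization: values above the threshold become 1, all others become 0.
    static double[] binarize(double[] values, double threshold) {
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            result[i] = values[i] > threshold ? 1.0 : 0.0;
        }
        return result;
    }

    // Min-max normalization: rescales values linearly into the range [0, 1].
    static double[] normalize(double[] values) {
        double min = Arrays.stream(values).min().orElse(0.0);
        double max = Arrays.stream(values).max().orElse(0.0);
        double range = max - min;
        double[] result = new double[values.length];
        for (int i = 0; i < values.length; i++) {
            // A constant column has no spread, so map it to 0 to avoid dividing by zero.
            result[i] = range == 0.0 ? 0.0 : (values[i] - min) / range;
        }
        return result;
    }

    public static void main(String[] args) {
        double[] raw = {10.0, 20.0, 15.0, 30.0};
        System.out.println(Arrays.toString(binarize(raw, 15.0)));
        System.out.println(Arrays.toString(normalize(raw)));
    }
}
```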
Data fed to a DLN is transformed into a set of numbers. These numbers are referred to as vectors. Each vector consists of a one-column matrix with a variable number of rows. The process of creating a vector is called vectorization.
Canova (http://deeplearning4j.org/canova.html) is a DL4J library that supports vectorization. It works with many different types of datasets. It has been merged with DataVec (http://deeplearning4j.org/datavec), a vectorization and Extract, Transform, and Load (ETL) library.
In this section, we will focus on how to read in CSV data.
DataVec provides the CSVRecordReader class, which is useful for reading CSV data. It has three overloaded constructors. The one we will demonstrate is passed two arguments. The first is the number of lines to skip when first reading a file and the second is a string holding the delimiters used to parse the text.
In the following code, we create a new instance of the class, where we do not skip any lines and use only a comma for a delimiter:
RecordReader recordReader = new CSVRecordReader(0, ",");
The class implements the RecordReader interface. It has an initialize method that is passed an instance of the FileSplit class. One of its constructors is passed an instance of a File object that references a dataset. The FileSplit class assists in splitting the data for training and testing. In this example, we initialize the reader for a file called car.txt that we will use in the Preparing the data section:
recordReader.initialize(new FileSplit(new File("car.txt")));
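To clarify what the two constructor arguments mean, the following plain Java sketch skips a number of leading lines and splits the remaining lines on a delimiter. It is not how CSVRecordReader is implemented. The class name, method name, and sample records are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// A minimal sketch of "skip N lines, then split on a delimiter" (illustrative only).
class CsvParsingSketch {

    // Returns one String[] per line, ignoring the first skipLines lines.
    static List<String[]> parse(List<String> lines, int skipLines, String delimiter) {
        List<String[]> records = new ArrayList<>();
        for (int i = skipLines; i < lines.size(); i++) {
            // Pattern.quote treats the delimiter literally rather than as a regex.
            records.add(lines.get(i).split(Pattern.quote(delimiter)));
        }
        return records;
    }

    public static void main(String[] args) {
        // Hypothetical data: one header line followed by two records.
        List<String> lines = Arrays.asList(
                "color,size,label",
                "red,small,0",
                "blue,large,1");
        for (String[] record : parse(lines, 1, ",")) {
            System.out.println(Arrays.toString(record));
        }
    }
}
```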
To process the data, we need an iterator such as the DataSetIterator instance shown next. The RecordReaderDataSetIterator class possesses a multitude of overloaded constructors. In the following example, the first argument is the RecordReader instance. This is followed by three arguments. The first is the batch size, which is the number of records to retrieve at a time. The next is the zero-based index of the label column, which here is the last attribute of each record. The last argument is the number of classes represented by the dataset:
DataSetIterator iterator = new RecordReaderDataSetIterator(recordReader, 1728, 6, 4);
The last attribute of each record holds a class value when the dataset is used for classification. This is precisely how we will use it later. The number of classes argument is only used with classification.
In the next code sequence, we will split the dataset into two sets: one for training and one for testing. The next method returns the next dataset from the source. The size of the dataset is dependent on the batch size used earlier. The shuffle method randomizes the input, while the splitTestAndTrain method returns an instance of the SplitTestAndTrain class, which we use to get the training and testing datasets. The splitTestAndTrain method's argument specifies the fraction of the data to be used for training:
DataSet dataset = iterator.next();
dataset.shuffle();
SplitTestAndTrain testAndTrain = dataset.splitTestAndTrain(0.65);
DataSet trainingData = testAndTrain.getTrain();
DataSet testData = testAndTrain.getTest();
We can then use these datasets with a model.
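The shuffle-then-split idea can be sketched independently of DL4J. The plain Java example below shuffles a copy of a list and cuts it at a given fraction. The class name, the seed parameter, and the truncating rounding rule are assumptions, and DL4J may round the boundary differently.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// A minimal sketch of shuffling records and splitting them (illustrative only).
class SplitSketch {

    // Returns two lists: index 0 holds the training records, index 1 the test records.
    static <T> List<List<T>> shuffleAndSplit(List<T> records,
                                             double trainFraction, long seed) {
        List<T> shuffled = new ArrayList<>(records);
        // A fixed seed makes the shuffle repeatable between runs.
        Collections.shuffle(shuffled, new Random(seed));
        int trainSize = (int) (shuffled.size() * trainFraction);
        List<List<T>> parts = new ArrayList<>();
        parts.add(new ArrayList<>(shuffled.subList(0, trainSize)));
        parts.add(new ArrayList<>(shuffled.subList(trainSize, shuffled.size())));
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> records = new ArrayList<>();
        for (int i = 0; i < 1728; i++) {
            records.add(i);
        }
        List<List<Integer>> parts = shuffleAndSplit(records, 0.65, 42L);
        System.out.println("training: " + parts.get(0).size()
                + ", testing: " + parts.get(1).size());
    }
}
```

Shuffling before splitting matters when the source file is ordered, for example by class, because an unshuffled split could leave some classes out of the training set.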
Frequently, DL4J uses the MultiLayerConfiguration class to define the configuration of the model and the MultiLayerNetwork class to represent a model. These classes provide a flexible way of building models.
In the following example, we will demonstrate the use of these classes. Starting with the MultiLayerConfiguration class, we find that several methods are used in a fluent style. We will provide more details about these methods shortly. However, notice that two layers are defined for this model:
MultiLayerConfiguration conf = new NeuralNetConfiguration.Builder()
    .iterations(1000)
    .activation("relu")
    .weightInit(WeightInit.XAVIER)
    .learningRate(0.4)
    .list()
    .layer(0, new DenseLayer.Builder()
        .nIn(6).nOut(3)
        .build())
    .layer(1, new OutputLayer.Builder(
            LossFunctions.LossFunction.NEGATIVELOGLIKELIHOOD)
        .activation("softmax")
        .nIn(3).nOut(4).build())
    .backprop(true).pretrain(false)
    .build();
The nIn and nOut methods specify the number of inputs and outputs for a layer.
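One way to picture a layer with six inputs and three outputs is as a 3 x 6 weight matrix plus three bias values. The plain Java sketch below computes such a layer's output with a relu activation. The class name, the weight layout, and the sample values are assumptions for illustration and do not reflect how DL4J stores its parameters.

```java
import java.util.Arrays;

// A minimal sketch of a dense layer's forward pass (illustrative only).
class DenseLayerSketch {

    // Computes relu(weights * input + bias), where weights has shape [nOut][nIn].
    static double[] forward(double[][] weights, double[] bias, double[] input) {
        double[] output = new double[weights.length];
        for (int o = 0; o < weights.length; o++) {
            double sum = bias[o];
            for (int i = 0; i < input.length; i++) {
                sum += weights[o][i] * input[i];
            }
            output[o] = Math.max(0.0, sum); // relu: negative sums become 0
        }
        return output;
    }

    public static void main(String[] args) {
        int nIn = 6;
        int nOut = 3;
        double[][] weights = new double[nOut][nIn];
        for (double[] row : weights) {
            Arrays.fill(row, 0.5); // arbitrary sample weights
        }
        double[] bias = new double[nOut]; // all zeros
        double[] input = {1.0, 1.0, 1.0, 1.0, 1.0, 1.0};
        System.out.println(Arrays.toString(forward(weights, bias, input)));
    }
}
```

The output of one layer becomes the input of the next, which is why the second layer in the configuration uses nIn(3) to match the first layer's nOut(3).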
Builder classes are common in DL4J. In the previous example, the NeuralNetConfiguration.Builder class is used. The methods used here are but a few of the many that are available. In the following table, we describe several of them:
Method | Usage
iterations | Controls the number of optimization iterations performed
activation | This is the activation function used
weightInit | Used to initialize the weights for the model
learningRate | Controls the speed at which the model learns
list | Creates an instance of the NeuralNetConfiguration.ListBuilder class
layer | Creates a new layer
backprop | When set to true, it enables backpropagation
pretrain | When set to true, it will pretrain the model
build | Performs the actual build process
Let's examine more closely how a layer is created. In the example, the list method returns a NeuralNetConfiguration.ListBuilder instance. Its layer method takes two arguments. The first is the number of the layer, using a zero-based numbering scheme. The second is the Layer instance.
There are two different layers used here with two different builders: a DenseLayer.Builder and an OutputLayer.Builder instance. There are several types of layers available in DL4J. The argument of a builder's constructor may be a loss function, as is the case with the output layer, and is explained next.
In a feedback network, the neural network's guess is compared to what is called the ground truth, and the difference between the two is the error. This error is used to update the network through the modification of weights and biases. The loss function, also called an objective or cost function, measures the difference.
There are several loss functions supported by DL4J:
MSE: In linear regression, MSE stands for mean squared error
EXPLL: In Poisson regression, EXPLL stands for exponential log likelihood
XENT: In binary classification, XENT stands for cross entropy
MCXENT: This stands for multiclass cross entropy
RMSE_XENT: This stands for RMSE cross entropy
SQUARED_LOSS: This stands for squared loss
RECONSTRUCTION_CROSSENTROPY: This stands for reconstruction cross entropy
NEGATIVELOGLIKELIHOOD: This stands for negative log likelihood
CUSTOM: Define your own loss function
The remaining methods used with the builder instance are the activation function, the number of inputs and outputs for the layer, and the build method, which creates the layer.
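As a rough illustration of what a loss function measures, the following plain Java sketch computes mean squared error and the negative log likelihood of a single example. These are the textbook formulas rather than DL4J's implementations. The class and method names are assumptions.

```java
// A minimal sketch of two loss functions (illustrative only).
class LossSketch {

    // Mean squared error: the average squared difference between guess and truth.
    static double meanSquaredError(double[] predicted, double[] actual) {
        double sum = 0.0;
        for (int i = 0; i < predicted.length; i++) {
            double diff = predicted[i] - actual[i];
            sum += diff * diff;
        }
        return sum / predicted.length;
    }

    // Negative log likelihood for one example: the negated log of the
    // probability that the model assigned to the true class.
    static double negativeLogLikelihood(double[] probabilities, int trueClass) {
        return -Math.log(probabilities[trueClass]);
    }

    public static void main(String[] args) {
        System.out.println(meanSquaredError(
                new double[]{1.0, 2.0, 3.0}, new double[]{1.0, 2.0, 5.0}));
        // A confident correct guess gives a small loss; an unsure one gives a larger loss.
        System.out.println(negativeLogLikelihood(new double[]{0.1, 0.9}, 1));
        System.out.println(negativeLogLikelihood(new double[]{0.6, 0.4}, 1));
    }
}
```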
Each layer of a multi-layer network requires input, weights, a bias, and an activation function.
There are many different types of activation functions, each of which can address a particular type of problem.
The activation function is used to determine whether the neuron fires. There are several functions supported, including relu (rectified linear), tanh, sigmoid, softmax, hardtanh, leakyrelu, maxout, softsign, and softplus.
An interesting list of activation functions along with graphs is found at http://stats.stackexchange.com/questions/115258/comprehensive-list-of-activation-functions-in-neural-networks-with-pros-cons and https://en.wikipedia.org/wiki/Activation_function.
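Three of these functions are simple enough to sketch in plain Java. The example below uses the standard mathematical definitions and is not taken from DL4J. The class name is an assumption, and subtracting the maximum in softmax is a common numerical-stability choice.

```java
import java.util.Arrays;

// A minimal sketch of three activation functions (illustrative only).
class ActivationSketch {

    // Rectified linear: passes positive values through and clamps the rest to 0.
    static double relu(double x) {
        return Math.max(0.0, x);
    }

    // Sigmoid: squashes any value into the range (0, 1).
    static double sigmoid(double x) {
        return 1.0 / (1.0 + Math.exp(-x));
    }

    // Softmax: turns a vector of scores into probabilities that sum to 1.
    static double[] softmax(double[] scores) {
        double max = Double.NEGATIVE_INFINITY;
        for (double s : scores) {
            max = Math.max(max, s);
        }
        double[] result = new double[scores.length];
        double sum = 0.0;
        for (int i = 0; i < scores.length; i++) {
            // Subtracting the maximum avoids overflow in exp for large scores.
            result[i] = Math.exp(scores[i] - max);
            sum += result[i];
        }
        for (int i = 0; i < result.length; i++) {
            result[i] /= sum;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(relu(-2.0) + " " + relu(3.0));
        System.out.println(sigmoid(0.0));
        System.out.println(Arrays.toString(softmax(new double[]{1.0, 2.0, 3.0, 4.0})));
    }
}
```

Softmax is a natural fit for an output layer that must choose among several classes, because each output can be read as the probability of one class.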
Next, a MultiLayerNetwork instance is created using the defined configuration. The model is initialized, and its listeners are set. The ScoreIterationListener instance will display information as the model trains, which we will see shortly. Its constructor's argument specifies how often that information should be displayed:
MultiLayerNetwork model = new MultiLayerNetwork(conf);
model.init();
model.setListeners(new ScoreIterationListener(100));
We are now ready to train the model.
This is actually a fairly simple step. The fit method performs the training:
model.fit(trainingData);
When executed, output is generated by any listeners associated with the model, as in the preceding case, where a ScoreIterationListener instance is used.
Another way to use the fit method is to iterate through a dataset, as shown next. In this example, a sequence of datasets is used. This is part of an autoencoder where the output is intended to match the input, as explained in the Deep autoencoders section. The dataset used as the argument to the fit method holds both the input and the expected output. In this case, they are the same, as provided by the getFeatureMatrix method:
while (iterator.hasNext()) {
    DataSet dataSet = iterator.next();
    model.fit(new DataSet(dataSet.getFeatureMatrix(),
        dataSet.getFeatureMatrix()));
}
For larger datasets, it is often necessary to pretrain the model several times to get accurate results. This is often performed in parallel to reduce training time. This option is set with the pretrain method.
The evaluation of a model is performed using the Evaluation class and the test dataset. An Evaluation instance is created using an argument specifying the number of classes. The test data is fed into the model using the output method. The eval method takes the output of the model and compares it against the test data classes to generate statistics:
Evaluation evaluation = new Evaluation(4);
INDArray output = model.output(testData.getFeatureMatrix());
evaluation.eval(testData.getLabels(), output);
out.println(evaluation.stats());
The output will look similar to the following:
==========================Scores===================================
 Accuracy: 0.9273
 Precision: 0.854
 Recall: 0.8323
 F1 Score: 0.843
These statistics are detailed here:
Accuracy: This is a measure of how often the correct answer was returned.
Precision: This is a measure of the probability that a positive response is correct.
Recall: This measures how likely the result will be classified correctly if given a positive example. It is calculated by dividing the number of true positives by the sum of true positives and false negatives.
F1 Score: This is the harmonic mean of precision and recall, calculated as 2 x (precision x recall) / (precision + recall).
We will use the Evaluation class to determine the quality of our model. A measure called f1 is used, whose values range from 0 to 1, where 1 represents the best quality.
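The four statistics can be computed from counts of true and false positives and negatives. The plain Java sketch below uses the standard definitions for a single class and does not use the DL4J Evaluation class. The class name and sample counts are assumptions, and the way DL4J combines the figures across several classes is not covered here.

```java
// A minimal sketch of accuracy, precision, recall, and F1 (illustrative only).
class MetricsSketch {

    // Fraction of all predictions that were correct.
    static double accuracy(int tp, int tn, int fp, int fn) {
        int total = tp + tn + fp + fn;
        return total == 0 ? 0.0 : (double) (tp + tn) / total;
    }

    // Fraction of positive predictions that were actually positive.
    static double precision(int tp, int fp) {
        return tp + fp == 0 ? 0.0 : (double) tp / (tp + fp);
    }

    // Fraction of actual positives that were predicted as positive.
    static double recall(int tp, int fn) {
        return tp + fn == 0 ? 0.0 : (double) tp / (tp + fn);
    }

    // Harmonic mean of precision and recall.
    static double f1(double precision, double recall) {
        double sum = precision + recall;
        return sum == 0.0 ? 0.0 : 2.0 * precision * recall / sum;
    }

    public static void main(String[] args) {
        // Hypothetical counts for one class.
        int tp = 40, tn = 45, fp = 5, fn = 10;
        double p = precision(tp, fp);
        double r = recall(tp, fn);
        System.out.println("accuracy = " + accuracy(tp, tn, fp, fn));
        System.out.println("precision = " + p + ", recall = " + r);
        System.out.println("f1 = " + f1(p, r));
    }
}
```

Applying the f1 method to the precision and recall values in the sample output shown earlier, 0.854 and 0.8323, gives approximately 0.843, which agrees with the F1 Score line.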