Compiling the model

Compiling a model typically involves specifying the loss function, choosing an optimizer, and specifying the metrics to be used. These choices depend on the type of problem being addressed. The following R code is an example of compiling a deep learning binary classification model:

model %>%
  compile(loss = 'binary_crossentropy',
          optimizer = 'adam',
          metrics = 'accuracy')

The preceding loss function specified is binary_crossentropy, which is used when the response variable has two classes. Binary cross-entropy can be calculated using the following formula:

loss = -[y log(yhat) + (1 - y) log(1 - yhat)]

In the preceding formula, y represents the actual class (0 or 1) and yhat represents the predicted probability that a case belongs to class 1. Let's consider two examples using the following code:

# Example-1
y <- c(0, 0, 0, 1, 1, 1)                    # actual classes
yhat <- c(0.2, 0.3, 0.1, 0.8, 0.9, 0.7)     # predicted probabilities of class 1
(loss <- -y*log(yhat) - (1-y)*log(1-yhat))  # per-case binary cross-entropy

[1] 0.2231436 0.3566749 0.1053605 0.2231436 0.1053605 0.3566749

mean(loss)

[1] 0.228393

# Example-2
yhat <- c(0.2, 0.9, 0.1, 0.8, 0.9, 0.2)     # cases 2 and 6 are now misclassified
(loss <- -y*log(yhat) - (1-y)*log(1-yhat))

[1] 0.2231436 2.3025851 0.1053605 0.2231436 0.1053605 1.6094379

mean(loss)

[1] 0.761505

As seen in Example-1, there are six cases in total, represented by y: the first three cases have an actual class of 0, and the next three have an actual class of 1. The prediction probabilities captured by yhat are the probabilities that a case belongs to class 1. In Example-1, the yhat values correctly classify all six cases (each probability falls on the correct side of 0.5), and the average of all loss values is about 0.228. In Example-2, the yhat values correctly classify only four cases, and the average loss increases to about 0.762. In this way, the binary cross-entropy loss function helps to assess the classification performance of a model: the lower the loss value, the better the classification performance.

Various other loss functions are used, depending on the type of problem for which the deep learning network is being developed. For classification models where the response variable has more than two classes, we make use of the categorical_crossentropy loss function. For regression problems with numeric response variables, the mean squared error (mse) may be an appropriate loss function.
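To make this concrete, only the loss (and, typically, the metrics) argument of the compile call changes. The following is a minimal sketch for the multiclass case, assuming the labels are one-hot encoded and the network ends in a softmax output layer; neither assumption is taken from the example model in this section:

# Multiclass classification (assumes one-hot encoded labels
# and a softmax output layer)
model %>%
  compile(loss = 'categorical_crossentropy',
          optimizer = 'adam',
          metrics = 'accuracy')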

When specifying an optimizer to be used by the model, adam is a popular choice for deep learning networks, giving good results in a wide variety of situations. Other commonly used optimizers include rmsprop and adagrad. When a deep learning network is being trained, the parameters of the network are modified based on feedback obtained from the loss function. How this modification takes place depends on the optimizer used, so the choice of optimizer is important in arriving at a good model.
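If the optimizer's defaults need adjusting, compile() also accepts an optimizer object instead of a string name. The sketch below sets an explicit learning rate for adam; note that older versions of the keras package call this argument lr rather than learning_rate:

# Pass an optimizer object to control tuning parameters explicitly
model %>%
  compile(loss = 'binary_crossentropy',
          optimizer = optimizer_adam(learning_rate = 0.001),
          metrics = 'accuracy')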

When compiling the model, we also specify a suitable metric that will be used for monitoring the training process. For classification problems, accuracy is one of the most commonly used metrics. For regression problems, the mean absolute error is a commonly specified metric.
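More than one metric can be monitored at the same time by passing a character vector. The following sketch assumes a regression model (not the binary classifier compiled earlier) and tracks both the mean absolute error and the mean squared error during training:

# Monitor several metrics at once while training a regression model
model %>%
  compile(loss = 'mse',
          optimizer = 'adam',
          metrics = c('mean_absolute_error', 'mean_squared_error'))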

Once we compile a model, we are ready to fit it.
