The code used for developing the model is as follows:
# Initializing the model
model <- keras_model_sequential()

# Model architecture
model %>%
  layer_dense(units = 8, activation = 'relu', input_shape = c(21)) %>%
  layer_dense(units = 3, activation = 'softmax')
As shown in the preceding code, we start by creating a sequential model using the keras_model_sequential() function, which allows a linear stack of layers to be added. Next, we add layers to the model using the pipe operator, %>%. This operator takes the output of the expression on its left and feeds it as input to the function on its right. We add fully connected (or densely connected) layers using the layer_dense() function and specify various arguments. Since this dataset has 21 independent variables, the input_shape argument is set to c(21), meaning the network expects 21 input values; this is also termed the input layer of the network. The first hidden layer has 8 units, and the activation function we use here is the rectified linear unit, or relu, which is the most popular activation function for hidden layers. The first hidden layer is connected to an output layer with 3 units using the pipe operator; we use 3 units because our target variable has 3 classes. The activation function used in the output layer is 'softmax', which keeps the output values between 0 and 1 and makes them sum to 1, so that we can interpret the results as familiar probability values.
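To see why softmax outputs can be read as probabilities, here is a minimal base R sketch of the softmax function itself (the scores used are hypothetical raw outputs for 3 classes, not values produced by our model):

```r
# Softmax maps arbitrary scores to values in (0, 1) that sum to 1
softmax <- function(x) exp(x) / sum(exp(x))

scores <- c(2.0, 1.0, 0.1)   # hypothetical raw outputs for 3 classes
probs  <- softmax(scores)
print(probs)        # each value lies between 0 and 1
print(sum(probs))   # the values sum to 1
```

Because the three outputs are non-negative and sum to 1, the largest one can be taken as the predicted class probability.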
To obtain a summary of the model architecture that we have created, we can run the summary function, as shown in the following code:
# Model summary
summary(model)
OUTPUT
## ___________________________________________________________________________
## Layer (type)                      Output Shape                   Param #
## ===========================================================================
## dense_1 (Dense)                   (None, 8)                      176
## ___________________________________________________________________________
## dense_2 (Dense)                   (None, 3)                      27
## ===========================================================================
## Total params: 203
## Trainable params: 203
## Non-trainable params: 0
## ___________________________________________________________________________
Since the input layer has 21 units that are connected to each of the 8 units in the first hidden layer, we end up with 168 weights (21 x 8). We also obtain one bias term for each unit in the hidden layer, giving 8 such terms. So, at the first (and only) hidden layer, we have a total of 176 parameters (168 + 8). Similarly, the 8 units in the hidden layer are connected to the 3 units in the output layer, yielding 24 weights (8 x 3). Together with one bias term per output unit, the output layer accounts for 27 parameters (24 + 3). Finally, the total number of parameters for this neural network architecture is 203 (176 + 27).
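The parameter counts reported by summary() can be checked with a few lines of base R. The helper function below is a quick arithmetic sketch, not part of the Keras API:

```r
# Parameters in a dense layer = inputs * units + units (one bias per unit)
dense_params <- function(inputs, units) inputs * units + units

hidden <- dense_params(21, 8)   # 21 * 8 + 8 = 176
output <- dense_params(8, 3)    # 8 * 3 + 3 = 27
total  <- hidden + output       # 176 + 27 = 203
print(total)
```

The same formula applies to any fully connected layer, which is a handy sanity check when reading model summaries.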