DNNs for geographic ethnicity prediction

A Multilayer Perceptron (MLP) is an example of a DNN that is a feed-forward neural network; that is, connections exist only between neurons in adjacent layers. There is one (pass-through) input layer, one or more layers of linear threshold units (LTUs) called hidden layers, and one final layer of LTUs called the output layer.

Each layer, excluding the output layer, includes a bias neuron and is fully connected to the next layer, so each pair of consecutive layers forms a complete bipartite graph. The signal flows exclusively from the input to the output; that is, it is one-directional (feed-forward).
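As a minimal Python sketch (not the chapter's actual code), the feed-forward pass through fully connected layers can be illustrated as follows; the network shape and weight values here are made up for illustration, and ReLU is used as the per-neuron activation:

```python
def forward(x, layers):
    """Propagate an input one-directionally through fully connected layers.

    Each layer is a (weights, biases) pair; weights[j][i] connects input
    neuron i to output neuron j, and each bias plays the role of the
    bias neuron's contribution.
    """
    a = x
    for weights, biases in layers:
        # Weighted sum plus bias, then a ReLU threshold per neuron
        a = [max(0.0, sum(w_i * a_i for w_i, a_i in zip(row, a)) + b)
             for row, b in zip(weights, biases)]
    return a

# Hypothetical 2-3-1 network: 2 inputs, one hidden layer of 3 units, 1 output
hidden = ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1])
output = ([[1.0, -1.0, 0.5]], [0.2])
print(forward([1.0, 2.0], [hidden, output]))  # → [0.0] (negative sum, clipped by ReLU)
```

Because the signal only moves forward, evaluating the network is a single left-to-right sweep; no neuron's output ever feeds back into an earlier layer.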

Until recently, an MLP was trained using the back-propagation training algorithm alone, but now optimized variants of Gradient Descent use reverse-mode automatic differentiation; that is, the neural networks are trained with SGD, using back-propagation as the gradient-computing technique. Two layers of abstraction are used in DNN training for solving classification problems:

  • Gradient computation: using back-propagation
  • Optimization level: using SGD, ADAM, RMSProp, and Momentum optimizers to update the weights with the gradients computed earlier
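To make the optimization level concrete, here is a minimal sketch (under assumed learning-rate and momentum values, not the chapter's code) of how plain SGD and the Momentum optimizer consume gradients that back-propagation has already computed:

```python
def sgd_step(weights, grads, lr=0.1):
    # Plain SGD: move each weight a small step against its gradient
    return [w - lr * g for w, g in zip(weights, grads)]

def momentum_step(weights, grads, velocity, lr=0.1, beta=0.9):
    # Momentum: accumulate a decaying running sum of gradients (the velocity),
    # then step along the velocity instead of the raw gradient
    velocity = [beta * v + g for v, g in zip(velocity, grads)]
    weights = [w - lr * v for w, v in zip(weights, velocity)]
    return weights, velocity

w = [1.0, -2.0]
g = [0.5, -0.5]
print(sgd_step(w, g))                        # → [0.95, -1.95]
w2, v = momentum_step(w, g, [0.0, 0.0])
print(w2)                                    # same as SGD on the first step (velocity starts at 0)
```

ADAM and RMSProp follow the same pattern but additionally keep a running estimate of the squared gradients to scale each weight's step size individually.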

In each training cycle, the algorithm feeds the data into the network and computes the state and output of every neuron in the consecutive layers. It then measures the output error of the network, that is, the gap between the expected output and the current output, as well as the contribution of each neuron in the last hidden layer to each output neuron's error.

Iteratively, the output error is propagated back through all hidden layers to the input layer, and the error gradient across all connection weights is calculated during this backward pass:

Figure 10: A modern MLP consisting of input layer, ReLU, and softmax
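The backward step described above can be sketched for one output neuron; this is an illustrative toy (a single linear output with squared error, with made-up weights and learning rate), not the chapter's implementation:

```python
def backprop_output_layer(hidden_acts, weights, target, lr=0.1):
    """One backward step for a single linear output neuron with squared error.

    Returns the updated weights and the deltas pushed back to the hidden layer.
    """
    y = sum(w * h for w, h in zip(weights, hidden_acts))  # forward output
    delta = y - target                   # dE/dy for E = 0.5 * (y - target)^2
    grads = [delta * h for h in hidden_acts]              # dE/dw_i per weight
    hidden_deltas = [delta * w for w in weights]          # error sent to hidden layer
    new_weights = [w - lr * g for w, g in zip(weights, grads)]
    return new_weights, hidden_deltas

new_w, deltas = backprop_output_layer([0.5, 1.0], [0.8, -0.4], target=1.0)
```

Each hidden neuron's delta would in turn be multiplied by its own activation derivative and weights, repeating the same pattern layer by layer back to the input.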

For a multiclass classification task, the output layer typically uses a shared softmax function (see Figure 2 for more) instead of individual activation functions, and each output neuron provides the estimated probability of the corresponding class.
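A minimal sketch of the shared softmax (the example logits are arbitrary) shows why the output neurons behave as class probabilities, each in [0, 1] and summing to 1:

```python
import math

def softmax(logits):
    # Shared across all output neurons: shift by the max for numerical stability
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])
print(probs)        # each entry is the estimated probability of one class
print(sum(probs))   # → 1.0 (up to floating-point rounding)
```

The predicted class is simply the output neuron with the highest probability, which is why a single shared function replaces per-neuron activations in the output layer.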

Additionally, we will be using tree ensembles such as random forest (RF) for the classification. At this point, I believe we can skip the basic introduction of RF since we have covered it in detail in Chapter 1, Analyzing Insurance Severity Claims, Chapter 2, Analyzing and Predicting Telecommunication Churn, and Chapter 3, High-Frequency Bitcoin Price Prediction from Historical Data. Well, it is time to get started. Nonetheless, it is always good to have your programming environment ready before getting your hands dirty.
