How it works...

For step 1, ReLU is the most commonly used hidden-layer activation because of its non-linear behavior. The activation function for the output layer depends on the expected output: for example, sigmoid for binary classification, softmax for multi-class classification, and a linear activation for regression. Step 4 addresses the output layer as well.
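As a minimal sketch (assuming a TensorFlow/Keras setup; the layer sizes, input dimension, and the 10-class softmax output are illustrative values, not taken from the recipe), ReLU goes in the hidden layers and the output activation is chosen by the task:

import tensorflow as tf

# Hidden layers use ReLU for its non-linear behavior.
# The output activation depends on the expected output: sigmoid for
# binary classification, softmax for multi-class, linear for regression.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(64, activation='relu'),
    tf.keras.layers.Dense(10, activation='softmax'),  # multi-class output
])
model.compile(optimizer='adam', loss='categorical_crossentropy')
model.summary()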

For step 2, Leaky ReLU is a variant of ReLU that is used to avoid the zero gradient problem, although you might observe a small performance drop. We switch to Leaky ReLU if dead neurons are observed during training; see the sketch that follows. A dead neuron is a neuron whose gradient is zero for all possible inputs, which makes it useless for training because its weights can no longer be updated.
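A hedged sketch of that swap, again assuming Keras (the layer sizes and the binary sigmoid output are illustrative):

import tensorflow as tf

# Leaky ReLU keeps a small, non-zero gradient for negative inputs, so a
# unit cannot end up with a zero gradient for every possible input.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64),
    tf.keras.layers.LeakyReLU(),  # default small negative slope; tunable
    tf.keras.layers.Dense(64),
    tf.keras.layers.LeakyReLU(),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer='adam', loss='binary_crossentropy')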

For step 3, the tanh and sigmoid activation functions are similar saturating functions and are used in feed-forward networks. If you use these activation functions, make sure you add regularization to the network layers to mitigate the vanishing gradient problem. They are generally used for classification problems.
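For example, a small feed-forward classifier with tanh hidden layers and L2 weight regularization might look like the following sketch (the 1e-4 penalty is an arbitrary illustrative value, not a recommendation from the recipe):

import tensorflow as tf

# tanh and sigmoid saturate for large |x|, where their gradients approach
# zero; L2 regularization keeps the weights (and hence pre-activations)
# small, which helps reduce the vanishing gradient problem.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(20,)),
    tf.keras.layers.Dense(64, activation='tanh',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(64, activation='tanh',
                          kernel_regularizer=tf.keras.regularizers.l2(1e-4)),
    tf.keras.layers.Dense(1, activation='sigmoid'),  # binary classifier output
])
model.compile(optimizer='adam', loss='binary_crossentropy')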
