How it works...

In step 1, we divided the input data into training and validation sets and created a data iterator for each. These iterators are objects that fetch batches of data sequentially each time next is called, with every batch containing a number of training examples and their respective labels. In step 2, we created an RNN symbol. We specified the number of layers as two and the number of hidden units as 30, set the type of RNN cell to lstm, and set the config parameter to one-to-one. In step 3, we defined the loss function.

In step 4, we used the mx.opt.create() function to create an optimizer by name and parameters. We created an adadelta optimizer and configured its parameters: the wd parameter is an L2 regularization coefficient, and the clip_gradient argument clips each gradient by projecting it onto the box [-clip_gradient, clip_gradient]. We used Xavier weight initialization for our model. With this type of weight initialization, the variance of the activations remains roughly the same from layer to layer, which helps prevent the vanishing and exploding gradient problems.
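As a rough illustration, these steps could look like the following sketch with the mxnet R API. The helper functions (mx.io.arrayiter(), rnn.graph(), mx.opt.create()) are from the mxnet R package, but the variable names (trainX, trainY, validX, validY), the array shapes, and the hyperparameter values shown here are illustrative assumptions rather than the recipe's exact code:

```r
library(mxnet)

batch.size <- 32   # illustrative batch size

# Step 1: iterators over the training and validation arrays
# (trainX/validX assumed to be arrays of shape features x seq_len x samples)
train.data <- mx.io.arrayiter(data = trainX, label = trainY,
                              batch.size = batch.size, shuffle = FALSE)
eval.data  <- mx.io.arrayiter(data = validX, label = validY,
                              batch.size = batch.size, shuffle = FALSE)

# Step 2: a two-layer LSTM symbol with 30 hidden units per layer,
# configured as a one-to-one model; argument names follow the mxnet R
# rnn.graph() helper and should be treated as assumptions here
symbol <- rnn.graph(num_rnn_layer     = 2,
                    num_hidden        = 30,
                    num_decode        = 1,
                    cell_type         = "lstm",
                    config            = "one-to-one",
                    masking           = FALSE,
                    output_last_state = TRUE,
                    loss_output       = "linear")  # step 3: regression loss

# Step 4: an adadelta optimizer; wd adds L2 regularization and
# clip_gradient bounds each gradient element to [-1, 1]
optimizer <- mx.opt.create("adadelta",
                           rho           = 0.9992,
                           epsilon       = 1e-7,
                           wd            = 1e-6,
                           clip_gradient = 1,
                           rescale.grad  = 1 / batch.size)
```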

To read more about this technique, please refer to this paper: http://proceedings.mlr.press/v9/glorot10a/glorot10a.pdf.
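In the mxnet R API, this initialization scheme is exposed through mx.init.Xavier(); the factor type and magnitude below are illustrative settings, not necessarily the recipe's exact values:

```r
# Xavier (Glorot) initialization: scales the initial weights so that the
# variance of the activations stays roughly constant across layers
initializer <- mx.init.Xavier(rnd_type    = "gaussian",
                              factor_type = "avg",
                              magnitude   = 3)
```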

In step 5, we trained the network for 50 epochs using buckets. Bucketing is a technique for training multiple networks with different but similar architectures that share the same set of parameters. In step 6, we extracted the state symbols from the trained model so that they can be used for inference. In step 7, we created an inference model. Finally, in the last step, we predicted the values for the first test sample, as sketched below.
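The remaining steps might be wired up roughly as follows. The bucketing trainer mx.model.buckets(), the one-step RNN inference helper mx.infer.rnn.one(), and mx.symbol.Group() come from the mxnet R package, but the output names ("RNN_state", "RNN_state_cell", "loss_output"), the metric, and the test.data variable are assumptions for illustration only:

```r
# Step 5: train for 50 epochs with the bucketing-aware fitting routine
model <- mx.model.buckets(symbol      = symbol,
                          train.data  = train.data,
                          eval.data   = eval.data,
                          num.round   = 50,
                          ctx         = mx.cpu(),
                          metric      = mx.metric.rmse,   # a regression metric
                          initializer = initializer,
                          optimizer   = optimizer)

# Step 6: pull the RNN state outputs out of the trained symbol so the
# hidden and cell states can be carried over during inference
internals      <- model$symbol$get.internals()
sym_state      <- internals$get.output(which(internals$outputs %in% "RNN_state"))
sym_state_cell <- internals$get.output(which(internals$outputs %in% "RNN_state_cell"))
sym_output     <- internals$get.output(which(internals$outputs %in% "loss_output"))

# Step 7: group the output and state symbols into a single inference symbol
pred_symbol <- mx.symbol.Group(sym_output, sym_state, sym_state_cell)

# Step 8: one-step inference on the first test sample (initial state is NULL)
infer <- mx.infer.rnn.one(infer.data   = test.data,
                          symbol       = pred_symbol,
                          arg.params   = model$arg.params,
                          aux.params   = model$aux.params,
                          input.params = NULL,
                          ctx          = mx.cpu())
pred <- infer[[1]]   # predicted values for the first test sample
```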
