Model hyperparameters

As with any deep learning architecture, there are a few hyperparameters you can tune to control the model's capacity and training behavior. The following are the hyperparameters we use for this architecture (a short code sketch follows the list):

  • Batch size is the number of sequences running through the network in one pass.
  • The number of steps is the number of characters in each sequence the network is trained on. Larger values are typically better, since the network can learn longer-range dependencies, but training takes longer. A value of 100 is usually a good choice here.
  • The LSTM size is the number of units in the hidden layers.
  • The number of layers is the number of hidden LSTM layers to use.
  • Learning rate is the step size used by the optimizer during training.
  • Finally, the keep probability is used by the dropout layer to help the network avoid overfitting. If your network is overfitting, try decreasing this value.
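
To make the roles of these hyperparameters concrete, here is a minimal sketch of how they might be declared and wired into a stacked LSTM with dropout. It assumes TensorFlow 1.x; the variable names and values are illustrative starting points, not the book's exact code:

    import tensorflow as tf  # assumes TensorFlow 1.x (tf.nn.rnn_cell API)

    # Illustrative starting values, not tuned results.
    batch_size = 100        # sequences per training pass
    num_steps = 100         # characters per sequence; longer captures more context but trains slower
    lstm_size = 512         # units in each hidden LSTM layer
    num_layers = 2          # number of stacked hidden LSTM layers
    learning_rate = 0.001   # optimizer step size
    keep_probability = 0.5  # dropout keep probability; decrease if overfitting

    def lstm_cell_with_dropout(size, keep_prob):
        """One LSTM layer with dropout applied to its outputs."""
        cell = tf.nn.rnn_cell.BasicLSTMCell(size)
        return tf.nn.rnn_cell.DropoutWrapper(cell, output_keep_prob=keep_prob)

    # Stack num_layers LSTM layers into a single multi-layer cell.
    stacked_cell = tf.nn.rnn_cell.MultiRNNCell(
        [lstm_cell_with_dropout(lstm_size, keep_probability)
         for _ in range(num_layers)])

In a full model, keep_probability would typically be fed through a placeholder so that dropout can be switched off (set to 1.0) when sampling from the trained network.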