As with any deep learning architecture, there are several hyperparameters that can be used to control and fine-tune the model. The following is the set of hyperparameters we use for this architecture (a configuration sketch follows the list):
- Batch size is the number of sequences running through the network in one pass.
- The number of steps is the number of characters in each sequence the network is trained on. Larger is typically better, since the network can learn longer-range dependencies, but training takes longer. A value of 100 is usually a good choice here.
- The LSTM size is the number of units in the hidden layers.
- The number of layers is the number of hidden LSTM layers to use.
- The learning rate is the step size used by the optimizer during training.
- And finally, the new one, which we call keep probability, is used by the dropout layer; it helps the network avoid overfitting. So if your network is overfitting, try decreasing this value.
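As a minimal sketch of how these hyperparameters might be wired together, assuming the TensorFlow 1.x RNN API (`tf.nn.rnn_cell`), the following builds a stacked LSTM whose outputs pass through dropout controlled by the keep probability. The concrete values and the helper name `build_lstm_layers` are illustrative, not prescribed by the text; only the suggestion of 100 for the number of steps comes from the list above.

```python
import tensorflow as tf

# Illustrative values -- only num_steps = 100 is suggested by the text above;
# the rest are placeholders to tune for your own data and hardware.
batch_size = 100        # sequences per forward/backward pass
num_steps = 100         # characters per training sequence
lstm_size = 512         # units in each hidden LSTM layer
num_layers = 2          # stacked hidden LSTM layers
learning_rate = 0.001   # optimizer step size
keep_probability = 0.5  # dropout keep probability; decrease if overfitting

def build_lstm_layers(lstm_size, num_layers, keep_probability):
    """Stack num_layers LSTM cells, each wrapped in dropout."""
    def lstm_cell():
        cell = tf.nn.rnn_cell.BasicLSTMCell(lstm_size)
        # DropoutWrapper applies dropout to the cell's outputs, keeping
        # each activation with probability keep_probability.
        return tf.nn.rnn_cell.DropoutWrapper(
            cell, output_keep_prob=keep_probability)
    return tf.nn.rnn_cell.MultiRNNCell(
        [lstm_cell() for _ in range(num_layers)])

cell = build_lstm_layers(lstm_size, num_layers, keep_probability)
initial_state = cell.zero_state(batch_size, tf.float32)
```

Note that dropout here is applied per layer via `DropoutWrapper`, so lowering `keep_probability` regularizes every hidden LSTM layer at once; at inference time you would feed a keep probability of 1.0 so no activations are dropped.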