For many problems, you can start with one or two hidden layers and the network will work just fine. A network with two hidden layers and the same total number of neurons as a single-hidden-layer network can be trained in roughly the same amount of time. For more complex problems, you can gradually increase the number of hidden layers until you start overfitting the training set. Very complex tasks, such as large image classification or speech recognition, typically require networks with dozens of layers, and they need a large amount of training data.
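The idea of adding hidden layers gradually until overfitting sets in can be sketched as a small search loop. This is only an illustrative sketch under stated assumptions: `split_neurons`, `grow_depth`, and the `evaluate` callback are hypothetical names, not part of any library, and overfitting is approximated here as "the validation score stopped improving". In practice `evaluate` would train a real network with the given hidden layer sizes and return its score on held-out data; the toy function below is a synthetic stand-in, not a measured result.

```python
from typing import Callable, List, Tuple


def split_neurons(total_neurons: int, n_layers: int) -> List[int]:
    """Spread a fixed neuron budget as evenly as possible over n_layers."""
    base, extra = divmod(total_neurons, n_layers)
    # The first `extra` layers receive one additional neuron each.
    return [base + (1 if i < extra else 0) for i in range(n_layers)]


def grow_depth(
    evaluate: Callable[[List[int]], float],
    neurons_per_layer: int = 100,
    max_layers: int = 10,
    patience: int = 1,
) -> Tuple[List[int], float]:
    """Add hidden layers one at a time until the validation score stalls.

    `evaluate` is a hypothetical callback: it receives the hidden layer
    sizes, trains a model, and returns a validation score (higher is better).
    """
    best_layers: List[int] = []
    best_score = float("-inf")
    stale_rounds = 0

    for depth in range(1, max_layers + 1):
        layers = [neurons_per_layer] * depth
        score = evaluate(layers)
        if score > best_score:
            best_layers, best_score = layers, score
            stale_rounds = 0
        else:
            # No improvement on held-out data: treated as a sign of overfitting.
            stale_rounds += 1
            if stale_rounds >= patience:
                break

    return best_layers, best_score


def toy_validation_score(layers: List[int]) -> float:
    """Synthetic stand-in for real training: the score peaks at three layers."""
    return -abs(len(layers) - 3)


if __name__ == "__main__":
    # Same budget of 300 neurons arranged as one or two hidden layers.
    print(split_neurons(300, 1), split_neurons(300, 2))
    print(grow_depth(toy_validation_score))
```

Stopping on the validation score rather than the training score is a design choice of this sketch: the training score usually keeps improving as layers are added, so it cannot signal overfitting by itself.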