In this section, we're going to see what happens when we train deep networks, which has many layers, probably over 30 or maybe even over 100. Then, we'll present residual networks as a solution for scaling too many layers, along with an architecture example with state-of-the-art accuracy.