13.2 The basics of gradient descent
We need to solve two computational problems to train neural networks:
- computing the derivative of the loss L(w),
- and finding its minima using the derivative.
Finding the minima by solving L'(w) = 0 is not going to work in practice. There are several problems. First, as we have seen, not every solution is a minimum: maxima and inflection points satisfy the equation as well. Second, solving the equation analytically is only feasible in the simplest cases, like linear regression with the mean squared error. Training a neural network is not a simple case.
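To see why linear regression with the mean squared error counts as a simple case, here is a minimal sketch (assuming NumPy and a hypothetical toy dataset) of solving it in closed form: setting the derivative of the loss to zero leads to the normal equations, which we can solve directly.

```python
import numpy as np

# Hypothetical toy dataset: y = 2x + 1 plus a bit of noise
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, size=100)
y = 2 * x + 1 + rng.normal(scale=0.1, size=100)

# Design matrix with a column of ones for the bias term
X = np.stack([x, np.ones_like(x)], axis=1)

# Setting the derivative of the mean squared error to zero yields the
# normal equations (X^T X) w = X^T y, which we solve directly.
w = np.linalg.solve(X.T @ X, X.T @ y)
print(w)  # approximately [2, 1]
```

No such closed form exists for a neural network: the equation L'(w) = 0 becomes hopelessly nonlinear in the weights.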
Fortunately for us machine learning practitioners, there is a solution: gradient descent! Instead of hunting for an exact solution, gradient descent finds the minima iteratively, step by step, which is what makes machine learning feasible at scale. Let's see how it's done!
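As a preview, here is a minimal sketch of the idea on a toy one-dimensional loss; the loss function, starting point, step size, and iteration count are illustrative choices, not prescriptions.

```python
def loss(w):
    return (w - 3) ** 2            # toy loss, minimized at w = 3

def grad(w):
    return 2 * (w - 3)             # its derivative

w = 0.0                            # arbitrary starting point
learning_rate = 0.1                # illustrative step size
for _ in range(100):
    w -= learning_rate * grad(w)   # the update: w <- w - learning_rate * L'(w)

print(w)  # very close to 3, the true minimum
```

Each step nudges w a little in the direction where the loss decreases, and repeating the nudge is enough to home in on the minimum without ever solving an equation.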
13.2.1 Derivatives, revisited
When we first explored the concept of the derivative in Chapter 12, we saw its many faces....