17.3 Gradient descent in its full form
Gradient descent is one of the most important algorithms in machine learning. We have talked about it a lot, but up until this point, we have only seen it for single-variable functions (which, I admit, is not the most practical use case).
However, now we have all the tools we need to talk about gradient descent in its general form. Let’s get to it!
Suppose that we have a differentiable vector-scalar function f: ℝⁿ → ℝ that we want to maximize. This can describe the return on investment of an investment strategy, or any other quantity. Calculating the gradient and finding the critical points analytically is often not an option, as solving the equation ∇f(x) = 0 can be computationally infeasible. Thus, we resort to an iterative solution.
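In symbols, each step of such an iteration nudges the current point a little in the direction of the gradient: xₖ₊₁ = xₖ + γ∇f(xₖ), where γ > 0 is the learning rate. (The symbols γ and k are illustrative choices of notation here; the sign is + because we are maximizing.)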
The algorithm is the same as for single-variable functions (as seen in Section 13.2):
- Start from a random point.
- Calculate the gradient at that point.
- Take a step in the direction of the gradient.
- Repeat until convergence, as in the code sketch below.
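To make the loop concrete, here is a minimal NumPy sketch. The function name gradient_ascent, the learning rate of 0.1, the stopping rule, and the toy objective f(x) = -‖x - a‖² (whose gradient is ∇f(x) = -2(x - a)) are all illustrative assumptions, not a definitive implementation:

```python
import numpy as np

def gradient_ascent(grad_f, x0, learning_rate=0.1, n_steps=1000, tol=1e-8):
    """Maximize f by repeatedly stepping in the direction of its gradient.

    grad_f: callable returning the gradient ∇f(x) as an array
    x0:     the starting point (a random point works fine)
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        step = learning_rate * grad_f(x)  # step towards the gradient
        x = x + step
        if np.linalg.norm(step) < tol:    # stop once the steps become tiny
            break
    return x

# Toy example: f(x) = -||x - a||^2 is maximized at x = a,
# and its gradient is ∇f(x) = -2(x - a).
a = np.array([1.0, -2.0, 3.0])
grad_f = lambda x: -2.0 * (x - a)

x_max = gradient_ascent(grad_f, x0=np.random.randn(3))
print(x_max)  # ≈ [ 1. -2.  3.]
```

For this concave toy objective, the iterates satisfy xₖ₊₁ - a = (1 - 2γ)(xₖ - a), so they converge to the maximizer a for any learning rate γ < 1; for a general f, the choice of γ matters a great deal, as we will see.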