The goal of gradient descent is to find the minimizer $x^*$ of a function $f(x)$, i.e. $x^* = \arg\min_x f(x)$.

We start with a random initial point $x^{[0]}$ and improve it step by step.

For the direction to move in, we choose the negative gradient

$$d^{[t]} = -\nabla f(x^{[t]})$$

and compute further steps with

$$x^{[t+1]} - x^{[t]} = -\lambda \nabla f(x^{[t]}),$$

which means: the next x minus the current x points along the negative gradient, so we reduce the function value step by step. The step is equal to the stepsize $\lambda$ times the negative gradient.

Therefore, the next x is calculated as:

$$x^{[t+1]} = x^{[t]} - \lambda \nabla f(x^{[t]})$$

or alternatively:

$$x^{[t+1]} = x^{[t]} + \lambda d^{[t]} \quad \text{with } d^{[t]} = -\nabla f(x^{[t]})$$
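As a minimal sketch of this update rule in Python (the helper `gradient_descent`, the test function $f(x) = x^2$, and the chosen stepsize are illustrative assumptions, not part of the notes above):

```python
import numpy as np

def gradient_descent(grad_f, x0, stepsize=0.1, n_steps=100):
    """Iteratively apply x[t+1] = x[t] - stepsize * grad_f(x[t])."""
    x = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        x = x - stepsize * grad_f(x)
    return x

# Example: f(x) = x^2 with gradient 2x; the minimum is at x = 0.
x_min = gradient_descent(lambda x: 2 * x, x0=5.0, stepsize=0.1, n_steps=100)
print(x_min)  # very close to 0
```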

Understanding stepsize

Can we set the stepsize simply to 1?

Not in general. If the stepsize is too large, the function value at the next point may not decrease; the iterates can jump back and forth or even diverge, and the process gets stuck without making progress.

The stepsize can also change during the course of gradient descent (for example, via a decaying schedule or a line search).
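A small experiment, assuming $f(x) = x^2$ (gradient $2x$) as the test function, makes the stepsize question concrete: with stepsize 1 the iterate jumps between $5$ and $-5$ and the function value never decreases, while a smaller stepsize converges.

```python
def run_gd(stepsize, x0=5.0, n_steps=10):
    """Gradient descent on f(x) = x^2 (gradient 2x), recording f at each step."""
    x = x0
    values = []
    for _ in range(n_steps):
        x = x - stepsize * 2 * x
        values.append(x ** 2)
    return values

print(run_gd(stepsize=1.0))  # f stays at 25.0 forever: x jumps between 5 and -5
print(run_gd(stepsize=0.1))  # f decreases steadily toward 0
```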

How many minima can a function have?

Functions can have several local minima, i.e. points $\tilde{x}$ with $f(\tilde{x}) \le f(x)$ for all $x$ in a neighborhood of $\tilde{x}$.

A function is convex when the straight line connecting any two points on its graph never lies below the graph. This is true for functions like $f(x) = x^2$. Functions with more than one minimum whose minima have different function values are never convex. Yet a function can be convex and still have more than one minimum, provided all minima have the same function value.

  • For a convex function, all local minima have the same function value.
  • A twice-differentiable function is convex if its Hessian is positive semidefinite everywhere.
  • A matrix $A$ is positive semidefinite if $v^\top A v \ge 0$ for all vectors $v$.
  • Linear regression with the RSS/L2/quadratic loss is convex (see the check after this list).
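The last two points can be checked numerically. The sketch below assumes a small made-up design matrix X and uses the fact that the Hessian of the RSS loss $L(\beta) = \|y - X\beta\|^2$ is $2X^\top X$, which is positive semidefinite exactly when all its eigenvalues are non-negative:

```python
import numpy as np

# Hypothetical design matrix for a small linear-regression problem.
X = np.array([[1.0, 2.0],
              [1.0, 3.0],
              [1.0, 5.0]])

# For the RSS loss L(beta) = ||y - X beta||^2, the Hessian is 2 X^T X.
H = 2 * X.T @ X

# A symmetric matrix is positive semidefinite iff all eigenvalues are >= 0.
eigenvalues = np.linalg.eigvalsh(H)
print(eigenvalues)                    # all non-negative
print(np.all(eigenvalues >= -1e-10))  # True -> the RSS loss is convex in beta
```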

GD will find the global minimum of a convex function.

For non-convex functions, it might only find a local minimum, depending on the starting point.
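To illustrate the difference, here is a short sketch; the non-convex test function $f(x) = x^4 - 2x^2 + 0.5x$, the stepsize, and the two start points are illustrative assumptions:

```python
def gd(grad_f, x0, stepsize=0.01, n_steps=1000):
    """Plain gradient descent: x[t+1] = x[t] - stepsize * grad_f(x[t])."""
    x = x0
    for _ in range(n_steps):
        x = x - stepsize * grad_f(x)
    return x

# Non-convex example: f(x) = x^4 - 2x^2 + 0.5x has two minima with
# different function values (the left one is the global minimum).
f = lambda x: x**4 - 2 * x**2 + 0.5 * x
grad_f = lambda x: 4 * x**3 - 4 * x + 0.5

for x0 in (-2.0, 2.0):
    x_end = gd(grad_f, x0)
    print(f"start {x0:+.1f} -> x = {x_end:+.3f}, f(x) = {f(x_end):.3f}")
# Starting at -2 reaches the global minimum (near x ~ -1.06);
# starting at +2 only reaches the local minimum (near x ~ +0.93).
```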