In the last lecture of DLS Course 2, Week 2, on the problem of local optima, Andrew argues that local minima are not a problem because they are much rarer than saddle points, and then illustrates how gradient descent enters and then exits a saddle point.
The saddle point is exited “because of a random perturbation”. What does that mean?
I could not find anything about this anywhere, apart from Why saddle points isn't a problem for gradient descent? and Cost function stuck at local minima - #3 by kenb, which are helpful but do not answer my question.
Is it that the saddle point is never reached exactly, so that being an epsilon away from it is enough to "exit" it again after sufficiently many gradient descent iterations? Or does the escape happen even when the saddle point is hit exactly, simply due to numerical imprecision? Or is imprecision introduced on purpose, by adding small noise, specifically so that saddle points get exited?
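To make my question concrete, here is a minimal sketch of what I mean, using a hypothetical toy saddle f(x, y) = x^2 - y^2 (not anything from the course code) and a made-up `noise_scale` parameter to stand in for the "random perturbation":

```python
import numpy as np

# Toy saddle: f(x, y) = x**2 - y**2 has a saddle point at the origin.
# Gradient: df/dx = 2x, df/dy = -2y.
def grad(p):
    return np.array([2.0 * p[0], -2.0 * p[1]])

def descend(p0, lr=0.1, steps=50, noise_scale=0.0, seed=0):
    rng = np.random.default_rng(seed)
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        g = grad(p)
        if noise_scale > 0:
            # hypothetical "random perturbation" added to the gradient
            g = g + rng.normal(scale=noise_scale, size=p.shape)
        p = p - lr * g
    return p

# Starting exactly on the saddle: the gradient is zero, so plain GD never moves.
print(descend([0.0, 0.0]))

# Starting an epsilon away along y: each step multiplies y by (1 + 2*lr),
# so the iterates drift away from the saddle on their own.
print(descend([0.0, 1e-8]))

# Exactly on the saddle, but with small injected noise: the noise knocks the
# iterate off the unstable point and descent then escapes along the -y**2 direction.
print(descend([0.0, 0.0], noise_scale=1e-6))
```

Which of these three mechanisms is the one Andrew has in mind when he says the saddle point is exited "because of a random perturbation"?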