RMSprop on saddle points

Does RMSprop stop at saddle points?

I think it does, because of the dW term in the weight update formula. At a saddle point dW = 0, so
W_{new} = W_{old} - \frac{\alpha (0)}{\sqrt{S_{dW}}}
W_{new} = W_{old}

But for momentum we have a moving average (V_d), i.e. W_{new} = W_{old} - \alpha V_d. V_d won’t be 0 even if the current gradient is 0.

Is my understanding above correct?
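To make the comparison concrete, here is a minimal sketch of one step of each update rule with a current gradient of exactly 0. The function names, the starting values, the `beta` averaging form and the small `eps` in the denominator are assumptions for illustration; they are not part of the formulas quoted above.

```python
import math

def rmsprop_step(w, dw, s, alpha=0.01, beta=0.9, eps=1e-8):
    """One RMSprop update. eps is an assumed stabiliser, not in the formula above."""
    s = beta * s + (1 - beta) * dw ** 2          # moving average of squared gradients
    w = w - alpha * dw / (math.sqrt(s) + eps)    # step is scaled by the current gradient dw
    return w, s

def momentum_step(w, dw, v, alpha=0.01, beta=0.9):
    """One momentum update, assuming V is an exponential moving average of dW."""
    v = beta * v + (1 - beta) * dw               # moving average of gradients
    w = w - alpha * v                            # step uses the average, not dw directly
    return w, v

# Made-up state carried over from earlier steps where the gradient was nonzero.
w0, s_prev, v_prev = 1.0, 0.25, 0.5

w_rms, _ = rmsprop_step(w0, 0.0, s_prev)   # current gradient is exactly 0
w_mom, _ = momentum_step(w0, 0.0, v_prev)  # current gradient is exactly 0

print("RMSprop moved: ", w_rms != w0)
print("Momentum moved:", w_mom != w0)
```

With these values the RMSprop step leaves the weight unchanged, while the momentum step still moves it because the stored average is nonzero.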

In practice, not every component of dW is exactly 0 around a saddle point. The gradient with respect to one particular dimension can be zero while the others are not. For instance, dW1 can be 0 while dW2, dW3, etc. are nonzero, so those weights keep being updated. More importantly, training continues until the overall cost stops improving. That is determined by the overall cost function and not by a single dimension.
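A small sketch of this, assuming a toy cost f(w1, w2) = w1^2 - w2^2 with a saddle at (0, 0). The cost function, the starting point, the hyperparameters and the `eps` term are all assumptions chosen for illustration. The run starts slightly off the saddle, so dW1 is 0 but dW2 is not.

```python
import math

def grad(w1, w2):
    # Gradient of the assumed toy cost f(w1, w2) = w1**2 - w2**2 (saddle at the origin).
    return 2.0 * w1, -2.0 * w2

def run_rmsprop(w1, w2, steps=50, alpha=0.01, beta=0.9, eps=1e-8):
    s1 = s2 = 0.0  # per-dimension moving averages of squared gradients
    for _ in range(steps):
        d1, d2 = grad(w1, w2)
        s1 = beta * s1 + (1 - beta) * d1 ** 2
        s2 = beta * s2 + (1 - beta) * d2 ** 2
        # Each dimension is updated with its own gradient and its own scaling.
        w1 -= alpha * d1 / (math.sqrt(s1) + eps)
        w2 -= alpha * d2 / (math.sqrt(s2) + eps)
    return w1, w2

# Start with dW1 = 0 (w1 = 0) but a small nonzero dW2 (w2 slightly off 0).
w1_end, w2_end = run_rmsprop(0.0, 1e-3)
print("w1:", w1_end)  # the dimension with zero gradient does not move
print("w2:", w2_end)  # the dimension with nonzero gradient moves away from the saddle
```

In this run w1 stays where it is, while w2 keeps moving and the cost keeps decreasing. If the start were exactly (0, 0), both gradients would be 0 and this update would not move either weight.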
