Both reduce oscillations and speed up learning. The goal is the same; only the way they implement it differs.

Please correct me if I am wrong.

Thanks

Ajay

Hi, it is true that both reduce oscillations and speed up learning, but there are some interesting differences.

Aside from the implementation, the concept behind their success is different too. Gradient descent with Momentum is based on the principle of reducing oscillation by smoothing out the gradients (both dW and db) using exponentially weighted averages.

RMSprop, on the other hand, is an adaptive learning-rate algorithm, which means it has a variable effective learning rate, unlike Momentum, which has a constant one. In RMSprop, as the moving average of the squared gradients grows, the effective learning rate gets smaller, allowing us to be more precise on our convergence route.

In Momentum, both dW and db are smoothed out.

In RMSprop, the squares of the derivatives are averaged. The parameters whose gradients are causing the big oscillations have their updates damped to a greater extent.

Please correct me If I am wrong

Slightly off. Momentum smooths out both dW and db, absolutely right. It does this by taking exponentially weighted averages of the gradients, which smooths the descent path.
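To make the averaging concrete, here is a minimal sketch of the Momentum update in plain Python (the function name momentum_step and the toy gradients are my own, just following the dW/db notation used above):

```python
def momentum_step(w, b, dw, db, vdw, vdb, alpha=0.01, beta=0.9):
    # vdw/vdb: exponentially weighted averages of the gradients;
    # oscillating gradient components largely cancel in the average
    vdw = beta * vdw + (1 - beta) * dw
    vdb = beta * vdb + (1 - beta) * db
    w = w - alpha * vdw
    b = b - alpha * vdb
    return w, b, vdw, vdb

w, b, vdw, vdb = 0.0, 0.0, 0.0, 0.0
for t in range(10):
    dw = 1.0 if t % 2 == 0 else -1.0  # oscillating direction
    db = 1.0                          # consistent direction
    w, b, vdw, vdb = momentum_step(w, b, dw, db, vdw, vdb)
# w barely moves (its gradients cancel in the average),
# while b makes steady progress
```

This is exactly the smoothing effect: the oscillating direction is averaged away, the consistent direction keeps building speed.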

In RMSprop, the smoothing effect is achieved by effectively reducing the learning rate. You can try looking at the update formula like this:

b = b - (alpha / sqrt(Sdb)) * db

Compare this to plain gradient descent, without RMSprop:

b = b - alpha * db

Hence, we see that in RMSprop the learning rate is divided by a term that tracks the magnitude of recent gradients. Where gradients have been large (the oscillating directions), that divisor grows and the effective step shrinks, which gives the smoothing effect.
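The update above can be sketched directly in plain Python. One note: implementations conventionally add a small eps inside the division for numerical stability, which the formula above omits; the name rmsprop_step and the toy gradient are my own:

```python
def rmsprop_step(b, db, sdb, alpha=0.01, beta=0.999, eps=1e-8):
    # Sdb: exponentially weighted average of the squared gradient
    sdb = beta * sdb + (1 - beta) * db ** 2
    # alpha is divided by sqrt(Sdb), so the effective step shrinks
    # as Sdb accumulates (eps guards against division by zero)
    b = b - (alpha / (sdb ** 0.5 + eps)) * db
    return b, sdb

b, sdb = 0.0, 0.0
eff_lrs = []
for _ in range(5):
    db = 2.0  # a constant, fairly large gradient
    b, sdb = rmsprop_step(b, db, sdb)
    eff_lrs.append(0.01 / (sdb ** 0.5 + 1e-8))  # effective learning rate
# eff_lrs shrinks step by step: Sdb keeps accumulating,
# so the divisor keeps growing
```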

Both RMSprop and Momentum smooth the descent, but for different reasons: Momentum does it by averaging the gradients, RMSprop by scaling down the effective learning rate.

In the next lecture, you will learn about Adam, which combines both of these effects, i.e. the exponentially weighted average of the gradients plus the adaptive, shrinking effective learning rate.
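As a preview, the standard Adam update can be sketched by literally stacking the two pieces discussed above (the name adam_step is my own; the bias-correction terms are part of standard Adam and are covered in that lecture):

```python
def adam_step(b, db, vdb, sdb, t, alpha=0.001,
              beta1=0.9, beta2=0.999, eps=1e-8):
    vdb = beta1 * vdb + (1 - beta1) * db       # Momentum: averaged gradient
    sdb = beta2 * sdb + (1 - beta2) * db ** 2  # RMSprop: averaged squared gradient
    vdb_hat = vdb / (1 - beta1 ** t)           # bias correction (t starts at 1)
    sdb_hat = sdb / (1 - beta2 ** t)
    b = b - alpha * vdb_hat / (sdb_hat ** 0.5 + eps)
    return b, vdb, sdb

b, vdb, sdb = 0.0, 0.0, 0.0
for t in range(1, 6):
    # a constant positive gradient steadily pushes b downward
    b, vdb, sdb = adam_step(b, 1.0, vdb, sdb, t)
```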

Anytime! Enjoy the course