Gradient Descent with Momentum and RMSprop

Hi Sir,

@paulinpaloalto

  1. In the RMSprop video lecture, the professor said that a large learning rate can be used without diverging in the vertical direction. If that statement is true for RMSprop, why wasn't the same thing said in the Gradient Descent with Momentum lecture? Does it not apply to momentum?

  2. From the implementation notes of the exponentially weighted averages (EWA) lecture, we need the code below to compute the average over the last 10 days of temperature:

[screenshot: the EWA pseudocode from the lecture slides]

From the Gradient Descent with Momentum lecture, on the implementation details slide, v_dW with beta = 0.9 computes an average over roughly the last 10 iterations' gradients. If so, why isn't the same code above (the "Repeat { Get next θ_t }" loop) used or shown there? (See the Python sketch after this list.)

  3. In the above pic, what does "On iteration t" mean?
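For question 2, here is my understanding of the EWA pseudocode as a short, runnable Python sketch (the temperature values and variable names are my own example):

temperatures = [13.0, 12.5, 14.0, 15.2, 14.8]   # example data
beta = 0.9                    # averages over roughly 1 / (1 - beta) = 10 days
v = 0.0                       # "V_theta = 0"
for theta_t in temperatures:                 # "Repeat { Get next theta_t ... }"
    v = beta * v + (1 - beta) * theta_t      # EWA update
print(v)                      # exponentially weighted average of the temperatures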

Sorry if I'm asking you a lot of questions, sir. I think you can help me clarify these.

Hi, @Anbu.

I’m sorry we missed this topic :sweat:

  1. Momentum with a large learning rate may overshoot the minimum: it only smooths the gradient direction, whereas RMSprop also divides each step by the RMS of recent gradients, which damps steps in the steep (vertical) direction. You can use this simulator to compare the behaviors of RMSprop and Momentum with different learning rates.
  2. It's the same pseudocode! "On iteration t" is equivalent to "Repeat", and "Compute dW, db" is equivalent to "Get next θ_t".
  3. An iteration is just one gradient descent step. Think of "On iteration t" as a loop (filled in below with the standard momentum updates; compute_gradients is a placeholder for backprop on the current mini-batch):
for t in range(num_minibatches):
    dW, db = compute_gradients(W, b)       # placeholder: backprop on mini-batch t
    v_dW = beta * v_dW + (1 - beta) * dW   # EWA of the gradients ("Get next theta_t")
    v_db = beta * v_db + (1 - beta) * db
    W = W - learning_rate * v_dW           # update with the smoothed gradient
    b = b - learning_rate * v_db
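
For question 1, here is a self-contained toy comparison (my own example: a 2-D quadratic loss that is steep in w2, i.e. the "vertical" direction; the hyperparameter values are mine, not the course's). Because RMSprop divides each step by the RMS of recent gradients, it tolerates a learning rate that would make plain gradient descent diverge along w2:

import numpy as np

# f(w) = 0.5 * (w1**2 + 25 * w2**2): shallow in w1, steep ("vertical") in w2
grad = lambda w: np.array([w[0], 25.0 * w[1]])

w = np.array([5.0, 1.0])
s = np.zeros(2)                       # EWA of squared gradients
beta2, eps, lr = 0.999, 1e-8, 0.1     # lr = 0.1 > 2/25, so plain GD diverges along w2
for t in range(200):
    g = grad(w)
    s = beta2 * s + (1 - beta2) * g**2
    w = w - lr * g / (np.sqrt(s) + eps)   # per-parameter damping of the steep direction
print(w)                              # ends up near the minimum at (0, 0)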

Good luck with the specialization :slight_smile: