Hi Sir,
- In the RMSprop video lecture, the professor said that a larger learning rate can be used without diverging in the vertical direction. If that statement is true for RMSprop, why was the same statement not made in the gradient descent with momentum lecture? Does it not apply to momentum?
- From the implementation notes of the exponentially weighted averages (EWA) lecture, we need the code below to compute an average over about the last 10 days of temperature.
In the gradient descent with momentum lecture, the implementation details slide says that v_dW with beta = 0.9 computes an average over about the last 10 iterations' gradients. If so, why is the same code as above (repeat: get next theta_t) not used or mentioned there?
- In the above picture, what does "on iteration t" mean?
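To make the second question concrete, here is a minimal Python sketch of how I understand the two updates. The function names `ewa` and `momentum_step`, the sample temperatures, the learning rate, and the toy cost w^2 are my own assumptions for illustration and are not taken from the slides.

```python
def ewa(values, beta=0.9):
    """Exponentially weighted average: v = beta * v + (1 - beta) * theta_t."""
    v = 0.0
    history = []
    for theta_t in values:  # "repeat: get next theta_t"
        v = beta * v + (1 - beta) * theta_t
        history.append(v)
    return history


def momentum_step(w, v_dw, dw, alpha=0.01, beta=0.9):
    """One gradient descent with momentum update for a single parameter w."""
    v_dw = beta * v_dw + (1 - beta) * dw  # same averaging rule, applied to the gradient
    w = w - alpha * v_dw
    return w, v_dw


# Hypothetical daily temperatures, averaged with beta = 0.9.
temps = [10.0, 12.0, 11.0, 13.0]
print(ewa(temps))

# Toy cost J(w) = w**2 (an assumption), so the gradient on each iteration is dw = 2 * w.
w, v_dw = 5.0, 0.0
for t in range(1, 6):  # "on iteration t"
    dw = 2 * w  # the new gradient computed on iteration t
    w, v_dw = momentum_step(w, v_dw, dw)
    print(t, w, v_dw)
```

As far as I can tell, the training loop over t supplies the next gradient on each pass, in the same place where the EWA loop fetches the next theta_t, but I would like to confirm this.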
Sorry if I am asking a lot of questions, sir. I think you can help me clarify these.