I’d appreciate it if someone could help me understand why the implementation of exponentially weighted averages uses VdW = B * VdW + (1 - B) * dW (as on the right side of this screenshot) and not VdW = B * Vd(W-1) + (1 - B) * dW (as on the left side of this screenshot).
There was a similar question from @hazingo a few months ago (see the Momentum Gradient Descent question), but I was not convinced by (or did not understand) the answer he got.
Thanks,
v_t = \beta v_{t - 1} + (1 - \beta) \theta_t for t > 0, with v_0 = 0.
Here, \theta_t is a quantity computed in the current timestep.
Similarly,
- dW is computed based on the current mini-batch, and
- v_{dW} is the value so far, i.e. excluding the current mini-batch.
The update equation for v_{dW} after processing the current mini-batch becomes:
v_{dW} = \beta v_{dW} + (1 - \beta) dW
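For concreteness, here is a minimal sketch (my own toy example, not the assignment code; the variable names and the gradient values are made up) of why the in-place assignment realizes the v_t = \beta v_{t - 1} + (1 - \beta) \theta_t recurrence: when the right-hand side is evaluated, v still holds the previous timestep’s value, and only then is the result stored back into v.

```python
import numpy as np

beta = 0.9
v = np.zeros(3)  # v_0 = 0: nothing accumulated yet

# Toy stream of gradients, one per mini-batch / timestep (made-up numbers).
gradient_stream = [np.array([1.0, 2.0, 3.0]),
                   np.array([0.5, 0.5, 0.5]),
                   np.array([2.0, 1.0, 0.0])]

for t, dW in enumerate(gradient_stream, start=1):
    # When the right-hand side is evaluated, v still holds v_{t-1};
    # only after the assignment does v hold v_t.
    v = beta * v + (1 - beta) * dW
    print(f"t = {t}, v = {v}")
```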
Hi @balaji.ambresh, thanks for your reply. What you explain here is exactly what I thought initially, but then in the ‘Optimization Methods’ assignment (‘update_parameters_with_momentum’ function) I got all tests to pass using:
for l in range(1, L + 1):
    v["dW" + str(l)] = … * v["dW" + str(l)] + …
(including the minimum possible to adhere to the rules while still being able to explain myself)
In this case I am using vdW1 = … * vdW1 + …
I believe that here the existing vdW1 used as input already includes the current mini-batch; isn’t that the case?
Your understanding of exponential smoothing is correct.
But there’s a difference between the timestep and the layer number. I recommend moving forward with the rest of the exercises, all the way up to and including def model. It’ll help reinforce the following facts:
- Gradients are computed based on the current mini-batch
- How the difference in time comes into play for v: the right side of the equation holds the past value, while the left side holds the current value.
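To make the timestep vs. layer-number distinction concrete, here is a rough sketch (again my own illustration, not the assignment’s code; the layer count, shapes, and random gradients are made up): the outer loop over mini-batches is what advances time, while the inner loop over layers only selects which velocity gets updated in place.

```python
import numpy as np

np.random.seed(0)
beta = 0.9
num_layers = 2          # layer count (illustrative)
num_mini_batches = 3    # each mini-batch is one "timestep"

# One velocity per layer, all starting at zero (v_0 = 0).
v = [np.zeros((2, 2)) for _ in range(num_layers)]

for t in range(1, num_mini_batches + 1):   # the mini-batch loop is what advances time...
    # Stand-in for the gradients backprop would produce on this mini-batch.
    grads = [np.random.randn(2, 2) for _ in range(num_layers)]
    for l in range(num_layers):            # ...the layer loop does not advance time
        # Right-hand side v[l]: the velocity from the previous mini-batch (the past).
        # Left-hand side v[l]: the velocity after this mini-batch (the current value).
        v[l] = beta * v[l] + (1 - beta) * grads[l]
```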