Question about the initialization of Exponentially Weighted Averages

qiu · April 1, 2022, 6:53am

In the course, the EWA is initialized with v(0) = 0 and updated as v(i) = 0.9*v(i-1) + 0.1*theta(i). The problem is that the first few values of v will be very close to zero.

So in a later video, there is a bias correction formula: v(t)/(1 - beta**t)

My question is: why not set v(1) = theta(1), and then v(i) = 0.9*v(i-1) + 0.1*theta(i) for i > 1?

What's the difference between these two kinds of initialization? Why introduce such a complex correction formula instead of just setting v(1) = theta(1)?

nramon · April 21, 2022, 9:13am

Hi, @qiu:

Sorry for the late reply

The alternative you propose is perfectly fine, but whether you initialize to 0 or to theta(1), the first values of the moving average will be biased towards that initial value (this may be acceptable for your use case). The bias correction formula from the later video tries to remove this initial bias.
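To make the comparison concrete, here is a minimal Python sketch of the three variants discussed in this thread. The function names and the sample sequence are made up for illustration, and beta = 0.9 matches the 0.9/0.1 weights in the question. One way to read the result: with v(1) = theta(1), the first sample keeps weight beta**(t-1) instead of (1 - beta)*beta**(t-1), so it counts roughly 1/(1 - beta) = 10 times more than it would in a properly normalized average.

```python
def ewa_zero_init(thetas, beta=0.9):
    """EWA with v(0) = 0 and no correction (early values are biased toward 0)."""
    v, out = 0.0, []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return out


def ewa_bias_corrected(thetas, beta=0.9):
    """EWA with v(0) = 0, reporting v(t) / (1 - beta**t) at each step t."""
    v, out = 0.0, []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        out.append(v / (1 - beta ** t))
    return out


def ewa_first_value_init(thetas, beta=0.9):
    """EWA with v(1) = theta(1), then the usual update for later steps."""
    v, out = None, []
    for theta in thetas:
        v = theta if v is None else beta * v + (1 - beta) * theta
        out.append(v)
    return out


# Hypothetical sample data, chosen only to show the effect.
thetas = [10.0, 20.0, 30.0]
print([round(x, 3) for x in ewa_zero_init(thetas)])         # [1.0, 2.9, 5.61]
print([round(x, 3) for x in ewa_bias_corrected(thetas)])    # [10.0, 15.263, 20.701]
print([round(x, 3) for x in ewa_first_value_init(thetas)])  # [10.0, 11.0, 12.9]
```

On this rising sequence the uncorrected average starts far too low, the v(1) = theta(1) variant stays close to the first sample, and the bias-corrected version follows the data more closely. After enough steps beta**t becomes negligible and all three converge.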

Did that make sense?

Hope you’re enjoying the specialization
