[Week 2] Exponential weighted average, Why we roughly approximate the last 1/(1-β) days

wallik2 · June 29, 2022, 6:54am

For β = 0.9

An exponentially weighted average focuses on roughly the last 10 days of temperature because, after 10 days, the weight has decayed to less than about 1/3 of the weight of the current day – Andrew Ng

As I understand it, Andrew says that the weight β^t drops below 1/e (roughly 1/3) once t, the number of days back, reaches 1/(1-β),

where e is the base of the natural logarithm (2.718…).

What I do not understand is why Andrew ignores the days whose weight is less than 1/e (roughly 1/3).

Let me illustrate my problem with a picture, in case it is not clear.

You can see that Andrew Ng states that we ROUGHLY average over 1/(1-β) days (in this case β = 0.9, so we approximate using the last 10 days).

Again, my question is: why do we not take day t-11 into account in the average once its weight is lower than 1/e?

(Recall that 1/e = 0.3678…)

I have watched the video *Understanding Exponentially Weighted Averages* again and again.

I assume that the answer to my question is explained by this equation:
(1-\epsilon)^{\frac{1}{\epsilon}} \approx \frac{1}{e}

Intuitively, I do not understand where this formula comes from; that is my second question.

anon57530071 · June 30, 2022, 5:01am

First of all, let's recap the equations for the “Exponentially Weighted (Moving) Average”, starting from 0. (Andrew went in reverse order, but let's start in the normal order and then come back to his way.)

\begin{align} v_0 &= 0 \quad \text{(actually, this can be any value)} \\ v_1 &= \beta v_0 + (1-\beta) \theta_1 \\ v_2 &= \beta v_1 + (1-\beta) \theta_2 \\ v_3 &= \beta v_2 + (1-\beta) \theta_3 \\ &\ \ \vdots \end{align}

Starting from v_2, we can substitute the previous equation into each one as follows.

\begin{align} v_2 &= \beta v_1 + (1-\beta)\theta_2 \\ &= \beta (\beta v_0 + (1-\beta)\theta_1) + (1-\beta) \theta_2 \\ &= \beta^2 v_0 + \beta(1-\beta)\theta_1 + (1-\beta)\theta_2 \\ v_3 &= \beta v_2 + (1-\beta)\theta_3 \\ &= \beta (\beta^2 v_0 + \beta(1-\beta)\theta_1 + (1-\beta)\theta_2) + (1-\beta) \theta_3 \\ &= \beta^3 v_0 + \beta^2(1-\beta)\theta_1 + \beta(1-\beta)\theta_2 + (1-\beta) \theta_3 \\ &\ \ \vdots \end{align}

Then, let’s use \theta_0 in place of v_0, which can be a non-zero value, and write out what v_{100} looks like.

v_{100} = \beta^{100}\theta_0 + \beta^{99}(1-\beta)\theta_1 + \beta^{98}(1-\beta)\theta_2 + \dots + \beta(1-\beta)\theta_{99} + (1-\beta)\theta_{100}
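As a quick sanity check on the unrolling above, here is a minimal Python sketch that compares the recursive update with the expanded sum. The function names and the made-up temperature series are assumptions for illustration only; they are not from the course.

```python
def ewa_recursive(thetas, beta, v0=0.0):
    """Apply v_t = beta * v_{t-1} + (1 - beta) * theta_t and return the final v."""
    v = v0
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
    return v


def ewa_unrolled(thetas, beta, v0=0.0):
    """Evaluate beta^T * v0 + sum over t of beta^(T - t) * (1 - beta) * theta_t."""
    T = len(thetas)
    total = beta ** T * v0
    for t, theta in enumerate(thetas, start=1):
        total += beta ** (T - t) * (1 - beta) * theta
    return total


# Hypothetical "temperatures" for days 1..100, purely for illustration.
thetas = [10.0 + 0.1 * t for t in range(1, 101)]
beta = 0.9

v_rec = ewa_recursive(thetas, beta, v0=5.0)
v_sum = ewa_unrolled(thetas, beta, v0=5.0)
print(v_rec, v_sum)  # the two values should agree up to floating-point error
```

The starting value v0 only enters through the factor \beta^{100}, which is tiny for \beta = 0.9, so its choice barely affects v_{100}.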

Then, Andrew splits this into two vectors: one for the coefficients and the other for \theta. The coefficient vector holds exactly the weights applied to \theta.
If \beta = 0.9, the coefficients look as follows.

This corresponds to Andrew’s 2nd sketch.
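As a rough stand-in for that sketch, the weights can also be printed numerically. The snippet below is only an illustrative sketch: it lists each day's weight relative to the current day (the common factor 1-\beta cancels, leaving \beta^k for the observation k days back) and marks where it first falls below 1/e. The variable names are my own.

```python
import math

beta = 0.9

# Weight of the observation k days back, relative to the current day's weight.
relative_weights = [beta ** k for k in range(15)]

threshold = 1 / math.e
for k, w in enumerate(relative_weights):
    marker = "  <- below 1/e" if w < threshold else ""
    print(f"{k:2d} days back: {w:.4f}{marker}")

# First k at which the relative weight drops below 1/e.
first_below = next(k for k, w in enumerate(relative_weights) if w < threshold)
print("first day below 1/e:", first_below)
```

Nothing special happens at that point: the weights keep decaying smoothly, so older days are still included in the sum, just with smaller and smaller weights.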

The first sketch is the vector for \theta.
Then, Andrew said that roughly 10 days, i.e., the 10 coefficients counting back from the 100th day, are (weighted) averaged. (Note that this is really a weighted sum rather than a plain average.)
If you look at the figure, the 10 most recent days do not actually cover enough of the total weight to reach something like a 95% level: for \beta = 0.9 they account for 1 - 0.9^{10} \approx 0.65 of it. (So it is not really a matter of 10 days versus 11 days.)
Andrew’s intuition is sometimes not exact from a mathematical point of view.
But from an “intuition” point of view, Andrew’s explanation makes sense if we look at the following figures.

In the case of \beta = 0.2, we only need a few days and do not need to sum over all of the past 100 days with their weights (coefficients), but in the case of \beta = 0.98, we need to use all of the coefficients, i.e., sum over all 100 days with their weights. The number of days to consider may not be exactly equal to \frac{1}{1-\beta}.
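To put rough numbers on this, here is a small sketch (helper names are hypothetical) that uses the geometric-series identity (1-\beta)\sum_{k=0}^{n-1}\beta^k = 1-\beta^n to compute how much of the total weight the most recent n days carry, and how many days are needed to reach a chosen fraction such as 95%.

```python
import math


def weight_in_last(n_days, beta):
    """Total weight of the most recent n_days observations: 1 - beta**n_days."""
    return 1 - beta ** n_days


def days_for_fraction(fraction, beta):
    """Smallest n such that the most recent n days carry at least `fraction` of the weight."""
    return math.ceil(math.log(1 - fraction) / math.log(beta))


for beta in (0.2, 0.9, 0.98):
    window = round(1 / (1 - beta))  # the rule-of-thumb window 1/(1-beta)
    print(
        f"beta={beta}: window={window} days, "
        f"weight in window={weight_in_last(window, beta):.3f}, "
        f"days for 95%={days_for_fraction(0.95, beta)}"
    )
```

By this calculation, for \beta = 0.9 the 95% mark is only reached after 29 days, well beyond the 10-day rule of thumb, which is why 1/(1-\beta) is best read as a rough scale rather than a cutoff.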

Regarding your 2nd question, it is a variation of the definitions of “Euler’s number”.
The most famous one would be:

\lim_{n \to \infty}(1+\frac{1}{n})^{n} = e

But this is also one of the variations:

\lim_{\epsilon \to 0}(1-\epsilon)^{\frac{1}{\epsilon}} = \frac{1}{e}

Andrew used this with \epsilon = 1-\beta. For \beta = 0.98 that means \epsilon = 0.02, and the result, 0.98^{50} \approx 0.364, is close to \frac{1}{e} \approx 0.368.
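A short numerical check makes the limit concrete. This is only an illustrative sketch; the function name is my own.

```python
import math


def power_at(eps):
    """Evaluate (1 - eps) ** (1 / eps), which approaches 1/e as eps -> 0."""
    return (1 - eps) ** (1 / eps)


for eps in (0.1, 0.02, 0.001):
    print(f"eps={eps}: (1-eps)^(1/eps) = {power_at(eps):.4f}, 1/e = {1 / math.e:.4f}")
```

The smaller \epsilon is (that is, the closer \beta is to 1), the closer \beta^{1/(1-\beta)} gets to \frac{1}{e}, so the statement is an approximation rather than an exact equality for any fixed \beta.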

wallik2 · July 12, 2022, 9:12am

Thank you. Your explanation is very clear
