[Week 2] Exponentially weighted average: why do we roughly average over the last 1/(1-β) days?

For β = 0.9

An exponentially weighted average focuses on the last 10 days of temperature because after 10 days, the weight decays to less than 1/3 of the weight of the current day – Andrew Ng

I see that Andrew says the power of β, β^t, starts to get lower than 1/e (roughly 1/3) when the number of days back, t, reaches 1/(1-β),

where e is the base of the natural logarithm (2.718…)




What I do not understand is why Andrew ignores the days whose weight is less than 1/e, or roughly 1/3.

Let me visualize it, in case my problem is not clear.

You can see that Andrew Ng states that we ROUGHLY average over 1/(1-β) days (in this case β = 0.9, so we approximate with the last 10 days).

Again, my question is: why do we not take Day t-11 into account in the average just because its weight is lower than 1/e of the current day's weight?

(Recall that 1/e = 0.3678…)


I have watched the video *Understanding Exponentially Weighted Averages* over and over again.

I assume that the answer to my question would be explained by this equation:
(1-ε)^(1/ε) ≈ 1/e

I also do not understand, intuitively, where this formula comes from. This is my second question.

First of all, we may need to recap the equations for the “Exponentially Weighted (Mean) Average”, starting from v_0. (Andrew went in reverse order, but let’s start in the normal order and then come back to his way.)

\begin{align} v_0 &= 0 \quad \text{(actually, this can be any value)} \\ v_1 &= \beta v_0 + (1-\beta) \theta_1 \\ v_2 &= \beta v_1 + (1-\beta) \theta_2 \\ v_3 &= \beta v_2 + (1-\beta) \theta_3 \\ &\ \ \vdots \end{align}

Starting from v_2, we can rewrite each equation by substituting the previous one, as follows.

\begin{align} v_2 & = \beta v_1 + (1-\beta)\theta_2 \\ &= \beta (\beta v_0 + (1-\beta)\theta_1) + (1-\beta) \theta_2 \\ &= \beta^2v_0 + \beta(1-\beta)\theta_1 + (1-\beta)\theta_2\\ v_3 & = \beta v_2 + (1-\beta)\theta_3 \\ &= \beta (\beta^2v_0 + \beta(1-\beta)\theta_1 + (1-\beta)\theta_2) + (1-\beta) \theta_3 \\ &= \beta^3v_0 + \beta^2(1-\beta)\theta_1 + \beta(1-\beta)\theta_2 + (1-\beta) \theta_3 \\ &\ \ \vdots \end{align}

Then, let’s use \theta_0 in place of v_0, since it can be a non-zero value, and write out what v_{100} looks like.

v_{100} = \beta^{100}\theta_0 + \beta^{99}(1-\beta)\theta_1 + \beta^{98}(1-\beta)\theta_2 + \dots + \beta(1-\beta)\theta_{99} + (1-\beta)\theta_{100}
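As a sanity check on the expansion above, here is a minimal Python sketch that computes v_{100} both ways: by running the recursion, and by evaluating the expanded sum directly. The function names, the random "temperatures" and the starting value are all made up for illustration, not taken from the lecture.

```python
import random

def ewa_recursive(thetas, beta, v0=0.0):
    """Apply v_t = beta * v_{t-1} + (1 - beta) * theta_t for each theta_t."""
    v = v0
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
    return v

def ewa_expanded(thetas, beta, v0=0.0):
    """Evaluate beta^T * v0 + sum over k of beta^(T-k) * (1 - beta) * theta_k."""
    T = len(thetas)
    total = beta ** T * v0
    for k, theta in enumerate(thetas, start=1):
        total += beta ** (T - k) * (1 - beta) * theta
    return total

random.seed(0)
thetas = [random.uniform(0.0, 30.0) for _ in range(100)]  # hypothetical data
beta = 0.9
v0 = 5.0  # plays the role of theta_0; any value works

print("recursive:", ewa_recursive(thetas, beta, v0))
print("expanded: ", ewa_expanded(thetas, beta, v0))
```

The two printed values should agree up to floating-point rounding, which is all the expansion claims.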

Then, Andrew splits this into two vectors. One holds the coefficients, and the other holds \theta. The coefficient vector shows exactly the weights applied to \theta.
If \beta = 0.9, the coefficients look as follows.

This corresponds to Andrew’s 2nd sketch.
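If it helps, the coefficient vector for \beta = 0.9 can be reproduced with a few lines of Python. This is only a sketch under my own naming: `coeffs[k]` is the weight on the day that is k days before day 100, and `first_below` is the first k at which the weight, relative to day 100, falls under 1/e.

```python
import math

beta = 0.9
T = 100

# coeffs[k] is the weight on theta_{T-k}, i.e. k days before day 100.
coeffs = [(1 - beta) * beta ** k for k in range(T)]

# Weight relative to the most recent day; equals beta**k up to rounding.
relative = [c / coeffs[0] for c in coeffs]

# First k whose relative weight has dropped below 1/e.
first_below = next(k for k, r in enumerate(relative) if r < 1 / math.e)

for k in range(12):
    print(f"day {T - k}: weight = {coeffs[k]:.4f}, relative = {relative[k]:.4f}")
print("first k with relative weight below 1/e:", first_below)

# Together with the beta^T weight on theta_0, the weights sum to 1.
print("sum of all weights:", sum(coeffs) + beta ** T)
```

Nothing special happens at that k: the weights keep shrinking smoothly on both sides of the 1/e line, so the cut-off is a convention rather than a hard boundary.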

The first sketch is the vector for \theta.
Then, Andrew said that roughly 10 days, i.e., the 10 coefficients counting backwards from the 100th day, are (weighted) averaged. (Note that this is not actually an “average”, but a summation.)
If you look at the figure, though, summing 10 days is not enough to reach something like a 95% confidence level: the 10 most recent coefficients add up to only 1-\beta^{10} \approx 0.65. (It’s not really a matter of 10 days or 11 days…)
Andrew’s intuition is sometimes not correct from a mathematical point of view. :disappointed_relieved:
But from an “intuition” point of view, Andrew’s explanation makes sense if we look at the following figures.

In the case of \beta = 0.2, we only need a few days, and do not need to sum up all of the past 100 days with their weights (coefficients). In the case of \beta=0.98, however, we need to use all the coefficients, i.e., sum up all 100 days with their weights. The number of days to be considered may not be exactly equal to \frac{1}{1-\beta}.
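To put rough numbers on the comparison above, here is a small Python sketch. It assumes a long enough history that the weights (1-\beta)\beta^k sum to 1, so the n most recent days carry a share of 1-\beta^n. The function name `window_stats` and the 95% target are my own choices for illustration.

```python
import math

def window_stats(beta, target=0.95):
    """Return (1/(1-beta), weight share of that many recent days,
    number of days needed to reach `target` share)."""
    rule_of_thumb = 1 / (1 - beta)
    n = round(rule_of_thumb)
    # Sum of (1 - beta) * beta**k for k = 0..n-1 is 1 - beta**n.
    share = 1 - beta ** n
    # Smallest n with 1 - beta**n >= target.
    days_needed = math.ceil(math.log(1 - target) / math.log(beta))
    return rule_of_thumb, share, days_needed

for beta in (0.2, 0.9, 0.98):
    rule, share, needed = window_stats(beta)
    print(f"beta={beta}: 1/(1-beta)={rule:.2f}, "
          f"share of those days={share:.3f}, days for 95%={needed}")
```

The point is only that 1/(1-\beta) tracks the scale of the window as \beta changes; it is not a count of days after which the remaining weights become negligible.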

Regarding your 2nd question, the formula is a variation of one of the definitions of “Euler’s number”.
The most famous one would be:

\lim_{n \to \infty}(1+\frac{1}{n})^{n} = e

But the following is also one of the variations.

\lim_{\epsilon \to 0}(1-\epsilon)^{\frac{1}{\epsilon}} = \frac{1}{e}

Andrew used this with \epsilon = 1-\beta, and set \epsilon=0.02 (i.e., \beta=0.98). Then, the result, 0.98^{50} \approx 0.364, is close to \frac{1}{e} \approx 0.368.
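You can also check this limit numerically. The Python sketch below evaluates (1-\epsilon)^{1/\epsilon} for a few values of \epsilon that I picked arbitrarily and compares each with 1/e; the helper name `approx_inv_e` is made up.

```python
import math

def approx_inv_e(eps):
    """Evaluate (1 - eps) ** (1 / eps), which tends to 1/e as eps -> 0."""
    return (1 - eps) ** (1 / eps)

inv_e = 1 / math.e
for eps in (0.5, 0.1, 0.02, 0.001):
    value = approx_inv_e(eps)
    print(f"eps={eps}: (1-eps)^(1/eps)={value:.5f}, gap to 1/e={inv_e - value:.5f}")
```

The gap shrinks as \epsilon gets smaller, which is why the approximation is better for \beta close to 1 than for something like \beta = 0.5.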

Thank you. Your explanation is very clear.