Question about the exponentially weighted average (Week 2)

Hi everyone,

This is a question about the intuition behind exponentially weighted averages.

From what I understood, the formula for an exponentially weighted average is a very useful tool to model trends by giving more or less weight to older and newer data. If you give full weight (i.e., 1.0) to the old data, and consequently no weight to new data, you get a model that doesn’t change at all (v1 = v2 = v3 = v4 = v5 = … = vn). Similarly, if you give full weight to new data, you get a model that, using this term perhaps a bit liberally, is ‘overfitting’ the data (v1 = theta_1, v2 = theta_2, v3 = theta_3, etc.). Finding a good split of weight between old and new data is therefore key to obtaining an accurate and general model for your data.
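To make the two extremes concrete, here is a minimal sketch of the recursion as I understand it from the lecture, v_t = beta * v_(t-1) + (1 - beta) * theta_t. The function name `ewa`, the starting value `v0 = 0`, and the sample numbers are my own assumptions for illustration.

```python
def ewa(thetas, beta, v0=0.0):
    """Return [v_1, ..., v_n] with v_t = beta * v_(t-1) + (1 - beta) * theta_t."""
    v = v0
    out = []
    for theta in thetas:
        # beta is the weight on the old average, (1 - beta) the weight on the new datum
        v = beta * v + (1.0 - beta) * theta
        out.append(v)
    return out


# Made-up data, purely for illustration
thetas = [10.0, 12.0, 9.0, 15.0]

frozen = ewa(thetas, beta=1.0)  # full weight on old data: v never moves away from v0
copied = ewa(thetas, beta=0.0)  # full weight on new data: v_t is exactly theta_t
smooth = ewa(thetas, beta=0.9)  # in between: a smoothed trend

print(frozen)
print(copied)
print(smooth)
```

With beta = 1.0 every v_t stays at the starting value, and with beta = 0.0 the output is just a copy of the input, which matches the two cases described above.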

For the purpose of explaining my confusion, I’ll try to stick to the example Andrew went through in his video. I got confused when Andrew mentioned ‘averaging over x days’, specifically by the choice of 1/e as the threshold at which the exponential weight (0.9^n, for example) makes the datum ‘on that day’ (so to speak) no longer significant. To me, 1/e seems like an arbitrary choice, and therefore it’s hard to pin down just how long the data stays relevant. Or at least the choice of (1 - epsilon)^(1/epsilon), since this is what is actually used to obtain approximately 1/e, seems arbitrary. I understand how we obtain the number of days from this formula, but what I don’t understand is why this formula gives a good estimate of the number of days we’re averaging over. I think the best way to phrase my question is: where did this formula come from? Maybe I missed something, but there didn’t seem to be much of an explanation as to where it came from.
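As a quick numerical check of the part I do follow, here is a small sketch that evaluates (1 - epsilon)^(1/epsilon) for a few values of epsilon and compares it with 1/e. The helper name `decay_at_window` and the chosen epsilon values are my own; this only checks the approximation, it does not explain why 1/e is the right threshold.

```python
import math


def decay_at_window(eps):
    """Return (1 - eps) ** (1 / eps): the factor by which the weight has shrunk after 1/eps steps."""
    return (1.0 - eps) ** (1.0 / eps)


# eps = 0.1 corresponds to beta = 0.9 and a window of 1 / 0.1 = 10 days
for eps in (0.1, 0.02, 0.001):
    print(f"eps={eps}: {decay_at_window(eps):.4f}  (1/e = {1 / math.e:.4f})")
```

For eps = 0.1 this gives 0.9^10, which is roughly 0.35, and the value moves closer to 1/e (about 0.37) as eps gets smaller.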


  • Raphael

Hi, @nash.

As you already suspect, the choice of 1/e is slightly arbitrary, but convenient. Here’s a more detailed explanation.
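For anyone reading along, here is a sketch of one common way to see why 1/(1 - beta) is a reasonable "window", assuming the recursion from the lecture, v_t = beta * v_(t-1) + (1 - beta) * theta_t, with v_0 = 0. The symbols k (age of a datum in steps) and N (window length) are my own notation.

```latex
% Unroll the recursion v_t = \beta v_{t-1} + (1-\beta)\theta_t with v_0 = 0:
v_t = (1-\beta) \sum_{k=0}^{t-1} \beta^{k} \, \theta_{t-k}

% Weight of a datum k steps old, relative to the newest datum:
\frac{(1-\beta)\beta^{k}}{(1-\beta)\beta^{0}} = \beta^{k}

% With \epsilon = 1-\beta and k = 1/\epsilon:
\beta^{1/\epsilon} = (1-\epsilon)^{1/\epsilon} \approx \frac{1}{e} \approx 0.37
\quad \text{(exact in the limit } \epsilon \to 0\text{)}

% Total weight carried by the most recent N = 1/\epsilon data points:
(1-\beta) \sum_{k=0}^{N-1} \beta^{k} = 1 - \beta^{N} \approx 1 - \frac{1}{e} \approx 0.63

% Average age of the data under the infinite-horizon weights:
\sum_{k=0}^{\infty} k \,(1-\beta)\beta^{k} = \frac{\beta}{1-\beta}
```

So the weights never actually reach zero, and any cutoff is a convention. The 1/e cutoff is convenient because it lands on a simple window length, 1/(1 - beta): by that age a datum's weight has dropped to about 37% of the newest datum's weight, the window holds about 63% of the total weight, and the weighted average age of the data, beta/(1 - beta), is close to the same number (9 versus 10 for beta = 0.9).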

Let me know if that answers your question :slight_smile: