Hi everyone,

This is a question about the intuition behind exponentially weighted averages.

From what I understand, an exponentially weighted average is a useful tool for modeling trends, because it lets you choose how much weight older data gets relative to newer data. If you give full weight (i.e., 1.0) to the old data and consequently no weight to new data, you get a model that doesn’t change at all (v1 = v2 = v3 = v4 = v5 = … = vn). Similarly, if you give full weight to new data, you get a model that, using the term perhaps a bit liberally, is ‘overfitting’ the data (v1 = theta_1, v2 = theta_2, v3 = theta_3, etc.). Finding a good split of weight between old and new data is therefore key to obtaining an accurate and general model of your data.
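
To make the two extremes above concrete, here is a minimal sketch of what I have in mind. It assumes the usual recurrence v_t = beta * v_(t-1) + (1 - beta) * theta_t with v_0 = 0, which I'm writing from memory. The function name `ewa` and the temperature values are made up for illustration.

```python
def ewa(thetas, beta, v0=0.0):
    """Exponentially weighted average, assuming
    v_t = beta * v_(t-1) + (1 - beta) * theta_t, starting from v0."""
    v = v0
    history = []
    for theta in thetas:
        # beta is the weight on the old average, (1 - beta) on the new datum
        v = beta * v + (1.0 - beta) * theta
        history.append(v)
    return history


# Made-up daily temperatures, purely illustrative
temps = [40.0, 49.0, 45.0, 44.0, 53.0]

frozen = ewa(temps, beta=1.0)  # full weight on old data: never moves from v0
copied = ewa(temps, beta=0.0)  # full weight on new data: v_t == theta_t
smooth = ewa(temps, beta=0.9)  # something in between

print(frozen)
print(copied)
print(smooth)
```

With beta = 1.0 every v_t stays at the starting value, and with beta = 0.0 the output is just a copy of the input, which matches the two cases I described.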

For the purpose of explaining my confusion, I’ll stick to the example Andrew went through in his video. I got confused when Andrew mentioned ‘averaging over x days.’ Specifically, I don’t follow the choice of 1/e as the threshold below which the exponential weight (0.9^n, for example) makes the datum from n days back no longer significant. To me, 1/e seems like an arbitrary choice, which makes it hard to pin down just how long the data stays relevant. Or at least the choice of (1 - epsilon)^(1/epsilon) seems arbitrary, since this is what is actually used to obtain approximately 1/e. I understand how we obtain the number of days from this formula, but I don’t understand why it gives us a *good* estimate of the number of days we’re averaging over. I think the best way to phrase my question is: where did this formula come from? Maybe I missed something, but there didn’t seem to be much of an explanation of where it came from.
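
For anyone who wants to see the quantity I'm asking about, here is a small numerical check that (1 - epsilon)^(1/epsilon) does land near 1/e. It only confirms the approximation; it doesn't explain why 1/e is the right cutoff, which is my actual question. The helper name `weight_after_window` and the epsilon values are my own choices.

```python
import math


def weight_after_window(eps):
    """Decay factor beta^(1/eps) with beta = 1 - eps, i.e. how far the
    weight has dropped after 1/eps days."""
    return (1.0 - eps) ** (1.0 / eps)


one_over_e = 1.0 / math.e

# eps = 0.1 corresponds to beta = 0.9 and a window of 10 days
for eps in (0.1, 0.02, 0.001):
    w = weight_after_window(eps)
    print(f"eps={eps}: (1 - eps)^(1/eps) = {w:.4f}, 1/e = {one_over_e:.4f}")
```

For these three values the result comes out at roughly 0.349, 0.364, and 0.368, against 1/e at about 0.368, so the match gets better as epsilon gets smaller.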

Thanks!

- Raphael