I was wondering what the main advantages of using an exponentially weighted average (EWA) are, given that it gives larger weight to later values. Is it mainly because an exponentially weighted average is a computationally cheap way to approximate an “average” of the data? Thanks!

I don’t think there is much difference in computational cost between a full “mean” of the data and the exponentially weighted average: both can be updated incrementally with a constant amount of work per new value. The point of the EWA is that it lets you tune how much of the “history” influences the current value. The EWA is useful when you primarily care about what has been happening “lately”, as opposed to since the beginning of time. By selecting the $\alpha$ value, you get to define what “lately” means to you.
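To make the role of $\alpha$ concrete, here is a minimal Python sketch. It assumes the convention $v_t = \alpha v_{t-1} + (1 - \alpha) y_t$ with $v_0 = 0$, so $\alpha$ is the weight kept on the history (the thread doesn’t pin down a convention, and some texts put $\alpha$ on the new value instead). The function name `ewa` and the step signal are illustrative only. Each update is O(1) in time and memory, just like a running mean.

```python
from typing import Iterable, List


def ewa(values: Iterable[float], alpha: float) -> List[float]:
    """Exponentially weighted average of a stream.

    Assumed convention: v_t = alpha * v_{t-1} + (1 - alpha) * y_t, with v_0 = 0.
    alpha is the weight on the history, so a larger alpha means a longer memory
    (roughly the last 1 / (1 - alpha) values).
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    v = 0.0
    out = []
    for y in values:
        v = alpha * v + (1.0 - alpha) * y  # constant work per new value
        out.append(v)
    return out


# A signal that jumps from 0 to 10 halfway through.
signal = [0.0] * 50 + [10.0] * 50
short_memory = ewa(signal, alpha=0.5)   # "lately" = roughly the last 2 values
long_memory = ewa(signal, alpha=0.98)   # "lately" = roughly the last 50 values
print(round(short_memory[55], 3), round(long_memory[55], 3))  # prints 9.844 1.142
```

Six steps after the jump, the short-memory average has nearly caught up with the new level, while the long-memory one is still dominated by the older zeros. Note that starting from $v_0 = 0$ also pulls the first few outputs toward zero.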

Thanks. That makes sense. Does that mean that if we do not care about “history”, we would use a typical algebraic mean instead of an EWA?

Well, maybe I’m just misunderstanding the way you said it, but I would say it is the opposite: the EWA is useful in cases where all you care about is *recent* behavior. In cases where you care about the complete history since the start of your algorithm, you can just use the algebraic mean. Note that it’s easy to compute the running average incrementally if you know $m$, the number of values averaged so far, right?

$$\mu_{m+1} = \frac{m \, \mu_m + y_{m+1}}{m + 1}$$
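As a rough illustration, the update above translates directly into a streaming loop. This is a minimal Python sketch; the function name `running_mean` and the step signal are hypothetical, chosen just for the example.

```python
from typing import Iterable, List


def running_mean(values: Iterable[float]) -> List[float]:
    """Running mean via mu_{m+1} = (m * mu_m + y_{m+1}) / (m + 1)."""
    mu = 0.0
    out = []
    for m, y in enumerate(values):  # m = number of values already averaged
        mu = (m * mu + y) / (m + 1)
        out.append(mu)
    return out


# Same kind of step signal: 50 zeros followed by 50 tens.
signal = [0.0] * 50 + [10.0] * 50
means = running_mean(signal)
print(round(means[55], 3), round(means[-1], 3))  # prints 1.071 5.0
```

Because every value since the start gets equal weight, six steps after the jump the mean has only moved to about 1.07, and it ends at 5.0. An algebraically equivalent form, `mu += (y - mu) / (m + 1)`, avoids multiplying by a large `m` and is often preferred for that reason.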