Hi @Maxwell_Shapiro, I was also intrigued by the bias correction. Let me share what I found.
Assume a data sequence: \theta_1, \theta_2, ..., \theta_t.
The Exponentially Weighted Average (EWA) is defined as follows:
EWA_\beta(\theta_1,...,\theta_t) = \frac{(\beta^{t-1}\theta_1 + \beta^{t-2}\theta_2 + ... + \beta^0\theta_t)}{\beta^{t-1}+\beta^{t-2}+...+ \beta^0}, 0 < \beta < 1
The denominator is a finite geometric series, so we can simplify the equation:
EWA_\beta(\theta_1,...,\theta_t) = \frac{(\beta^{t-1}\theta_1 + \beta^{t-2}\theta_2 + ... + \beta^0\theta_t)}{\frac{1-\beta^t}{1-\beta}}
and, multiplying the numerator and denominator by 1-\beta,
EWA_\beta(\theta_1,...,\theta_t) = \frac{(1-\beta)(\beta^{t-1}\theta_1 + \beta^{t-2}\theta_2 + ... + \beta^0\theta_t)}{1-\beta^t}
Note that the numerator is exactly V_t: unrolling the recursion V_t = \beta V_{t-1} + (1-\beta)\theta_t (with V_0 = 0) gives V_t = (1-\beta)(\beta^{t-1}\theta_1 + \beta^{t-2}\theta_2 + ... + \beta^0\theta_t). The denominator 1-\beta^t is the bias correction.
In other words, V_t alone is a very good approximation of the EWA once t is large, because \beta^t \rightarrow 0 and the denominator approaches 1. When t is small (at the beginning of the series or iterations), V_t underestimates the average, and dividing by 1-\beta^t recovers the exact EWA.
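As a quick numerical check, here is a minimal Python sketch (the function names, \beta = 0.9, and the sample data are my own choices for illustration). It computes the EWA directly from the weighted-average definition and via the recursion, with and without bias correction; the corrected value matches the direct definition at every t:

```python
def ewa_direct(thetas, beta):
    """EWA computed straight from the definition (a weighted mean)."""
    t = len(thetas)
    num = sum(beta ** (t - i) * x for i, x in enumerate(thetas, start=1))
    den = sum(beta ** (t - i) for i in range(1, t + 1))
    return num / den

def ewa_recursive(thetas, beta):
    """Run V_t = beta*V_{t-1} + (1-beta)*theta_t with V_0 = 0.
    Returns a list of (V_t, V_t / (1 - beta**t)) pairs for each t."""
    v = 0.0
    out = []
    for t, theta in enumerate(thetas, start=1):
        v = beta * v + (1 - beta) * theta
        out.append((v, v / (1 - beta ** t)))  # raw vs bias-corrected
    return out

beta = 0.9
thetas = [10.0, 11.0, 9.5, 10.5]
for t, (v, v_corr) in enumerate(ewa_recursive(thetas, beta), start=1):
    print(t, round(v, 4), round(v_corr, 4), round(ewa_direct(thetas[:t], beta), 4))
```

At t = 1 the raw V_1 is only (1-\beta)\theta_1 = 1.0, far below the data, while the corrected value is exactly \theta_1 = 10.0, which illustrates why the correction matters early on.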
I hope this helps.