Why not bias correction on the tail?

I am taking the Deep Learning Specialization, hyperparameters, …, week 2. Ng teaches bias correction by applying Vt / (1 - B^t). As t increases, 1 - B^t approaches one, but for the initial values of t, dividing by it amplifies the result.

My question is whether similar logic could also be applied to the tail. As you can see, the tail also has a large deviation. Maybe a simple index trick such as Vt * (1 - B^{T-t}), where T is the last data point, would do that.

Why do we apply bias correction only to the initial portion of the data?

Thanks,

Any kind of filtering (like the exponential filter) also incurs a time delay. That’s what you’re seeing in the figure, not an effect of the bias correction.

We’re not really going to custom-design a bias correction specific to every individual data set. We’re looking for general tools that would apply to any situation.
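
To illustrate the delay, here is a rough sketch using a made-up ramp input x_t = t and a hypothetical helper `corrected_ewa`; neither comes from the course. It assumes the usual update v_t = B * v_{t-1} + (1 - B) * x_t with v_0 = 0. With B = 0.9, the bias-corrected average still trails the input by about B / (1 - B) = 9 late in the series, where the correction factor is essentially 1:

```python
def corrected_ewa(values, beta=0.9):
    """Bias-corrected exponentially weighted average (illustrative helper)."""
    v = 0.0  # assumed initial value v_0 = 0
    out = []
    for t, x in enumerate(values, start=1):
        v = beta * v + (1 - beta) * x
        out.append(v / (1 - beta ** t))  # bias correction
    return out


beta = 0.9
ramp = [float(t) for t in range(1, 201)]  # made-up input: x_t = t
smoothed = corrected_ewa(ramp, beta)

# Gap between the input and the smoothed value at the last point.
lag = ramp[-1] - smoothed[-1]
# Bias-correction factor at the last point.
factor_end = 1 / (1 - beta ** len(ramp))

print(f"lag at the last point: {lag:.3f}")
print(f"correction factor at the last point: {factor_end:.6f}")
```

The gap at the end comes from the filter itself, so rescaling Vt at the tail would not remove it.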


Because the thing we are trying to correct for is that the time-series mechanism only works fully once we have a series that is long enough for the \beta value we have chosen. By definition the problem only exists at the beginning, when we don’t yet have sufficient previous data.
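
A minimal sketch of this point, assuming the update v_t = \beta * v_{t-1} + (1 - \beta) * x_t with v_0 = 0; the function name `ewa` and the constant input of 10.0 are made up for illustration. It suggests that the factor 1 / (1 - \beta^t) only matters for the first few steps and fades toward 1 afterwards:

```python
def ewa(values, beta=0.9):
    """Return (raw, corrected) exponentially weighted averages (illustrative helper)."""
    v = 0.0  # assumed initial value v_0 = 0
    raw, corrected = [], []
    for t, x in enumerate(values, start=1):
        v = beta * v + (1 - beta) * x
        raw.append(v)
        corrected.append(v / (1 - beta ** t))  # bias correction
    return raw, corrected


beta = 0.9
raw, corrected = ewa([10.0] * 50, beta)  # made-up constant series

# Early on, the raw average is far below the true level of 10,
# while the corrected one is already close to it.
print(f"t=1:  raw={raw[0]:.3f}  corrected={corrected[0]:.3f}")
# Late in the series, the two nearly coincide.
print(f"t=50: raw={raw[-1]:.3f}  corrected={corrected[-1]:.3f}")
```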


Thank you Paul for your answer,

I think I was too fixated on the temperature-per-day plots. Since you are iterating over mini-batches (t = 1…T), it is only a problem at the beginning, because the first few mini-batches will be a real shock for the parameters (maybe that expression is not right, but I mean that instead of an average over a million samples, which is much smoother and more accurate, you are now using a few hundred samples to update the parameters).
Thanks,
