Regarding mini-batch gradient descent, here is my understanding of how the exponentially weighted average is involved when beta = 0.9. Is it correct, sir?
On the first epoch, during the 15th mini-batch iteration, we are basically averaging the gradients over the last 10 iterations, from the 5th to the 15th mini-batch, and then updating the weights? Am I correct, sir?
Hi @Anbu,
Apologies for the delayed response. The statement that an exponentially weighted moving average takes the last 1 / (1 - beta) observations into account is an approximation. So, based on that approximation, your understanding is correct. Also, I am assuming that when you refer to the average from the 5th to the 15th iteration, you mean the exponentially weighted moving average, not a simple average.
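To make the approximation concrete, here is a minimal sketch (my own illustration, not course code) of the weight an EWMA with beta = 0.9 assigns to each past gradient; the iteration numbers are just the ones from your example:

```python
beta = 0.9
T = 15  # current mini-batch iteration, as in your example

# Weight that the EWMA at iteration T assigns to the gradient from iteration t:
# (1 - beta) * beta ** (T - t)
for t in range(T, 0, -1):
    weight = (1 - beta) * beta ** (T - t)
    print(f"gradient from iteration {t:2d} carries weight {weight:.4f}")

# 0.9 ** 10 ≈ 0.35 ≈ 1/e, the usual cutoff behind the
# "last 1 / (1 - beta) observations" rule of thumb.
```

Since beta ** 10 has decayed to roughly 1/e, gradients older than about 10 iterations contribute very little, which is what justifies treating it as an average over the last 10 observations.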
Also, I would like to add that, as you might have seen in the lecture videos, applying Exponentially Weighted Moving Averages in mini-batch gradient descent is exactly what we learnt as Gradient Descent with Momentum. I hope this helps.
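In case it helps to see that connection in code, here is a minimal sketch of gradient descent with momentum on a single parameter, assuming a toy quadratic loss L(w) = (w - 3)^2 chosen purely for illustration:

```python
# Toy example: minimize L(w) = (w - 3) ** 2 with momentum.
beta = 0.9           # EWMA / momentum coefficient
learning_rate = 0.1
w = 0.0              # parameter being optimized
v = 0.0              # "velocity": EWMA of past gradients

for t in range(200):
    grad = 2 * (w - 3)                  # gradient of the toy loss
    v = beta * v + (1 - beta) * grad    # exponentially weighted moving average
    w = w - learning_rate * v           # update with the smoothed gradient

print(f"w after 200 steps: {w:.4f}")    # converges toward the minimum at w = 3
```

The velocity v here is exactly the exponentially weighted moving average of the gradients, which is why the two topics are the same idea.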