I could not follow what Dr. Ng was explaining regarding the alternate weighted average formula. Can anyone explain the difference between the two versions? Since the second equation omits the (1 - beta) factor, it does not scale the current gradient, so how can it still smooth out gradient descent? And without the (1 - beta) factor, how can the weights add up to 1? In that case it does not even seem like a proper weighted average. I found the explanation in the lecture vague, so a concrete comparison would really help.
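To make my question concrete, here is a rough sketch of the two update rules as I understand them. The names `beta`, `theta`, and `v` are just my own notation, not anything from the course code, so please correct me if I have the formulas wrong.

```python
# Rough sketch of the two update rules as I understand them.
# theta is the current value (e.g. a gradient component), beta is the decay factor.

def ewma_standard(thetas, beta=0.9):
    """Standard form: v_t = beta * v_{t-1} + (1 - beta) * theta_t."""
    v = 0.0
    history = []
    for theta in thetas:
        v = beta * v + (1 - beta) * theta
        history.append(v)
    return history

def ewma_no_scaling(thetas, beta=0.9):
    """Alternate form with (1 - beta) omitted: v_t = beta * v_{t-1} + theta_t."""
    v = 0.0
    history = []
    for theta in thetas:
        v = beta * v + theta
        history.append(v)
    return history

# Quick check on a constant signal: the standard form settles near the signal's
# value, while the unscaled form settles near value / (1 - beta), i.e. it ends up
# larger by a factor of roughly 1 / (1 - beta).
data = [1.0] * 50
print(ewma_standard(data)[-1])    # ~1.0
print(ewma_no_scaling(data)[-1])  # ~10.0 for beta = 0.9
```

If this sketch is right, the two versions seem to differ only by that constant scale factor, which is exactly the part I am confused about.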
Thanks for sharing the link, but I can't extract much from that thread regarding my questions.