Gradient descent with Momentum

Hi Mentor,

If beta is very large, say close to 1, then we smooth out the updates very heavily. If so, what will happen in terms of optimization? Can you please help me understand?

Hello @Anbu,
Thanks for asking your question on the Discourse community. I am a mentor, and I will do my best to answer it.

Beta is the coefficient of the exponentially weighted moving average of past gradients. It appears in gradient descent with momentum and, as beta_1, in optimizers such as Adam, and it controls how much the optimizer "remembers" its previous movements. If beta is very large and close to 1, each update is dominated by the accumulated history and reacts only weakly to the current gradient, so the updates are smoothed out heavily and learning slows down. In terms of optimization, this means the algorithm responds sluggishly to changes in the loss surface and takes longer to converge; the accumulated velocity also takes many steps to decay, so it can overshoot or settle near a suboptimal point.
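To make the smoothing effect concrete, here is a minimal sketch of the momentum accumulator v = beta * v + (1 - beta) * g. The helper `momentum_updates` is a hypothetical name used only for illustration, not part of any course code or library:

```python
def momentum_updates(grads, beta):
    """Return the sequence of velocity values v_t = beta * v_{t-1} + (1 - beta) * g_t
    for a stream of gradients, starting from v_0 = 0."""
    v = 0.0
    history = []
    for g in grads:
        v = beta * v + (1 - beta) * g  # high beta -> v changes very slowly
        history.append(v)
    return history

# Feed in a constant gradient of 1.0: the velocity should approach 1.0,
# but with beta = 0.99 the very first update is only about 0.01,
# while with beta = 0.5 it is already 0.5.
slow = momentum_updates([1.0] * 5, beta=0.99)
fast = momentum_updates([1.0] * 5, beta=0.5)
print(slow[0], fast[0])
```

You can see why a beta very close to 1 slows learning: with v initialized to zero, the effective step size is scaled down by roughly (1 - beta) for many iterations, which is also why Adam applies bias correction to its moving averages.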

I hope my answer resolves your question. Please feel free to ask a follow-up question if you have additional questions related to beta.