I have a small doubt… Suppose I have 5 layers in my NN and I am using Gradient Descent with Momentum and mini-batches…
My understanding is that for each mini-batch that's passed, we update the weights of each layer individually using that layer's own weighted average. The part I'm unsure about is this: we compute an exponentially weighted average of roughly the last 10 gradients (since beta = 0.9) for each layer and use it to update that layer's weights. So when I am at layer 3, I take the weighted average of layer 3's last ~10 gradients and subtract it (scaled by the learning rate). Is that right, or are we mixing the gradients between the layers? A small sketch of what I mean is below.
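Here is a minimal sketch of my understanding (the shapes, learning rate, and the `fake_gradients` helper are made up just for illustration, not from any real framework): each layer keeps its own velocity, and that velocity is updated only from that layer's gradients.

```python
import numpy as np

beta = 0.9   # momentum coefficient (so roughly the last ~10 gradients matter)
lr = 0.01    # learning rate

# 5 layers, each with its own weight matrix (random shapes, just for the example)
weights = [np.random.randn(4, 4) for _ in range(5)]
velocities = [np.zeros_like(W) for W in weights]   # one separate velocity per layer

def fake_gradients(weights):
    # stand-in for backprop on one mini-batch
    return [np.random.randn(*W.shape) for W in weights]

for step in range(100):                 # one update per mini-batch
    grads = fake_gradients(weights)     # dW for every layer, this batch only
    for l in range(len(weights)):
        # layer l's velocity uses ONLY layer l's gradient -- no mixing across layers
        velocities[l] = beta * velocities[l] + (1 - beta) * grads[l]
        weights[l] -= lr * velocities[l]
```

Is this per-layer bookkeeping the correct picture of momentum, or does the averaging somehow combine gradients from different layers?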