Momentum Updates Confusion

I have a small doubt… Suppose I have 5 layers in my NN and I am using Gradient Descent with Momentum and mini-batches…

My understanding is: for each mini-batch that’s processed, we update the weights of each layer individually using that layer’s own weighted average. The thing I’m unsure about is that we are computing an exponentially weighted average of roughly the last 10 gradients (assuming beta = 0.9) for each layer and using it to update that layer’s weights. So when I’m at layer 3, I compute the weighted average of layer 3’s last ~10 gradients and then subtract it. Is that right? Or are we mixing this between the layers?

There is no mixing between layers: the “exponentially weighted average” calculation is applied to each layer’s gradients individually. Of course the layers are connected in the “big picture” view, and backpropagation at a given layer is affected by what happens in the later layers, right? But that’s true with or without momentum.
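To make the per-layer bookkeeping concrete, here is a minimal NumPy sketch (the dictionary keys like `'W1'`, `'b1'` and the helper names are my own illustration, not the course’s exact code). Each layer keeps its own velocity term, and layer 3’s velocity never touches layer 2’s update:

```python
import numpy as np

def init_velocity(params):
    """One zero-initialized velocity array per parameter (per layer)."""
    return {key: np.zeros_like(val) for key, val in params.items()}

def momentum_update(params, grads, velocity, beta=0.9, lr=0.01):
    """Momentum update, applied layer by layer.

    params / grads / velocity are dicts keyed like 'W1', 'b1', ..., 'W5', 'b5'.
    Each velocity entry is an exponentially weighted average of that
    parameter's own past gradients -- there is no mixing across layers.
    """
    for key in params:
        velocity[key] = beta * velocity[key] + (1 - beta) * grads[key]
        params[key] -= lr * velocity[key]
    return params, velocity
```

For each mini-batch you would run forward and back prop to get `grads`, then call `momentum_update` once. With beta = 0.9, each layer’s velocity behaves like an average over roughly the last 10 mini-batch gradients for that specific layer only.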