Understanding Weight Propagation in Deep Networks and Its Effect on Gradients

Q: When calculating weight propagation in deep neural networks, I found that weights get squared as they pass through layers. For example, if the weight matrix is 1.5 along the diagonal, then after two layers the activations involve a factor of 1.5^2. Is this correct in the context of vanishing and exploding gradients?
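
Here is a quick numerical check of what I mean, assuming a purely linear layer (no bias, no activation); the sizes are just illustrative:

```python
import numpy as np

# Hypothetical 3-unit layers with W = 1.5 * I, no bias, no activation.
W = np.diag([1.5, 1.5, 1.5])
x = np.array([1.0, 2.0, 3.0])

a1 = W @ x    # after one layer: 1.5 * x
a2 = W @ a1   # after two layers: 1.5^2 * x

print(a2)           # [2.25 4.5  6.75]
print(1.5**2 * x)   # same values, confirming the 1.5^2 factor
```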

Are you talking about the weights or the gradients of the weights? The two cases are different.

Note that weights are real numbers, meaning they can be both positive and negative. So just because their absolute values are > 1 does not mean the values will keep accumulating. At each layer we compute a linear combination followed by a non-linear activation function. The behavior also depends on the choice of activation function, of course: the outputs of tanh or sigmoid have absolute values < 1, but ReLU outputs do not necessarily.
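
To make that concrete, here is a minimal sketch (the network shape and inputs are just made up for illustration) comparing tanh and ReLU when every layer uses W = 1.5 * I with no bias:

```python
import numpy as np

# Hypothetical 4-unit network, 10 layers deep, W = 1.5 * I at every layer.
W = np.diag([1.5] * 4)
x = np.array([0.5, -0.5, 1.0, -1.0])

a_tanh, a_relu = x, x
for _ in range(10):
    a_tanh = np.tanh(W @ a_tanh)          # tanh squashes outputs into (-1, 1)
    a_relu = np.maximum(0.0, W @ a_relu)  # ReLU lets positive values keep growing

print(np.abs(a_tanh).max())   # stays below 1.0
print(np.abs(a_relu).max())   # roughly 1.0 * 1.5**10 ≈ 57.7 for the positive entries
```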

The gradients are whatever they need to be to push the weights in a direction that lowers the cost. But when we compute them, we apply the Chain Rule all the way back from the final cost J at the output layer, so we end up multiplying per-layer gradient factors together to get the gradients for the weight and bias values of the earlier layers in a deep network. That's where the problems with vanishing and exploding gradients can arise: multiplying numbers with absolute value > 1 makes the product grow in absolute value, and multiplying numbers with absolute value < 1 makes it shrink.
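
A toy illustration of that Chain Rule product, where each layer's contribution is reduced to a single scalar factor (a stand-in for that layer's Jacobian) and a 50-layer depth is assumed just for the example:

```python
depth = 50

for factor in (1.5, 0.5):      # per-layer factor with |factor| > 1 vs |factor| < 1
    grad = 1.0                 # gradient of J at the output layer
    for _ in range(depth):
        grad *= factor         # one multiplication per layer during backprop
    print(f"per-layer factor {factor}: gradient at the first layer ≈ {grad:.3e}")

# per-layer factor 1.5: ≈ 6.4e+08  (exploding)
# per-layer factor 0.5: ≈ 8.9e-16  (vanishing)
```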
