Explaining how gradients are propagated through deep networks

As I went through this course and experimented with training different networks, running into problems here and there, I started to wonder more about what factors go into the gradients of the weights at any given layer, and how those factors can help explain the problems. In particular, I wanted to understand this in the context of deep neural networks.

After a lot of work, and many failed attempts with the math, I’ve finally been able to write that up.

I wanted to share that here in case it is useful to anyone else:

To summarise, I found that the gradients of the weights at any layer are influenced by the following (the equations after this list show where each factor comes from):

  • the input data, X

  • the mean prediction error, (Ŷ − Y)/n

  • the weights of all layers except the target layer (the target layer's own weights do have some effect, but only an indirect one)

  • the pattern of unit activations at every layer including the target layer

  • the biases of all earlier layers, but not of the target layer or later layers
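For reference, here is where those come from. In my notation (matrices with one example per row, which may differ from the course's), with pre-activations Z_ℓ = A_{ℓ−1} W_ℓ + b_ℓ, activations A_ℓ = σ(Z_ℓ), prediction Ŷ = A_L, and loss L = ‖Ŷ − Y‖² / (2n), backprop gives:

$$
\frac{\partial L}{\partial W_\ell} = A_{\ell-1}^{\top} \Delta_\ell,
\qquad
\Delta_L = \sigma'(Z_L) \odot \frac{\hat{Y} - Y}{n},
\qquad
\Delta_\ell = \left( \Delta_{\ell+1} W_{\ell+1}^{\top} \right) \odot \sigma'(Z_\ell)
$$

Reading the factors off: A_{ℓ−1} carries X plus the earlier layers' weights and biases, the Δ recursion carries the mean error and the later layers' weights, and the σ′(Z) terms carry the activation pattern at every layer from ℓ onwards. W_ℓ itself never appears as a direct factor; it only enters through the activation patterns, which is the indirect effect mentioned above.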

Additionally, of those influences:

  • they each have (the potential for) equal effect relative to the others, though layer-to-layer differences in the various attenuation/vanishing/explosion effects can shift this

  • the weights contribute a linear component plus a non-linear component that attenuates the gradients (never amplifies them) in proportion to the percentage of inactive units across the network

  • the mean magnitude of the weights can have a strong vanishing or exploding effect on the gradients if it is far from 1.0 or if there are many layers, since the effect compounds per layer (see the sketch after this list)
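To make the last two points concrete, here is a minimal NumPy sketch (my own toy example, not code from the blog post; the layer sizes and the `scale` knob are made up). It runs manual backprop through a deep ReLU net whose He-initialised weights are multiplied by `scale`, so scale = 1.0 is the roughly stable baseline and moving away from it mimics a typical weight magnitude far from the stable point:

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_grad_norms(scale, n_layers=10, width=64, n=32):
    """Per-layer weight-gradient norms for a toy deep ReLU net.

    He-initialised weights times `scale`: scale = 1.0 keeps both the
    forward activations and the backward signal roughly stable, so
    `scale` stands in for the mean weight magnitude being below or
    above the stable point.
    """
    Ws = [scale * rng.standard_normal((width, width)) * np.sqrt(2.0 / width)
          for _ in range(n_layers)]
    X = rng.standard_normal((n, width))

    # Forward pass, caching activations and pre-activations.
    acts, pre = [X], []
    a = X
    for W in Ws:
        z = a @ W                     # no biases, to keep the sketch small
        pre.append(z)
        a = np.maximum(z, 0.0)        # ReLU
        acts.append(a)

    # Backward pass. With L = ||Ŷ - Y||² / (2n) and dummy targets Y = 0,
    # the output gradient is the mean prediction error (Ŷ - Y)/n.
    delta = (a - 0.0) / n
    norms = [0.0] * n_layers
    for l in reversed(range(n_layers)):
        delta = delta * (pre[l] > 0)                  # inactive units zero out the signal
        norms[l] = np.linalg.norm(acts[l].T @ delta)  # ||dL/dW_l||
        delta = delta @ Ws[l].T                       # propagate to the previous layer
    return norms

for scale in (0.7, 1.0, 1.3):
    norms = layer_grad_norms(scale)
    print(f"scale={scale}: grad norm at layer 0 = {norms[0]:.2e}, "
          f"at layer {len(norms) - 1} = {norms[-1]:.2e}")
```

Running it, you should see the first layer's gradient norm collapse for scale < 1.0 and blow up for scale > 1.0 while the last layer's barely moves, and the `(pre[l] > 0)` mask is exactly the attenuation-by-inactive-units effect: it zeroes part of the backward signal and never amplifies it.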

Lots more in the blog post. Let me know if I’ve messed anything up.

Thanks for your work on this.

Thanks for sharing!