Week 1: Weight Initialization - Effect on Activations vs Gradients

Hi, @aman_kumar.

When computing dW^l, you need \delta^l (dZ). The backprop recursion is

\delta^l = ((W^{l+1})^T \delta^{l+1}) \odot g'(z^l)

and if you unroll \delta^l down to the output layer L, you get

\delta^l = ((W^{l+1})^T \cdots (W^{L})^T \delta^{L}), with a g'(z) factor applied elementwise at each layer,

i.e. a product of L - l weight matrices.

Now the reasoning is analogous to the one shown in the lecture for the activations.

If the gradients explode and you get NaN or inf values, you may have to restart training. If they vanish, parameter updates will be very small. Generally speaking, I don’t think the problem tends to solve itself.
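A minimal sketch of this, assuming a toy deep ReLU MLP with random inputs (my own illustration, not the course's implementation): it tracks the norm of \delta^l = dZ^l as it is backpropagated through 50 layers, once with a too-small and once with a too-large initialization scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 100, 50  # hypothetical layer width and network depth

def delta_norms(init_scale):
    """Norm of delta^l per layer, unrolling
    delta^l = (W^{l+1}.T @ delta^{l+1}) * g'(z^l), with g = ReLU."""
    Ws = [rng.standard_normal((n, n)) * init_scale for _ in range(depth)]
    a = rng.standard_normal((n, 1))
    zs = []
    for W in Ws:                      # forward pass: cache the z^l
        z = W @ a
        zs.append(z)
        a = np.maximum(z, 0.0)
    delta = np.ones((n, 1))           # stand-in for dA at the top layer
    norms = []
    for W, z in zip(reversed(Ws), reversed(zs)):
        delta = delta * (z > 0)       # g'(z) for ReLU
        norms.append(float(np.linalg.norm(delta)))
        delta = W.T @ delta           # unroll one more weight matrix
    return norms

small = delta_norms(0.01)  # well below He init sqrt(2/n): vanishes
large = delta_norms(0.2)   # above He init: explodes
print(f"scale 0.01: ||dZ|| goes {small[0]:.2e} -> {small[-1]:.2e}")
print(f"scale 0.2 : ||dZ|| goes {large[0]:.2e} -> {large[-1]:.2e}")
```

Each backward step multiplies \delta by another W^T, so the norm shrinks or grows geometrically with depth, exactly as the activations do in the forward pass.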

This was implemented in course 1, in case you want to run your own experiments 🙂

You may also find this interesting.