Week 1: Weight Initialization - Effect on Activations vs Gradients

Hi, @aman_kumar.

When computing dW^l, you need \delta^l (dZ). The backprop recursion is

\delta^l = ((W^{l+1})^T \delta^{l+1}) \odot g'(z^l)

and if you unroll \delta^l down to the output layer L, you get

\delta^l = ((W^{l+1})^T \cdots (W^{L})^T \delta^{L}), with a g'(z) factor applied elementwise at each layer,

i.e. a product of L - l weight matrices.

Now the reasoning is analogous to the one shown in the lecture for the activations.

If the gradients explode and you get NaN or inf values, you may have to restart training. If they vanish, parameter updates will be very small. Generally speaking, I don’t think the problem tends to solve itself.
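A minimal sketch of this, assuming a toy deep ReLU MLP with random inputs (my own illustration, not the course's implementation): it tracks the norm of \delta^l = dZ^l as it is backpropagated through 50 layers, once with a too-small and once with a too-large initialization scale.

```python
import numpy as np

rng = np.random.default_rng(0)
n, depth = 100, 50  # hypothetical layer width and network depth

def delta_norms(init_scale):
    """Norm of delta^l per layer, unrolling
    delta^l = (W^{l+1}.T @ delta^{l+1}) * g'(z^l), with g = ReLU."""
    Ws = [rng.standard_normal((n, n)) * init_scale for _ in range(depth)]
    a = rng.standard_normal((n, 1))
    zs = []
    for W in Ws:                      # forward pass: cache the z^l
        z = W @ a
        zs.append(z)
        a = np.maximum(z, 0.0)
    delta = np.ones((n, 1))           # stand-in for dA at the top layer
    norms = []
    for W, z in zip(reversed(Ws), reversed(zs)):
        delta = delta * (z > 0)       # g'(z) for ReLU
        norms.append(float(np.linalg.norm(delta)))
        delta = W.T @ delta           # unroll one more weight matrix
    return norms

small = delta_norms(0.01)  # well below He init sqrt(2/n): vanishes
large = delta_norms(0.2)   # above He init: explodes
print(f"scale 0.01: ||dZ|| goes {small[0]:.2e} -> {small[-1]:.2e}")
print(f"scale 0.2 : ||dZ|| goes {large[0]:.2e} -> {large[-1]:.2e}")
```

Each backward step multiplies \delta by another W^T, so the norm shrinks or grows geometrically with depth, exactly as the activations do in the forward pass.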

This was implemented in course 1, in case you want to run your own experiments 🙂

You may also find this interesting.