The initialization seems to have an effect on calculating the gradients. Using np.random.randn() or np.random.randn() * 0.01 produces different results, and np.zeros() results in all-zero gradients. I was wondering if I could get some insight into this outcome?

As of now I am not able to pass the test because my gradient values do not match (their shapes do), but I'd like to rule out the initialization, if possible, while debugging.

Thank you in advance.

Are you talking about the Step by Step assignment or the second assignment in Week 4? If the latter, please note that you are not supposed to just copy over your functions from the Step by Step exercise.

You’re right that initialization matters. There are a number of algorithms for initialization and we’ll learn a lot more about that in Course 2 of this series, so “stay tuned” for that.
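To see why zero initialization stalls the gradients, here is a minimal sketch (my own toy setup, not the assignment's code) of a 2-layer net with ReLU hidden units and a sigmoid output, initialized with np.zeros():

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 5))          # hypothetical: 4 features, 5 examples
Y = (rng.random((1, 5)) > 0.5) * 1.0     # hypothetical binary labels

# Zero initialization for a tiny 2-layer net (ReLU hidden, sigmoid output)
W1, b1 = np.zeros((3, 4)), np.zeros((3, 1))
W2, b2 = np.zeros((1, 3)), np.zeros((1, 1))

# Forward pass
Z1 = W1 @ X + b1                         # all zeros
A1 = np.maximum(0, Z1)                   # ReLU(0) = 0, so A1 is all zeros
Z2 = W2 @ A1 + b2
A2 = 1 / (1 + np.exp(-Z2))               # sigmoid(0) = 0.5 everywhere

# Backward pass for the cross-entropy cost
m = X.shape[1]
dZ2 = A2 - Y                             # not zero
dW2 = dZ2 @ A1.T / m                     # zero, because A1 is zero
dZ1 = (W2.T @ dZ2) * (Z1 > 0)            # zero, because W2 is zero
dW1 = dZ1 @ X.T / m                      # zero as well
```

Note that `db2` is not zero here, but every weight gradient is, so gradient descent never breaks the symmetry between the hidden units.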

Also note that if it is the `L_model_backward` function in Step by Step that is failing for you, you are not supposed to be calling any random functions in `L_model_backward`. They give you the formula for computing `dAL` in the instructions, right?
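For reference, a sketch of that formula (the derivative of the cross-entropy cost with respect to the sigmoid output `AL`, evaluated on hypothetical values I made up) involves no randomness at all:

```python
import numpy as np

AL = np.array([[0.8, 0.1, 0.6]])   # example predictions, shape (1, m)
Y  = np.array([[1.0, 0.0, 1.0]])   # example labels, shape (1, m)

# Derivative of the cross-entropy cost with respect to AL:
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
```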

Hello @lepistes, welcome to our community!

I agree with @paulinpaloalto that we should leave room for Course 2. Moreover, I would also like to suggest an idea about the effect of scaling the weights by 0.01: it drives the resulting z values into a smaller range (also scaled down by 0.01) and thus more concentrated around the center of the activation function g (note that a = g(z)). To see the benefit of this, I would leave it to you to experiment on some simple NNs created by you, or in an assignment.
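To make that concrete, here is a minimal sketch (names and sizes are my own) comparing the spread of z = Wx with and without the 0.01 factor:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m = 100, 1000                       # hypothetical layer width and batch size
X = rng.standard_normal((n_x, m))

W_big   = rng.standard_normal((1, n_x))  # unscaled weights
W_small = W_big * 0.01                   # same weights, scaled down

z_big   = W_big @ X                      # wide spread: sigmoid/tanh saturate
z_small = W_small @ X                    # concentrated near 0, where the
                                         # activation is close to linear
print(np.std(z_big), np.std(z_small))    # spreads differ by a factor of 100
```

With the unscaled weights, most z values land in the flat tails of sigmoid or tanh, where the gradient g'(z) is nearly zero; the 0.01 scaling keeps them near the center, where learning is fastest.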

Raymond

Hello @lepistes,

I just found this. You may want to watch C1 W3 Video Random Initialization starting from ~4:50 regarding the 0.01 scaling. Also C2 W1 Video: Weight Initialization for Deep Networks.

Cheers,

Raymond