Inside the function initialize_parameters(), I have initialized W1, b1, W2, and b2 correctly, but it still gives me an error.

Please clarify.

Thanks!

I have the same issue. I noticed that the rand function from numpy.random only gives random numbers between 0 and 1, so if they are scaled by 0.01, it can only give numbers between 0 and 0.01. But in the “Expected results”, we can observe negative values (ex: -0.0041675) and values greater than 0.01 (ex: 0.01640271). So I suspect there is a technical issue here. Perhaps the range is set differently?

Try randn

W1 = np.random.randn(n_h, n_x) *0.01
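
For context, here is a minimal sketch of what the whole function might look like for a 2-layer network. The signature initialize_parameters(n_x, n_h, n_y), the zero biases, and the dictionary keys are assumptions based on the names used in this thread, not a copy of the assignment's code, so check them against your notebook.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    """Sketch (assumed signature): small random weights, zero biases, 2-layer net."""
    # randn samples a standard normal (mean 0, std 1), so values can be negative;
    # multiplying by 0.01 keeps the initial weights small.
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

params = initialize_parameters(n_x=2, n_h=4, n_y=1)
for name, value in params.items():
    print(name, value.shape)  # W1 (4, 2), b1 (4, 1), W2 (1, 4), b2 (1, 1)
```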

Yes, @Nikitha has the answer. The instructions are quite clear on this: they literally wrote out the correct code for you using “randn”. If you look up the two functions, you’ll find that “rand” samples from the Uniform distribution on [0, 1). “randn” gives you a Normal (Gaussian) distribution with mean 0 and standard deviation 1, so it gives both positive and negative values, with absolute values mostly < 3. So using a different distribution gives you different values.
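
A quick way to see the difference is to sample both functions and compare their ranges. This is just an illustration; the seed and sample size are arbitrary choices.

```python
import numpy as np

n_samples = 100_000
np.random.seed(0)

uniform = np.random.rand(n_samples) * 0.01   # rand: uniform on [0, 1), then scaled
normal = np.random.randn(n_samples) * 0.01   # randn: standard normal, then scaled

print("rand  * 0.01: min %.5f, max %.5f" % (uniform.min(), uniform.max()))
print("randn * 0.01: min %.5f, max %.5f" % (normal.min(), normal.max()))
# About 99.7% of standard normal samples fall within 3 standard deviations
print("fraction of randn * 0.01 values with |x| < 0.03:", np.mean(np.abs(normal) < 0.03))
```

Only the randn version can produce negative values or values above 0.01, which is why the expected results contain both.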

Regarding the code to initialize the weight matrices: I hope I did not miss anything, but is there any particular reason why we need to multiply the np.random output by 0.01?

Yes! It turns out that there is some advantage to starting with relatively small values of the initial weights. If you use larger values, you can have problems with “saturating” the values of the sigmoid function so that they come out to be exactly 1 or exactly 0. Of course mathematically they are never exactly 0 or 1, but we are dealing with the pathetic limitations of the finite floating point representations here. If you get 1 as the $\hat{y}$ value, then you end up taking the logarithm of 0 and getting *Inf* or *NaN* as the cost value.
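
Here is a small sketch of the saturation problem. The pre-activation value z = 40.0 is an arbitrary stand-in for what large weights could produce; in float64, sigmoid(40.0) rounds to exactly 1.0, so with a true label of 0 the cross-entropy cost becomes inf. The helper names sigmoid and cross_entropy are hypothetical, not taken from the assignment.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(a, y):
    # Binary cross-entropy cost for a single example with prediction a and label y
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

y = 0.0  # true label
# Silence the "divide by zero in log" warning so only the printed results show
with np.errstate(divide="ignore", invalid="ignore"):
    for z in (0.5, 40.0):  # small vs. large pre-activation
        a = sigmoid(z)
        print(f"z = {z:5.1f}  y_hat = {a}  cost = {cross_entropy(a, y)}")
```

For z = 40.0 the exact value of 1 - y_hat is about 4e-18, but in floating point it becomes 0, so log(1 - y_hat) is -inf.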

I think Prof Ng must say something about that in the lectures here in Course 1, but I forget exactly what he says on that point. Of course I’m sure you picked up on the fact that we can’t just use 0 as the initial values, because we need to “break symmetry”. Prof Ng does mention that in the lectures, but doesn’t really prove it. Here’s a thread which discusses why symmetry breaking is required in more detail.
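
To see the symmetry problem in action, here is a sketch (not the assignment's code) of a tiny 2-layer network with a tanh hidden layer and a sigmoid output, trained by plain gradient descent on made-up data. If every hidden unit starts with the same weights, each one receives the same gradient, so the rows of W1 stay identical no matter how long you train; small random weights do not have that problem. The toy data, learning rate, and step count are arbitrary assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(W1, b1, W2, b2, X, Y, steps=20, lr=0.5):
    """Plain gradient descent on a 2-layer net (tanh hidden, sigmoid output)."""
    m = X.shape[1]
    for _ in range(steps):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        # Backward pass for the cross-entropy cost
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Update with new arrays so the caller's arrays are left untouched
        W1, b1 = W1 - lr * dW1, b1 - lr * db1
        W2, b2 = W2 - lr * dW2, b2 - lr * db2
    return W1, b1, W2, b2

np.random.seed(1)
X = np.random.randn(2, 50)                     # hypothetical toy inputs
Y = (X[0:1] * X[1:2] > 0).astype(float)        # hypothetical XOR-like labels
n_x, n_h, n_y = 2, 4, 1

# Symmetric start: every hidden unit has identical incoming and outgoing weights
W1_sym, _, _, _ = train(np.full((n_h, n_x), 0.5), np.zeros((n_h, 1)),
                        np.full((n_y, n_h), 0.5), np.zeros((n_y, 1)), X, Y)

# Random start: small Gaussian weights break the symmetry
W1_rand, _, _, _ = train(np.random.randn(n_h, n_x) * 0.01, np.zeros((n_h, 1)),
                         np.random.randn(n_y, n_h) * 0.01, np.zeros((n_y, 1)), X, Y)

print("symmetric start, all rows of W1 identical:", np.allclose(W1_sym, W1_sym[0]))
print("random start, all rows of W1 identical:   ", np.allclose(W1_rand, W1_rand[0]))
```

With the symmetric start the 4 hidden units behave like a single unit, so the extra units add nothing.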

It also turns out that there are more sophisticated ways to do initialization than just multiplying by 0.01. We will learn about techniques like Xavier and He Initialization in Course 2 of this series, so please “hold that thought” and stay tuned for Course 2.
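
If you want a rough preview, here is a sketch of how He and Xavier initialization change the scale factor: instead of a fixed 0.01, the standard normal samples are scaled by a factor that depends on the size of the previous layer. The function name initialize_weights and its method argument are hypothetical, and the Xavier branch uses one common variant (variance 1/n_prev); other variants exist, so treat Course 2 as the authority here.

```python
import numpy as np

def initialize_weights(layer_dims, method="he", seed=None):
    """Hypothetical helper: returns {"W1": ..., "b1": ..., ...} for the given layer sizes."""
    rng = np.random.default_rng(seed)
    params = {}
    for l in range(1, len(layer_dims)):
        n_prev, n_curr = layer_dims[l - 1], layer_dims[l]
        if method == "he":
            scale = np.sqrt(2.0 / n_prev)   # He: variance 2 / n_prev
        elif method == "xavier":
            scale = np.sqrt(1.0 / n_prev)   # one common Xavier variant: variance 1 / n_prev
        else:
            scale = 0.01                    # the fixed factor discussed in this thread
        params[f"W{l}"] = rng.standard_normal((n_curr, n_prev)) * scale
        params[f"b{l}"] = np.zeros((n_curr, 1))
    return params

params = initialize_weights([500, 500, 1], method="he", seed=0)
print("std of W1:", params["W1"].std(), "target:", np.sqrt(2.0 / 500))
```

The point of scaling by the layer width is that the variance of each pre-activation stays roughly constant from layer to layer, instead of depending on a hand-picked constant like 0.01.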
