Inside the function initialize_parameters(), I have initialized W1, b1, W2, b2 correctly, but it still gives me an error.
Please clarify.
Thanks!
I have the same issue. I noticed that the rand function from numpy.random only returns numbers between 0 and 1, so after scaling by 0.01 it can only give numbers between 0 and 0.01. But in the “Expected results” we can see negative values (e.g. -0.0041675) and values greater than 0.01 (e.g. 0.01640271). So I suspect there is a technical issue here. Maybe the range is set differently?
Try randn
W1 = np.random.randn(n_h, n_x) * 0.01
Yes, @Nikitha has the answer. The instructions are quite clear on this: they literally wrote out the correct code for you using “randn”. If you look up the two functions, you’ll find that “rand” samples from the Uniform distribution on [0, 1), so it never returns a negative value. “randn” samples from a Normal (Gaussian) distribution with mean 0 and standard deviation 1, so it gives both positive and negative values, mostly with absolute value < 3. Using a different distribution gives you different values, which is why your output does not match the expected results.
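To see the difference concretely, here is a minimal sketch comparing the two functions. The layer sizes, the seed, and the sample count are arbitrary values picked for illustration, not anything taken from the assignment.

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, only so the run is repeatable

n_h, n_x = 4, 2  # hypothetical layer sizes, not the assignment's

# rand: uniform on [0, 1), so scaled values stay inside [0, 0.01)
W_uniform = np.random.rand(n_h, n_x) * 0.01
uniform_in_range = bool(np.all((W_uniform >= 0) & (W_uniform < 0.01)))

# randn: standard normal, so scaled values can be negative or exceed 0.01
W_normal = np.random.randn(n_h, n_x) * 0.01

# A larger sample makes the range difference obvious
samples = np.random.randn(10000) * 0.01
has_negative = bool(samples.min() < 0)
has_above_scale = bool(samples.max() > 0.01)

print("rand values all in [0, 0.01):", uniform_in_range)
print("randn sample has negatives:", has_negative)
print("randn sample has values > 0.01:", has_above_scale)
```

So expected values like -0.0041675 or 0.01640271 can only have come from randn, never from rand.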
Regarding the code to initialize the weight matrices, I hope I did not miss anything, but is there any particular reason why we need to multiply the np.random output with 0.01?
Yes! It turns out that there is some advantage to starting with relatively small values of the initial weights. If you use larger values, you can have problems with “saturating” the values of the sigmoid function so that they come out to be exactly 1 or exactly 0. Of course mathematically they are never exactly 0 or 1, but we are dealing with the pathetic limitations of the finite floating point representations here. If you get 1 as the \hat{y} value, then you end up taking the logarithm of 0 and getting Inf or NaN as the cost value.
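Here is a small sketch of that saturation effect in 64-bit floating point. The sigmoid and cross_entropy helpers are written just for this illustration (they are not the notebook's functions), and the input value 40 is an arbitrary "large" pre-activation.

```python
import numpy as np

def sigmoid(z):
    # Plain logistic function, written out for this illustration
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(y, a):
    # Per-example cross-entropy cost; warnings silenced so the
    # Inf/NaN results can be inspected directly
    with np.errstate(divide="ignore", invalid="ignore"):
        return -(y * np.log(a) + (1 - y) * np.log(1 - a))

a_small = sigmoid(np.float64(0.4))   # modest input: stays strictly inside (0, 1)
a_large = sigmoid(np.float64(40.0))  # large input: rounds to exactly 1.0

cost_y0 = cross_entropy(0.0, a_large)  # log(1 - 1) = log(0) gives an infinite cost
cost_y1 = cross_entropy(1.0, a_large)  # 0 * log(0) gives NaN

print(a_small, a_large)
print(cost_y0, cost_y1)
```

Small initial weights keep the inputs to the activation functions near 0 at the start of training, which makes this kind of rounding much less likely.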
I think Prof Ng must say something about that in the lectures here in Course 1, but I forget exactly what he says on that point. Of course I’m sure you picked up on the fact that we can’t just use 0 as the initial values, because we need to “break symmetry”. Prof Ng does mention that in the lectures, but doesn’t really prove it. Here’s a thread which discusses why symmetry breaking is required in more detail.
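As a rough numerical illustration (not a proof) of the symmetry problem, the sketch below trains a tiny one-hidden-layer network (tanh, then sigmoid) from zero weights and from small random weights. The toy data, layer sizes, learning rate, and step count are all made up for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_hidden_weights(W1, W2, X, Y, steps=20, lr=0.5):
    """Plain gradient descent on a tanh -> sigmoid network; returns the trained W1."""
    b1 = np.zeros((W1.shape[0], 1))
    b2 = np.zeros((1, 1))
    m = X.shape[1]
    for _ in range(steps):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = sigmoid(W2 @ A1 + b2)
        # Backward pass for the cross-entropy cost
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Parameter update
        W1 = W1 - lr * dW1
        b1 = b1 - lr * db1
        W2 = W2 - lr * dW2
        b2 = b2 - lr * db2
    return W1

def rows_identical(W):
    # True when every hidden unit (row of W) has the same weights
    return bool(np.allclose(W, W[0]))

rng = np.random.default_rng(1)  # arbitrary seed
X = rng.standard_normal((2, 50))                        # toy inputs
Y = (X[0] + X[1] > 0).astype(float).reshape(1, -1)      # toy labels

n_h = 3  # hypothetical number of hidden units
W1_zero = train_hidden_weights(np.zeros((n_h, 2)), np.zeros((1, n_h)), X, Y)
W1_rand = train_hidden_weights(rng.standard_normal((n_h, 2)) * 0.01,
                               rng.standard_normal((1, n_h)) * 0.01, X, Y)

print("zero init, hidden units identical:", rows_identical(W1_zero))
print("random init, hidden units identical:", rows_identical(W1_rand))
```

With zero initialization every hidden unit receives the same gradient, so the rows of W1 never diverge from each other; the random start lets each unit learn something different.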
It also turns out that there are more sophisticated ways to do initialization than just multiplying by 0.01. We will learn about techniques like Xavier and He Initialization in Course 2 of this series, so please “hold that thought” and stay tuned for Course 2.