Initialize Z with zeros or np.random? Weird bug for Week 1 Assignment: Convolutional Model, Step by Step


  • Exercise 3 - conv_forward function implementation

What happened?

I found a bug in my code that made the first test pass but the second test fail: it happens if I initialize the output Z with np.random.randn() instead of np.zeros(). I found it by checking my code line by line against the instructions in the function's comments. Once I switch to np.zeros(), both tests pass. However, I don't understand why it matters. As I see it, every value of Z is overwritten with the output of conv_single_step(), so only the shape of Z matters, not its initial values. So why does this cause Test 2 to fail? I couldn't figure it out myself, so I decided to post here to see whether anyone has met a similar problem.

Again, it's not blocking me now, but I'm curious: is there something happening under the hood?
Any help and discussion is highly appreciated!!

This is a really interesting question! I tried your experiment and you're right: one of the later tests, with stride = 1 and pad = 6, fails. At first I was very puzzled: you're right that the initial values of Z all get overwritten, assuming your implementation is correct, and they aren't used as input to any other calculation.

The reason is that calling np.random.randn in your code changes the random sequence that the test gets for its other input values. Every number you generate advances the sequence. The tests all set a particular random "seed" value at the beginning, so that the results will be consistent. But if you add any random calls, or make the random calls in a different order, you'll get a different answer, because the random numbers end up assigned to different places than they normally would be.
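Here's a minimal sketch of that effect, outside the assignment. The seed value and array sizes are arbitrary choices for illustration, not taken from the course tests.

```python
import numpy as np

# Baseline: seed, then draw the "test inputs" straight away.
np.random.seed(1)
expected_inputs = np.random.randn(3)

# Same seed, but an extra randn call happens first
# (like initializing Z with np.random.randn).
np.random.seed(1)
_ = np.random.randn(2)            # consumes two numbers from the sequence
shifted_inputs = np.random.randn(3)

# The full sequence for this seed, drawn in one go.
np.random.seed(1)
full_sequence = np.random.randn(5)

print(np.allclose(expected_inputs, shifted_inputs))    # inputs no longer match
print(np.allclose(shifted_inputs, full_sequence[2:]))  # they are just shifted by two
```

The second comparison shows that nothing is "wrong" with the generator: the extra call simply used up the first two numbers, so later draws start further along the same sequence.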

Here’s a way to convince yourself: try initializing Z with np.ones instead of np.zeros. That will prove that the actual Z values don’t matter, but doing it that way does not disturb the behavior of the random calls in the test code and everything passes.

Thanks very much for this question! It’s the most enjoyable question to research that I’ve seen in a long time. A really nice puzzle and I learned something fun in the process. :nerd_face:


Thank you so much @paulinpaloalto for the fast response and detailed explanation! Let me dry-run an example to make sure I get your point:
Once the seed is set at the beginning of the function, the random generator's output is 'locked', say to "1, 2, 3, 4, 5, etc.", and the test is written assuming the input is "1, 2, 3". But if I add an extra call that generates 2 numbers in the middle, then the actual input to the computation becomes "3, 4, 5", which breaks the test.

Is my understanding above correct?
Thanks again for your help :100:

Yes, your description of why using np.random.randn in your code causes the tests to fail is right. Of course, they could have written the tests in such a way that your way of writing the code wouldn't matter. If you want to go to the next level of detail, have a look at the test code and you'll see why it only failed in the final test of about 5 tests: for most of the tests, they set the random seed, initialize the random input values and then call the "function under test". But in that final test case, they call the function, then initialize some new parameters and run another test case without resetting the seed. That's the case that fails: the random inputs are now different from what the test expects, because your code "used up" part of the random sequence.
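A rough sketch of that test pattern, under stated assumptions: forward_zeros, forward_randn and run_two_cases are hypothetical stand-ins I made up for illustration, not the real conv_forward or the course's test code.

```python
import numpy as np

def forward_zeros(x):
    # Hypothetical stand-in for the function under test: allocate, then overwrite.
    out = np.zeros(x.shape)
    out[...] = 2 * x
    return out

def forward_randn(x):
    # Same result, but the allocation consumes numbers from the global sequence.
    out = np.random.randn(*x.shape)
    out[...] = 2 * x
    return out

def run_two_cases(forward):
    # Hypothetical test harness: the seed is set once, not before each case.
    np.random.seed(1)
    x1 = np.random.randn(2, 3)
    y1 = forward(x1)
    x2 = np.random.randn(2, 3)   # drawn AFTER the first call, with no reseed
    y2 = forward(x2)
    return y1, y2

y1_zeros, y2_zeros = run_two_cases(forward_zeros)
y1_randn, y2_randn = run_two_cases(forward_randn)

print(np.allclose(y1_zeros, y1_randn))  # first case: same inputs, same outputs
print(np.allclose(y2_zeros, y2_randn))  # second case: inputs differ, so outputs differ
```

In this sketch, calling np.random.seed again just before drawing x2 would make the second case independent of what the function does internally, which is one way a test could be written so that the choice of initializer doesn't matter.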
