The exercise 8 test expects a value of 0.69314718, but my code returns 0.63719763, which means my code is bugged somewhere. However, this same code passes the exercise 5 test, so the bug is probably somewhere other than the cost calculation itself. My guess is it's somewhere between the call to the optimize() function and reaching the propagate() function a few calls deeper. I'm completely stumped as to what it could be. I'm still figuring out numpy and the like.
This is in the Week 2 programming assignment for the cat classifier.
Exactly. The point is that your lower-level routines like propagate, optimize and predict are probably correct, since they passed their unit tests. So the bug is most likely in your model logic, i.e. in how you use those lower-level functions. You can get bad results from a correct subroutine if you call it incorrectly or mishandle its return values.
What is the actual error output that you see when the test case fails?
The other thing to check is that you did not hard-code the learning rate in the call to optimize, or omit it entirely (which is a different form of hard-coding, since optimize then silently falls back to its default value). I'm guessing Saif's suggestion is more likely, since the number of iterations looks correct (there's an earlier test that checks that). The usual pattern with hard-coding errors is that both the number of iterations and the learning rate are hard-coded. But it's still worth a look …
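To make that failure mode concrete, here is a minimal sketch of the pattern. The function bodies and signatures below are my own stand-ins, not the assignment's actual code: a model that hard-codes the hyperparameters in its call to optimize will pass when the test happens to use those same values, and silently fail any test that supplies different ones.

```python
import numpy as np

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def propagate(w, b, X, Y):
    """One forward/backward pass for logistic regression (sketch)."""
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                                  # shape (1, m)
    cost = -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))  # cross-entropy
    dw = (X @ (A - Y).T) / m
    db = np.sum(A - Y) / m
    return {"dw": dw, "db": db}, cost

def optimize(w, b, X, Y, num_iterations, learning_rate):
    """Plain gradient descent; returns final params and last cost."""
    for _ in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)
        w = w - learning_rate * grads["dw"]
        b = b - learning_rate * grads["db"]
    return w, b, cost

# WRONG: the caller's num_iterations and learning_rate are ignored,
# so any test that passes different values gets unexpected results.
def model_hardcoded(X, Y, num_iterations=100, learning_rate=0.1):
    w, b = np.zeros((X.shape[0], 1)), 0.0
    return optimize(w, b, X, Y, 100, 0.009)   # hard-coded!

# RIGHT: forward whatever the caller (and the grader) passed in.
def model(X, Y, num_iterations=100, learning_rate=0.1):
    w, b = np.zeros((X.shape[0], 1)), 0.0
    return optimize(w, b, X, Y, num_iterations, learning_rate)
```

The two versions agree only when the test's hyperparameters happen to match the hard-coded ones, which is why this kind of bug slips past some tests and not others.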
That means there is something wrong in your model logic. Your lower-level functions may well be correct, but you have to call them correctly. It's probably time to just look at your code, but we can't do that here on the public thread. Please check your DMs for a message from me about how to proceed.
Just to close the loop on the public thread: the problem was that in the model function, w and b were initialized "by hand" before the call to optimize, instead of by calling the initialize_with_zeros function. That alone would still have worked if the hand-written code did the same thing, but the new version initialized w randomly from a Gaussian distribution. That is probably still a valid solution in general, but you end up with a different answer because you are starting Gradient Descent from a different point. Who knows, your answer might even be better, but the test cases here are not checking for "as good or better": they are just looking for the specific values they expect.
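For anyone else who hits this symptom: the expected value 0.69314718 is just log 2. With zero initialization, w.T @ X + b is 0 for every sample, so every activation starts at sigmoid(0) = 0.5 and the very first cost is -log(0.5) = log 2, regardless of the data. A Gaussian init starts somewhere else, so the first cost (and everything downstream) no longer matches. A quick sketch (the helper signatures are my assumptions, following the assignment's usual shape conventions):

```python
import numpy as np

def initialize_with_zeros(dim):
    """Zero initialization, as the assignment's helper does."""
    return np.zeros((dim, 1)), 0.0

def sigmoid(z):
    return 1 / (1 + np.exp(-z))

def cost(w, b, X, Y):
    """Logistic-regression cross-entropy cost (sketch)."""
    A = sigmoid(w.T @ X + b)
    return -np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A))

# With w = 0 and b = 0, every activation is sigmoid(0) = 0.5, so the
# first cost is -log(0.5) = log 2, no matter what X and Y contain.
w, b = initialize_with_zeros(4)
X = np.random.randn(4, 10)
Y = (np.random.rand(1, 10) > 0.5).astype(float)
print(cost(w, b, X, Y))   # ≈ 0.69314718 (= log 2)
```

So a first cost other than ~0.693 in this assignment is a strong hint that the initialization (not the cost function) is what changed.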