C1W4: Cat Recognizer 2.0 Not Learning

Update 1

  • My parameter initialization was wrong. When I multiplied the weights by 0.01, learning was dead; when I divided by sqrt(l_dim[l]) instead, learning magically started working. Why? Strangely, initializing the weights to random * 0.01 behaved the same as initializing them all to zero.

I have been trying to implement the model from C1W4: Deep Neural Network - Application. This model is essentially a Cat Recognizer 2.0 from earlier in the course. My problem is that the network is not learning: the cost does not decrease from one iteration to the next. I am sure there is nothing wrong with the data sets or preprocessing, the hyperparameters, or the parameter initialization. Nothing seems to be wrong, but I know there must be a tiny little evil bug. I have been trying to debug this for several days now … hehe.

Let me know if this can be resolved. Thanks.

Update 2

  • This was mostly resolved. This is explained in C2 W1 :)

Right! It turns out that it’s not a bug. The reality is that the simplistic initialization routine they taught us in Course 1 doesn’t work well in all cases. In fact, it works really poorly with the particular cat picture problem in Week 4, so they silently gave us the more sophisticated algorithm, Xavier Initialization, but didn’t bother to point that out. I think there are two reasons for that: 1) there’s just too much material to cover in one course and this is covered in Course 2, and 2) if they pointed this out, it would also make clear that they have provided us with worked solutions to the exercises in the Step by Step assignment, which is the first assignment in Week 4 of C1. So they just decided not to mention it.
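To make the difference concrete, here is a minimal NumPy sketch (not the course's code) comparing the two schemes on a forward pass. Several things here are my assumptions: the layer sizes `[12288, 20, 7, 5, 1]`, the random stand-in data, the helper names `init_params`, `final_z` and `cost`, and the divisor being the fan-in `sqrt(layer_dims[l-1])` (the first post writes `sqrt(l_dim[l])`). Both schemes draw the same random numbers, so only the scale differs.

```python
import numpy as np

def init_params(layer_dims, scheme, seed=1):
    """Hypothetical helper. scheme='small': randn * 0.01; scheme='scaled': randn / sqrt(fan_in)."""
    rng = np.random.default_rng(seed)  # same seed -> same draws for both schemes
    params = {}
    for l in range(1, len(layer_dims)):
        fan_in = layer_dims[l - 1]
        scale = 0.01 if scheme == "small" else 1.0 / np.sqrt(fan_in)
        params["W" + str(l)] = rng.standard_normal((layer_dims[l], fan_in)) * scale
        params["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return params

def final_z(X, params):
    """Forward pass [LINEAR -> RELU] x (L-1) -> LINEAR; returns the last pre-activation Z."""
    A = X
    L = len(params) // 2
    for l in range(1, L):
        A = np.maximum(0, params["W" + str(l)] @ A + params["b" + str(l)])
    return params["W" + str(L)] @ A + params["b" + str(L)]

def cost(Z, Y):
    """Cross-entropy cost after a sigmoid output."""
    A = 1.0 / (1.0 + np.exp(-Z))
    return float(-np.mean(Y * np.log(A) + (1 - Y) * np.log(1 - A)))

layer_dims = [12288, 20, 7, 5, 1]            # assumed sizes, not taken from this thread
data_rng = np.random.default_rng(0)
X = data_rng.random((layer_dims[0], 50))     # stand-in for pixel values scaled to [0, 1)
Y = data_rng.integers(0, 2, size=(1, 50))    # random stand-in labels

z_small = final_z(X, init_params(layer_dims, "small"))
z_scaled = final_z(X, init_params(layer_dims, "scaled"))

# With zero biases and ReLU, the two outputs differ by a fixed product of per-layer scales.
factor = float(np.prod([0.01 * np.sqrt(n) for n in layer_dims[:-1]]))
cost_small = cost(z_small, Y)

print("shrink factor:", factor)
print("max |Z| with * 0.01:", np.abs(z_small).max())
print("max |Z| with / sqrt(fan_in):", np.abs(z_scaled).max())
print("cost with * 0.01:", cost_small, "vs log(2) =", np.log(2))
```

Under these assumptions the `* 0.01` scheme shrinks the signal at every narrow hidden layer, so the sigmoid output sits at about 0.5 for every example and the cost starts at roughly log(2) ≈ 0.693. That is very close to what all-zero weights would give, which matches the behavior described in the first post.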

Could you explain what this means?
“they silently gave us the more sophisticated algorithm Xavier Initialization, but didn’t bother to point that out.”
Are we not supposed to copy our initialization function (and other functions) from the first assignment of week 4 to the notebook for the second assignment of week 4?

Yes, you are not supposed to copy over your functions from the Step by Step exercise. Where does it tell you to do that? Please study the “import” block at the beginning of the notebook. I think the reason that they aren’t more explicit about that is that it reveals that they have given you the worked solutions to the Step by Step exercise, although they were forced to use a different “init” routine which is the real point of this thread.

Thank you so much for the quick reply. I’ve been stuck for several hours with the same issue mentioned in several posts, where my cost doesn’t match the expected value. That would explain it. I actually wasn’t sure if we were supposed to copy those functions over. Yes, it doesn’t say to do it, but it also doesn’t say not to do it. Hopefully this is a quick fix now. I’ll reload the notebook and put in only the lines of code I added (and not the functions from the first assignment).

Yes. That indeed fixed the issue. What got me was this instruction: “Use the helper functions you implemented previously to build an 𝐿-layer neural network with the following structure: [LINEAR → RELU]×(L-1) → LINEAR → SIGMOID.” I interpreted that to mean I needed to copy my helper functions to the current notebook, though I admit I wasn’t completely sure that was what was intended.

I think in a previous week, there was an instruction in one of the assignments that explicitly said that a function was included (maybe imported?). In addition, the 2-layer network worked with my copied over helper functions so I thought at that point, I had done things as intended. I think it would be beneficial to make the instructions clear that the functions are silently there.

I take it you’re saying that someone could then find the implemented functions (I suppose they’re in a folder somewhere in the workspace) and copy them back to the first assignment. I suppose that could happen - I wouldn’t have thought to go to the second assignment first and try to find an implemented solution just so I could avoid doing the first assignment on my own. The more likely scenario seems to be what actually happened - getting stuck for several hours. Again, thanks for your help. I think I might have given up otherwise.

OK. I see now. As you intimated, this was just above the import block:

dnn_app_utils provides the functions implemented in the “Building your Deep Neural Network: Step by Step” assignment to this notebook.

I missed that (or saw it and forgot by the time I got further down).