I was able to successfully modify the function with the intended error, thanks to the big hint.
But… what if no one gives me a hint? What is the proper way to find the bug? I listened to the lecture, but I would still need explicit advice on how to compare the specific values of the W’s and b’s between grad and gradapprox inside each layer.
It’s a good question, but unfortunately I don’t think there’s any way to derive actual information from the results of gradient checking when it shows an error. It just tells you that something is wrong with your algorithm, and it’s then up to you to find the bugs. In the “toy” example here they were pretty obvious, but in a real-world scenario, who knows? Maybe the best you can hope for is that a big error means the problem is really fundamental or structural, as opposed to something subtle (an off-by-one error or the like).
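For reference, the check from the lecture boils everything down to a single scalar, which is why on its own it can only say *that* something is wrong, not *where*. Writing θ for all the parameters (every W and b) unrolled into one vector, dθ for the backprop gradient, and J for the cost:

```latex
\[
d\theta_{\text{approx}}[i] = \frac{J(\theta_1, \dots, \theta_i + \varepsilon, \dots) - J(\theta_1, \dots, \theta_i - \varepsilon, \dots)}{2\varepsilon}
\]
\[
\text{difference} = \frac{\lVert d\theta_{\text{approx}} - d\theta \rVert_2}{\lVert d\theta_{\text{approx}} \rVert_2 + \lVert d\theta \rVert_2}
\]
```

As I recall the lecture's rule of thumb with ε = 10⁻⁷: a difference around 10⁻⁷ is great, around 10⁻⁵ deserves a closer look, and 10⁻³ or larger very likely means a bug.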
But the good news in all of this is that in the “real world” nobody builds their own backprop algorithms anymore: you just use your DL Framework of Choice. For us here in the DLS series, that will be TensorFlow from Google. We’ll start learning about that in Week 3 of Course 2. There are other frameworks besides TF as well, such as PyTorch. For solving real problems these days, the state of the art is pretty deep and it’s way too much work to build the algorithms yourself. Prof Ng still shows us how to build the basic algorithms in Python because there is an important pedagogical point: knowing what is actually happening “under the covers” gives us useful intuition about how to debug and tune systems when they don’t work as well as we want. If you treat everything as a “black box”, then those intuitions are harder to generate.
Wow, that answer was very inspiring and informative in a number of ways. Thank you! I am looking forward to getting to Week 3! I am really enjoying the course so far.
If I remember correctly, Prof. Ng talked about looking for big differences between grad and gradapprox for the weights/biases of a specific layer, since that is where we need to check the code carefully. That matters because if I have 20 layers in the future, the problems might be much harder to find.
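Here is a minimal sketch of that idea, in plain Python so it runs anywhere. Everything in it is hypothetical: a toy network (2 inputs, 2 tanh hidden units, 1 linear output, squared-error cost on one example) with a bug deliberately planted in `db1`, plus helper names (`forward`, `backward`, `numerical_grads`, `relative_difference`) that are my own, not the assignment's. It computes the usual two-sided gradapprox, then reports the relative difference once for all parameters together and once per parameter block.

```python
import math

# Hypothetical toy network (not the course's assignment code): one example,
# 2 inputs -> 2 tanh hidden units -> 1 linear output, squared-error cost.
X = [0.5, -1.0]
Y = 1.0

def forward(params):
    """Return (cost, cache). W1 is a 2x2 matrix stored row-major as a flat list."""
    W1, b1, W2, b2 = params["W1"], params["b1"], params["W2"], params["b2"]
    a1 = [math.tanh(W1[2 * j] * X[0] + W1[2 * j + 1] * X[1] + b1[j]) for j in range(2)]
    z2 = W2[0] * a1[0] + W2[1] * a1[1] + b2[0]
    return 0.5 * (z2 - Y) ** 2, (a1, z2)

def cost(params):
    return forward(params)[0]

def backward(params):
    """Analytic gradients, with a deliberate bug planted in db1."""
    _, (a1, z2) = forward(params)
    dz2 = z2 - Y
    # tanh'(z) = 1 - tanh(z)^2
    dz1 = [dz2 * params["W2"][j] * (1 - a1[j] ** 2) for j in range(2)]
    return {
        "W1": [dz1[j] * X[k] for j in range(2) for k in range(2)],
        "b1": [2 * d for d in dz1],  # BUG: the factor 2 should not be here
        "W2": [dz2 * a for a in a1],
        "b2": [dz2],
    }

def numerical_grads(cost_fn, params, epsilon=1e-7):
    """Two-sided difference (J(theta+eps) - J(theta-eps)) / (2*eps), entry by entry."""
    approx = {}
    for name, values in params.items():
        approx[name] = []
        for i in range(len(values)):
            original = values[i]
            values[i] = original + epsilon
            cost_plus = cost_fn(params)
            values[i] = original - epsilon
            cost_minus = cost_fn(params)
            values[i] = original  # restore the entry before moving on
            approx[name].append((cost_plus - cost_minus) / (2 * epsilon))
    return approx

def relative_difference(grad, gradapprox):
    """||grad - gradapprox|| / (||grad|| + ||gradapprox||) on flat lists."""
    num = math.sqrt(sum((g - a) ** 2 for g, a in zip(grad, gradapprox)))
    den = math.sqrt(sum(g * g for g in grad)) + math.sqrt(sum(a * a for a in gradapprox))
    return num / den if den > 0 else 0.0

params = {"W1": [0.1, -0.2, 0.4, 0.3], "b1": [0.0, 0.2], "W2": [0.7, -0.5], "b2": [0.2]}
grad = backward(params)
gradapprox = numerical_grads(cost, params)

# One number for everything: it only tells you *that* something is wrong.
overall = relative_difference(
    [g for name in params for g in grad[name]],
    [a for name in params for a in gradapprox[name]],
)
# One number per parameter block: it points at *where* to look.
per_param = {name: relative_difference(grad[name], gradapprox[name]) for name in params}

print(f"overall: {overall:.2e}")
for name, diff in per_param.items():
    print(f"{name}: {diff:.2e}")
```

The overall number is large, but only `b1` stands out in the per-parameter report: the planted factor of 2 gives |2g − g| / (|2g| + |g|) = 1/3, while W1, W2 and b2 stay many orders of magnitude smaller. With a real multi-layer network you would group the entries the same way (dW1, db1, dW2, …) to narrow the search to one layer's backward step, though this only tells you where to look, not what the bug is.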
Do those frameworks do the gradient checking automatically on request, so that we don’t need to worry about it?
Yes, the networks we build here start pretty small, but things get pretty big pretty fast. We’ll see networks with hundreds of layers in C4.
Once you switch to using a framework, you don’t really need gradient checking, because the whole point of using TF or PyTorch is that you can assume the code is correct: somebody else did the work of writing, debugging and tuning it. Of course, that doesn’t mean the network architecture and other hyperparameters you have chosen are the right ones for the best performance. So you still have plenty of things to worry about, but code correctness for the underlying algorithms is no longer one of them.