Week 1 Gradient checking problem

Hi everyone, I've just run into a problem with the assignment. This is the error, and I don't know how to fix it.

I assume this is the case where you have already removed the intentional errors in the back prop code.

Well, this is a fairly complex algorithm. In terms of how to debug it, I added some print statements to my code to show some of the intermediate values, and here's what I get (also in the case where I fixed the fake errors in back prop):

num_parameters 47
39: gradapprox[i] = [0.], grad[i] = [0.]
40: gradapprox[i] = [0.], grad[i] = [0.]
41: gradapprox[i] = [0.], grad[i] = [0.]
42: gradapprox[i] = [0.19763344], grad[i] = [0.19763343]
43: gradapprox[i] = [0.], grad[i] = [0.]
44: gradapprox[i] = [0.], grad[i] = [0.]
45: gradapprox[i] = [2.24404227], grad[i] = [2.24404238]
46: gradapprox[i] = [0.21225742], grad[i] = [0.21225753]
numerator = 8.050575492696896e-07
norm(grad) = 3.3851797873981373
norm(gradapprox) = 3.3851796259558395
denominator = 6.770359413353977
difference = 1.1890913024229996e-07
Your backward propagation works perfectly fine! difference = 1.1890913024229996e-07

I used this logic to print out only the last 8 of the individual gradapprox[i] values:

if i >= num_parameters - 8:
    print(f"{i}: gradapprox[i] = {gradapprox[i]}, grad[i] = {grad[i]}")

Hi Paul, I just used the code you provided to check grad[i] and gradapprox[i]. My values are the same as yours all the way up through i = 46. So there must be something wrong with my computation of difference. But I used np.linalg.norm, so that shouldn't be the problem.

Are you computing the difference using the whole 47-entry vectors?
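
For reference, the final computation should be applied to the complete vectors, something like this (a minimal sketch using the notebook's grad and gradapprox names; everything else here is just my assumption about the surrounding code):

import numpy as np

# Norms over the whole 47-entry vectors, not a single element
numerator = np.linalg.norm(grad - gradapprox)                    # ||grad - gradapprox||_2
denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)  # ||grad||_2 + ||gradapprox||_2
difference = numerator / denominator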

But in any case, it’s probably time to just look at your code. We aren’t supposed to do that on a public thread, but we can do it using Direct Messages. Please check your DMs for a message from me about how to do that.

Just to close the loop on the public thread, the problem was what I suggested in my last reply: the final difference was being computed using only the last entry in the grad and gradapprox vectors instead of using the whole vectors.
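
For anyone who finds this thread later, the failure mode looks roughly like the following (a hypothetical sketch of the bug, not the learner's actual code; assumes numpy has been imported as np):

# Buggy: difference is computed inside the per-parameter loop, so the
# norms only ever see scalars, and the value that survives is whatever
# the last iteration (i = 46) produced
for i in range(num_parameters):
    # ... two-sided epsilon estimate of gradapprox[i] computed here ...
    numerator = np.linalg.norm(grad[i] - gradapprox[i])
    denominator = np.linalg.norm(grad[i]) + np.linalg.norm(gradapprox[i])
    difference = numerator / denominator

# Fixed: compute once, after the loop, over the whole vectors
numerator = np.linalg.norm(grad - gradapprox)
denominator = np.linalg.norm(grad) + np.linalg.norm(gradapprox)
difference = numerator / denominator

One common way this happens is simply indenting the difference lines one level too far, which is why the printed grad[i] and gradapprox[i] values can match perfectly while the final check still fails.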