Derivative of Relu in output layer

Eeeek! Sorry, I wasn’t thinking hard enough when I wrote the first response. Your no_relu_backward is not correct. Remember that what we are implementing there is:

dZ = dA * g'(Z)

Meaning that we’re not returning just the derivative of the activation function. So the fact that the derivative g'(Z) is 1 does not mean that the return value is 1, right? It means dZ is equal to dA.
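A minimal sketch of what I mean, assuming Z and dA are NumPy arrays of the same shape (the function name matches yours; the body is my suggestion, not your original code):

```python
import numpy as np

def no_relu_backward(dA, Z):
    # The identity activation g(Z) = Z has g'(Z) = 1 everywhere,
    # so dZ = dA * g'(Z) simplifies to dZ = dA.
    # Return a copy of dA, not the constant 1 and not a shared reference.
    dZ = np.array(dA, copy=True)
    return dZ
```

Note that dZ has the same shape and values as dA; the derivative being 1 only means the gradient passes through unchanged.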

Actually while you’re at it, I’d feel more comfortable if you did the assignment of

A = Z

in no_relu with a method that produces a separate copy. The way you implemented it, A ends up being another reference to the same object. I can’t think of a case in which the return value A is going to get modified, so it’s probably no harm done. But it introduces some risk of unpleasant surprises later. There’s a reason why they did that np.array(..., copy = True) in the original code you are copying. Please see this post for more information about how object references work.
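Concretely, this is the shape I have in mind, assuming no_relu just takes Z and returns the activation A (any extra return values in your version would stay as they are):

```python
import numpy as np

def no_relu(Z):
    # Identity activation for the output layer: A = Z in value,
    # but copied so A is an independent array rather than
    # another reference to the same object as Z.
    A = np.array(Z, copy=True)
    return A
```

With the copy, later in-place modifications to A (or to Z) can never silently affect the other array.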