Derivative of ReLU in the output layer

Yes, that was what I was recommending. You can try LeakyReLU as well, but I think it’s worth trying just omitting the output activation function altogether.

If you implement LeakyReLU, here’s one way to code its derivative:

import numpy as np

def leakyreluprime(Z, slope = 0.05):
    # Derivative of LeakyReLU: 1 where Z > 0, `slope` elsewhere
    G = np.where(Z > 0, 1, slope)
    return G

Of course that version is implemented as a separate function. If instead you build it “in situ,” by analogy to the way relu_backward works, you’re doing two things at once:

dZ = dA * g'(Z)

But the same idea can be adapted …
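For instance, here is one way that adaptation could look as a leaky_relu_backward function, folding the derivative and the multiplication by dA into one step. The function name, signature, and slope default here are my own choices for illustration, not the course's actual code:

```python
import numpy as np

def leaky_relu_backward(dA, Z, slope=0.05):
    # Start from a genuine copy of dA, so we never modify the caller's array.
    dZ = np.array(dA, copy=True)
    # Where Z <= 0, the LeakyReLU derivative is `slope`, so scale those entries.
    dZ[Z <= 0] *= slope
    # Where Z > 0, the derivative is 1, so those entries are already correct.
    return dZ
```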

One point to emphasize here is that if you just duplicate the code in relu_backward to make leaky_relu_backward, be sure to understand the importance of the way they implemented this line:

dZ = np.array(dA, copy=True)

If you “short-circuit” that by eliminating the “copy” there:

dZ = dA

that is a disaster, because you’re about to overwrite some of the values in dZ. Because of the way parameter passing and object assignment work in Python, doing it without the copy also modifies the global value of dA. See this post and this later reply on that thread.
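To see concretely why the copy matters, here is a minimal demonstration of how plain assignment in Python creates an alias to the same NumPy array rather than a copy (the variable names just mirror the discussion above):

```python
import numpy as np

dA = np.array([1.0, -2.0, 3.0])

# Plain assignment: dZ is just another name for the very same array object.
dZ = dA
dZ[0] = 0.0   # this also changes dA[0]!

# With an explicit copy, modifying dZ2 leaves dA alone.
dZ2 = np.array(dA, copy=True)
dZ2[1] = 0.0  # dA[1] is unchanged
```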