Is dL/dz for ReLU concise and closed-form?

The top-voted topic is the derivation of dL/dz, which, for a sigmoid activation, leads to the remarkably concise and convenient result: A - Y.

But what if the activation function is ReLU (or leaky ReLU or tanh)? The final programming assignment seems to hide the expression (if it exists) inside the function relu_backward, and the week 3 lecture notes leave it unsimplified, for example for dz1:
$$dz^{[1]} = W^{[2]T} dz^{[2]} * g^{[1]\prime}(z^{[1]})$$

If we never actually need this to be simplified in practice, I’d like to know that as well.

@am003e ,

If the activation function is the ReLU, then the concise and convenient A - Y doesn’t apply anymore.

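The derivative of the ReLU function g with respect to its input z is:

$$g'(z) = \begin{cases} 0 & \text{if } z \le 0 \\ 1 & \text{if } z > 0 \end{cases}$$

Strictly speaking, the derivative is undefined at z = 0; treating it as 0 there is the usual convention.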
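Since the question also mentions leaky ReLU and tanh, here is a minimal NumPy sketch of the three activation derivatives g'(z). The function names are hypothetical, and the leaky ReLU slope of 0.01 is an assumed value, not something taken from the course.

```python
import numpy as np

def relu_grad(z):
    # 1 where z > 0, else 0 (the value at z == 0 is taken as 0 by convention)
    return (z > 0).astype(float)

def leaky_relu_grad(z, slope=0.01):
    # 1 where z > 0, else the (assumed) negative-side slope
    return np.where(z > 0, 1.0, slope)

def tanh_grad(z):
    # d/dz tanh(z) = 1 - tanh(z)^2
    return 1.0 - np.tanh(z) ** 2

z = np.array([-2.0, 0.0, 3.0])
print(relu_grad(z))
print(leaky_relu_grad(z))
print(tanh_grad(z))
```

None of these collapses into a single term the way A - Y does; each one stays as an elementwise factor that multiplies the upstream gradient.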

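If the loss function is the binary cross-entropy loss and the output layer is a sigmoid, then A - Y is still the gradient at the output layer. For a ReLU unit with input z and output a = g(z), the chain rule gives the gradient of the loss with respect to z as the upstream gradient dL/da, passed through where z > 0 and zeroed elsewhere:

$$\frac{dL}{dz} = \begin{cases} \frac{dL}{da} & \text{if } z > 0 \\ 0 & \text{if } z \le 0 \end{cases}$$

So there is a closed form, but it is a gate on the incoming gradient rather than a single simplified expression like A - Y.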
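The gating above can be sketched in a few lines of NumPy: the backward pass through a ReLU copies the upstream gradient and zeroes it wherever z <= 0. The name relu_backward_sketch and its arguments are hypothetical; this is an illustration under those assumptions, not the assignment's actual relu_backward.

```python
import numpy as np

def relu_backward_sketch(dA, Z):
    """Gradient of the loss w.r.t. Z for A = relu(Z), given the upstream gradient dA."""
    dZ = np.array(dA, dtype=float, copy=True)  # start from the upstream gradient
    dZ[Z <= 0] = 0.0                           # no gradient flows where Z <= 0
    return dZ

# Made-up values, purely for illustration
Z = np.array([[-1.0, 2.0], [0.0, 3.0]])
dA = np.array([[0.5, -0.4], [0.7, 0.1]])
dZ = relu_backward_sketch(dA, Z)
print(dZ)
```

Entries of dA at positions where Z is positive pass through unchanged; the rest become zero.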

In practice, you will leave all these calculations to a framework such as PyTorch or TensorFlow: you define the forward pass and the loss, and the framework computes the gradients for you.
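As a rough illustration, a minimal PyTorch sketch might look like the following. The network shape, the random inputs, and the labels are all made up for the example; it is a sketch under those assumptions rather than anything taken from the course.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Tiny 2-layer network: ReLU hidden layer, single logit output (assumed shapes)
model = nn.Sequential(
    nn.Linear(3, 4),
    nn.ReLU(),
    nn.Linear(4, 1),
)
loss_fn = nn.BCEWithLogitsLoss()  # sigmoid + binary cross-entropy in one step

X = torch.randn(5, 3)  # 5 made-up examples with 3 features each
Y = torch.tensor([[1.0], [0.0], [1.0], [1.0], [0.0]])

loss = loss_fn(model(X), Y)
loss.backward()  # autograd applies the chain rule, including the ReLU gate

for name, param in model.named_parameters():
    print(name, tuple(param.grad.shape))
```

After loss.backward(), every parameter's .grad attribute holds dL/dparameter, so no hand-written relu_backward is needed.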


But it is important to understand what goes on behind the scenes. That’s why studying backpropagation is worth the time.
