# Course 1 week 4, assignment 1, exercise 8: linear activation backward

Hello deeplearning team, I'm having trouble getting this function to work correctly.

the function has this comment:

```python
"""
Implement the backward propagation for the LINEAR->ACTIVATION layer.

Arguments:
dA -- post-activation gradient for current layer l
cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"

Returns:
dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
dW -- Gradient of the cost with respect to W (current layer l), same shape as W
db -- Gradient of the cost with respect to b (current layer l), same shape as b
"""
```

Inside it there are two possible paths, either `activation = "relu"` or `activation = "sigmoid"`.
For each of these paths we're given a corresponding backward helper, `relu_backward()` and `sigmoid_backward()`.
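For context, here is a minimal sketch of how those two paths might be wired together. The helper names (`sigmoid_backward`, `relu_backward`, `linear_backward`) match the ones provided in the assignment, but the bodies below are only an illustrative reconstruction, not the graded solution:

```python
import numpy as np

def sigmoid_backward(dA, cache):
    # cache holds Z; the derivative of sigmoid is s * (1 - s)
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

def relu_backward(dA, cache):
    # cache holds Z; the gradient passes through only where Z > 0
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

def linear_backward(dZ, cache):
    # cache holds (A_prev, W, b) from the forward pass
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    # cache is the (linear_cache, activation_cache) tuple from the docstring
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    else:  # "sigmoid"
        dZ = sigmoid_backward(dA, activation_cache)
    return linear_backward(dZ, linear_cache)
```

The only difference between the two branches is which derivative is applied to get `dZ`; the linear part of the step is identical.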

Here is what I don't understand. For `sigmoid_backward()`, when I inspect the source code (using the `inspect` package), the formula is:

```python
def sigmoid_backward(dA, cache):
"""
Implement the backward propagation for a single SIGMOID unit.

Arguments:
dA -- post-activation gradient, of any shape
cache -- 'Z' where we store for computing backward propagation efficiently

Returns:
dZ -- Gradient of the cost with respect to Z
"""

Z = cache

s = 1/(1+np.exp(-Z))
dZ = dA * s * (1-s)

assert (dZ.shape == Z.shape)

return dZ

```

My understanding was that `dZ` for the sigmoid activation layer is supposed to be `dZ = A - Y`, but the calculation in the function is completely different.

What changed?

Yes, for the final layer that is true for the sigmoid function. Note that you pass `dA` to this function. With `dA = (A - Y) / (A * (1 - A))`, where `A = s` in this case, you will see that you reach the same formula.
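To make this concrete, here is a small numeric check (the array values are made up for illustration): feeding the cross-entropy gradient `dA = (A - Y) / (A * (1 - A))` into the general sigmoid backward formula recovers the familiar `A - Y` shortcut.

```python
import numpy as np

# Made-up values for illustration
Z = np.array([[0.5, -1.2, 2.0]])
Y = np.array([[1.0, 0.0, 1.0]])

A = 1 / (1 + np.exp(-Z))        # forward sigmoid, so A plays the role of s
dA = (A - Y) / (A * (1 - A))    # gradient of cross-entropy loss w.r.t. A

# The general formula used inside sigmoid_backward:
dZ = dA * A * (1 - A)

# The A * (1 - A) factors cancel, leaving the output-layer shortcut:
assert np.allclose(dZ, A - Y)
```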


Not sure I understand. I went back to check the video on understanding backward propagation in Week 3, and it states:

```
dz = a - y
```

because

```
dz = da * g'(z)
```

The formulas you show take into account the special value of `dA` at the output layer. But you could use sigmoid as an activation function in any of the hidden layers as well, and that is what `sigmoid_backward` is written for. If you invoke it at the output layer with the `dA` that arises there, you end up with the same result you show; this is simply the fully general version.
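Spelled out, the cancellation looks like this (writing `A` for the sigmoid output, so `g'(Z) = A * (1 - A)`):

```
dZ = dA * g'(Z)
   = (A - Y) / (A * (1 - A)) * A * (1 - A)
   = A - Y
```

So the video's `dz = a - y` is the special case of `dz = da * g'(z)` when `da` is the cross-entropy gradient at the output layer.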


@aari
As @jonaslalin mentioned, `dA = (A - Y) / (A * (1 - A))`, where `A = s`.