Course 1, Week 4, Assignment 1, Exercise 8: linear_activation_backward

Hello DeepLearning team, I'm having trouble getting this function to work correctly.

The function has this docstring:

"""
    Implement the backward propagation for the LINEAR->ACTIVATION layer.
    
    Arguments:
    dA -- post-activation gradient for current layer l 
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
    
    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """

Inside it there are two possible paths: either activation = "relu" or activation = "sigmoid". For each of these paths we're given a backward activation function, relu_backward() and sigmoid_backward() respectively.
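For context, here is roughly what I have so far (just a sketch of the dispatch; it assumes linear_backward(dZ, linear_cache) from the previous exercise is available):

def linear_activation_backward(dA, cache, activation):
    # Unpack the two caches stored during forward propagation
    linear_cache, activation_cache = cache

    # Dispatch on the activation name to get dZ
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)

    # linear_backward (previous exercise) turns dZ into the three gradients
    dA_prev, dW, db = linear_backward(dZ, linear_cache)

    return dA_prev, dW, db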

Here is what I don't understand. When I inspect the source code of sigmoid_backward() (using the inspect module), it shows the formula being:

def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    
    assert (dZ.shape == Z.shape)
    
    return dZ

My understanding was that dZ, when calculated over the sigmoid activation layer, is supposed to be dZ = a - y, but the calculation in the function is completely different.

What changed?

Yes, for the final layer that is true for the sigmoid activation (with the cross-entropy cost). Note that you pass dA to this function. With dA = (A - Y) / (A * (1 - A)), where A = s in this case, you will see that you reach the same formula.
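Written out, with s = A:

$$
dZ = dA \cdot s\,(1 - s) = \frac{A - Y}{A\,(1 - A)} \cdot A\,(1 - A) = A - Y
$$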


Not sure I understand. I went back to check the video on understanding backward propagation in Week 3, and it states:

dz = a - y

because

dz = da * g'(z)

The formulas you show take into account the special value of dA at the output layer. The point is that you could use sigmoid as an activation function in any of the hidden layers as well, and that is what sigmoid_backward is written for. If you invoke it for the output layer with the dA as it happens to be in that case, you end up with the same result you show, but this is the fully general version.
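If it helps, here is a small numerical check you can run. The toy values are made up, and sigmoid_backward is repeated from the snippet above so the example is self-contained:

import numpy as np

def sigmoid_backward(dA, cache):
    # Same general-purpose function quoted above (assert dropped for brevity)
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

# Toy output-layer values, made up for illustration
np.random.seed(1)
Z = np.random.randn(1, 4)        # pre-activation of the output layer
A = 1 / (1 + np.exp(-Z))         # sigmoid activation
Y = np.array([[1, 0, 1, 0]])     # labels

# dA of the cross-entropy cost at the output layer
dA = (A - Y) / (A * (1 - A))

# The general formula reproduces the output-layer shortcut dZ = A - Y
dZ = sigmoid_backward(dA, Z)
print(np.allclose(dZ, A - Y))    # True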


@aari
As @jonaslalin mentioned, dA = (A - Y) / (A * (1 - A)), where A = s.
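That value of dA comes from differentiating the cross-entropy cost with respect to A (which, if I remember the assignment correctly, is how dAL is initialized before the backward pass over the output layer):

$$
\mathcal{L} = -\big(Y \log A + (1 - Y)\log(1 - A)\big)
\quad\Rightarrow\quad
dA = \frac{\partial \mathcal{L}}{\partial A}
   = -\left(\frac{Y}{A} - \frac{1 - Y}{1 - A}\right)
   = \frac{A - Y}{A\,(1 - A)}
$$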