Course 1, Week 4, Assignment 1, Exercise 8: linear_activation_backward

Hello DeepLearning team, I'm having trouble getting this function to work correctly.

The function has this docstring:

"""
    Implement the backward propagation for the LINEAR->ACTIVATION layer.
    
    Arguments:
    dA -- post-activation gradient for current layer l 
    cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
    activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
    
    Returns:
    dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
    dW -- Gradient of the cost with respect to W (current layer l), same shape as W
    db -- Gradient of the cost with respect to b (current layer l), same shape as b
    """

Inside it there are two possible paths: either activation = "relu" or activation = "sigmoid". For each of these paths we're given a backward activation function, relu_backward() and sigmoid_backward() respectively.
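For context, here is roughly what I have so far (just a sketch of the dispatch; it assumes linear_backward(dZ, linear_cache) from the previous exercise is available):

def linear_activation_backward(dA, cache, activation):
    # Unpack the two caches stored during forward propagation
    linear_cache, activation_cache = cache

    # Dispatch on the activation name to get dZ
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)

    # linear_backward (previous exercise) turns dZ into the three gradients
    dA_prev, dW, db = linear_backward(dZ, linear_cache)

    return dA_prev, dW, db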

Here is what I don't understand. When I inspect the source code of sigmoid_backward() (using the inspect module), it shows the formula being:

def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    
    Z = cache
    
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    
    assert (dZ.shape == Z.shape)
    
    return dZ

My understanding was that dZ, when calculated over the sigmoid activation layer, is supposed to be dZ = a - y, but the calculation in the function is completely different.

What changed?

Yes, for the final layer that is true for the sigmoid activation (with the cross-entropy cost). Note that you pass dA to this function. With dA = (A - Y) / (A * (1 - A)), where A = s in this case, you will see that you reach the same formula.
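Written out, with s = A:

$$
dZ = dA \cdot s\,(1 - s) = \frac{A - Y}{A\,(1 - A)} \cdot A\,(1 - A) = A - Y
$$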


Not sure I understand. I went back to check the video on understanding backward propagation in Week 3, and it states:

dz = a - y

because

dz = da * g'(z)

The formulas you show take into account the special value of dA at the output layer. The point is that you could use sigmoid as an activation function in any of the hidden layers as well, and that is what sigmoid_backward is written for. If you invoke it for the output layer with the dA as it happens to be in that case, you end up with the same result you show, but this is the fully general version.
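If it helps, here is a small numerical check you can run. The toy values are made up, and sigmoid_backward is repeated from the snippet above so the example is self-contained:

import numpy as np

def sigmoid_backward(dA, cache):
    # Same general-purpose function quoted above (assert dropped for brevity)
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

# Toy output-layer values, made up for illustration
np.random.seed(1)
Z = np.random.randn(1, 4)        # pre-activation of the output layer
A = 1 / (1 + np.exp(-Z))         # sigmoid activation
Y = np.array([[1, 0, 1, 0]])     # labels

# dA of the cross-entropy cost at the output layer
dA = (A - Y) / (A * (1 - A))

# The general formula reproduces the output-layer shortcut dZ = A - Y
dZ = sigmoid_backward(dA, Z)
print(np.allclose(dZ, A - Y))    # True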


@aari
As @jonaslalin mentioned, dA = (A - Y) / (A * (1 - A)), where A = s.
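That value of dA comes from differentiating the cross-entropy cost with respect to A (which, if I remember the assignment correctly, is how dAL is initialized before the backward pass over the output layer):

$$
\mathcal{L} = -\big(Y \log A + (1 - Y)\log(1 - A)\big)
\quad\Rightarrow\quad
dA = \frac{\partial \mathcal{L}}{\partial A}
   = -\left(\frac{Y}{A} - \frac{1 - Y}{1 - A}\right)
   = \frac{A - Y}{A\,(1 - A)}
$$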