Hello deeplearning team, I'm having trouble getting this function to work correctly.

The function has this docstring:

```
"""
Implement the backward propagation for the LINEAR->ACTIVATION layer.
Arguments:
dA -- post-activation gradient for current layer l
cache -- tuple of values (linear_cache, activation_cache) we store for computing backward propagation efficiently
activation -- the activation to be used in this layer, stored as a text string: "sigmoid" or "relu"
Returns:
dA_prev -- Gradient of the cost with respect to the activation (of the previous layer l-1), same shape as A_prev
dW -- Gradient of the cost with respect to W (current layer l), same shape as W
db -- Gradient of the cost with respect to b (current layer l), same shape as b
"""
```

Inside it there are two possible paths: `activation = "relu"` or `activation = "sigmoid"`. For each of these paths we're given a corresponding backward activation function, `relu_backward()` or `sigmoid_backward()`.
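For context, this is roughly how I understand the two paths are meant to be wired together. This is only my sketch: the `linear_backward()` helper and its formulas are my assumption based on the docstring's description of the caches, not the actual course code.

```
import numpy as np

def relu_backward(dA, cache):
    # Gradient of ReLU: pass dA through where Z > 0, zero elsewhere
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    return dZ

def sigmoid_backward(dA, cache):
    # Gradient of sigmoid: dZ = dA * s * (1 - s)
    Z = cache
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

def linear_backward(dZ, cache):
    # Standard linear-layer gradients (my assumption for this sketch)
    A_prev, W, b = cache
    m = A_prev.shape[1]
    dW = dZ @ A_prev.T / m
    db = np.sum(dZ, axis=1, keepdims=True) / m
    dA_prev = W.T @ dZ
    return dA_prev, dW, db

def linear_activation_backward(dA, cache, activation):
    # Dispatch on the activation string, then apply the linear step
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)
    else:  # "sigmoid"
        dZ = sigmoid_backward(dA, activation_cache)
    return linear_backward(dZ, linear_cache)
```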

Here is what I don't understand: when I inspect the source code of `sigmoid_backward()` (using the `inspect` module), it shows the formula as:

```
def sigmoid_backward(dA, cache):
    """
    Implement the backward propagation for a single SIGMOID unit.
    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently
    Returns:
    dZ -- Gradient of the cost with respect to Z
    """
    Z = cache
    s = 1/(1+np.exp(-Z))
    dZ = dA * s * (1-s)
    assert (dZ.shape == Z.shape)
    return dZ
```

My understanding was that `dZ` for the sigmoid activation layer is supposed to be `dZ = a - y`, but the calculation in the function is completely different.
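To illustrate the discrepancy I mean, here is a small numeric sketch comparing the formula I expected with what the function computes. The toy values for `Z`, `y`, and the upstream gradient `dA` are my own placeholders, purely for illustration:

```
import numpy as np

# Hypothetical toy values (my own, for illustration only)
Z = np.array([[0.5, -1.2]])
y = np.array([[1.0, 0.0]])
a = 1 / (1 + np.exp(-Z))   # sigmoid activation of Z

# What I expected for the sigmoid output layer:
dZ_expected = a - y

# What sigmoid_backward computes from some upstream gradient dA:
dA = np.array([[0.3, -0.7]])   # arbitrary placeholder gradient
dZ_function = dA * a * (1 - a)

print(dZ_expected)
print(dZ_function)
```

The two results clearly don't match for an arbitrary `dA`, which is exactly what's confusing me.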

What changed?