Hi, please explain the issue here. Thank you!
Hello @Syed_Hamza_Tehseen! I hope you are doing well.
You implemented linear_activation_backward correctly, but you are making mistakes in the code below:
grads["dA" + str(L-1)] = ...
grads["dW" + str(L)] = ...
grads["db" + str(L)] = ...
Think about it: what are dA_prev_temp, dW_temp, and db_temp? Aren't they also gradients obtained from the linear_activation_backward function? Yes, they are.
By using grads["dA" + str(L-1)], you are saying that dA_prev_temp is for the previous layer. So you don't need to use (L-1) again with any other term. The same goes for dW and db.
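As a reminder (a minimal sketch, assuming the linear_activation_backward signature and the dAL and current_cache variables from the earlier exercises in the notebook), the function returns three gradients for one layer:

```python
# Sketch only; names follow the notebook's earlier exercises.
# linear_activation_backward returns three gradients for a single layer:
#   dA_prev -- gradient of the cost w.r.t. the previous layer's activation
#   dW      -- gradient of the cost w.r.t. the current layer's weights
#   db      -- gradient of the cost w.r.t. the current layer's bias
dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, "sigmoid")
```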
Best,
Saif.
Hi, when I remove (L+1) etc., I get this error. I also tried to assign just dA_prev_temp, like grads['dA' + str(L-1)] = dA_prev_temp, but this also resulted in an error for relu.
First, understand what grads contains. It is given that:
grads -- A dictionary with the gradients
grads["dA" + str(l)] = ...
grads["dW" + str(l)] = ...
grads["db" + str(l)] = ...
So grads doesn't contain dA_prev_temp, dW_temp, and db_temp under those names; you need to refer to them by layer number. What makes you think you should use grads["A_prev_temp"]?
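To illustrate with a toy example (hypothetical values, not the assignment code): the dictionary key is built from the layer number, so nothing named _temp ever appears as a key:

```python
# Toy illustration only: the key is the string "dA"/"dW"/"db" plus a layer number.
grads = {}
L = 3                                      # hypothetical number of layers
grads["dA" + str(L - 1)] = "placeholder"   # stored under the key "dA2"
print(list(grads.keys()))                  # ['dA2']
```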
The grads dictionary would contain the values of dA_prev_temp, dW_temp, and db_temp, respectively, right? So I have to assign each value computed in linear_activation_backward into the dictionary, right? Like grads['dA' + str(L-1)] = dA_prev_temp? Is this the correct process?
Yes, you got it.
But remember, dA_prev_temp is the previous layer's gradient, while dW and db are for the current layer.
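Putting that together for the output layer, the step looks roughly like this (a sketch, assuming dAL, current_cache, grads, and L are set up as in the notebook):

```python
# Sketch of the output-layer (sigmoid) step only; names follow the notebook.
dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, "sigmoid")
grads["dA" + str(L - 1)] = dA_prev_temp   # previous layer's activation gradient, hence L-1
grads["dW" + str(L)] = dW_temp            # current (L-th) layer's weight gradient
grads["db" + str(L)] = db_temp            # current (L-th) layer's bias gradient
```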
Best,
Saif.
So, for sigmoid we use dAL in linear_activation_backward. But what should we use for the hidden layers (in the case of ReLU)? The hint is given:
# lth layer: (RELU -> LINEAR) gradients.
# Inputs: "grads["dA" + str(l + 1)], current_cache". Outputs: "grads["dA" + str(l)], grads["dW" + str(l + 1)], grads["db" + str(l + 1)]"
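Read literally, the hint describes a loop roughly like the sketch below (assuming caches holds one cache per layer, as earlier in the notebook):

```python
# Sketch of the hidden-layer (ReLU) loop described by the hint above.
for l in reversed(range(L - 1)):          # l = L-2, ..., 1, 0
    current_cache = caches[l]
    dA_prev_temp, dW_temp, db_temp = linear_activation_backward(
        grads["dA" + str(l + 1)], current_cache, "relu")
    grads["dA" + str(l)] = dA_prev_temp       # previous layer's activation gradient
    grads["dW" + str(l + 1)] = dW_temp        # current layer's weight gradient
    grads["db" + str(l + 1)] = db_temp        # current layer's bias gradient
```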
Best,
Saif.
Yes, I've got it all now. Thank you so much for all your help!
I am glad that you hit the nail on the head.