I'm confused about Exercise 9 - L_model_backward. I think I might understand how to get current_cache (though I'm not sure, because none of my code is working at all), but I don't understand the next line:
dA_prev_temp, dW_temp, db_temp = ...
I'm pretty sure I'm supposed to use the linear_backward function here, but it needs dZ as input, which I don't have. Am I supposed to nest a sigmoid_backward call inside this? That's the only place I remember calculating dZ before, since in Exercise 7, where we used linear_backward, we were given dZ. Or am I way off base here? If so, can someone please point me in the right direction?
Hi @parrotox, for the backprop step of the network you need to differentiate back through the non-linear step and the linear step of each layer. In the notebook there is a function, linear_activation_backward (which uses linear_backward), whose task is to "Implement the backward propagation for the LINEAR->ACTIVATION layer".
Note that the linear_backward function from Exercise 7 "Implement[s] the linear portion of backward propagation for a single layer (layer l)".
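For reference, the overall shape of that function is roughly the following (a minimal sketch, not the graded solution; it assumes the sigmoid_backward, relu_backward and linear_backward helpers from dnn_utils.py, and that each cache is the (linear_cache, activation_cache) tuple stored during forward propagation):

from dnn_utils import sigmoid_backward, relu_backward  # as imported in the notebook

def linear_activation_backward(dA, cache, activation):
    linear_cache, activation_cache = cache
    if activation == "relu":
        dZ = relu_backward(dA, activation_cache)      # undo the ReLU non-linearity
    elif activation == "sigmoid":
        dZ = sigmoid_backward(dA, activation_cache)   # undo the sigmoid non-linearity
    dA_prev, dW, db = linear_backward(dZ, linear_cache)  # then the linear step
    return dA_prev, dW, db

So yes, sigmoid_backward (or relu_backward) is what produces the dZ that linear_backward needs.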
Hope this can help you
YES! Thank you, I got it now
Wonderful @parrotox, happy I could help you out!!
Hey, I'm having trouble with current_cache; I don't know how to set it. Can someone help me, please? I got this error:
TypeError                                 Traceback (most recent call last)
in
      1 t_AL, t_Y_assess, t_caches = L_model_backward_test_case()
----> 2 grads = L_model_backward(t_AL, t_Y_assess, t_caches)
      3
      4 print("dA0 = " + str(grads['dA0']))
      5 print("dA1 = " + str(grads['dA1']))

in L_model_backward(AL, Y, caches)
     41     current_cache = caches
     42
---> 43     dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, activation = "sigmoid")
     44     grads["dA" + str(L-1)] = dA_prev_temp
     45     grads["dW" + str(L)] = dW_temp

in linear_activation_backward(dA, cache, activation)
     33     # dA_prev, dW, db = ...
     34     # YOUR CODE STARTS HERE
---> 35     dZ = sigmoid_backward(dA, activation_cache)
     36     dA_prev, dW, db = linear_backward(dZ, linear_cache)
     37

~/work/release/W4A1/dnn_utils.py in sigmoid_backward(dA, cache)
     74     Z = cache
     75
---> 76     s = 1/(1+np.exp(-Z))
     77     dZ = dA * s * (1-s)
     78

TypeError: bad operand type for unary -: 'tuple'
Hi @paolaruedad, in the linear_activation_backward call where you are getting the error, you are using the sigmoid function. Think about which layers use the sigmoid, so you can decide what the right content for current_cache is at that step.
If you have a look at the for loop below it, where ReLU is used, that may help you out too.
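A quick way to see what the traceback is telling you: current_cache = caches hands the whole list of caches to linear_activation_backward, so activation_cache ends up being a tuple instead of the array Z, and np.exp(-Z) fails on it. You can confirm this in the notebook with a few prints (a debugging sketch only, not the solution):

print(type(caches), len(caches))   # a list with one cache per layer
print(type(caches[-1]))            # one cache: a (linear_cache, activation_cache) tuple
linear_cache, activation_cache = caches[-1]
print(type(activation_cache))      # should be the numpy array Z, not a tuple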
I had the same problem as @paolaruedad, and I solved it thanks to your advice, @albertovilla.
However, now I have a problem with what I believe is the cache inside the loop. Here's my error message:
IndexError                                Traceback (most recent call last)
in
      1 t_AL, t_Y_assess, t_caches = L_model_backward_test_case()
----> 2 grads = L_model_backward(t_AL, t_Y_assess, t_caches)
      3
      4 print("dA0 = " + str(grads['dA0']))
      5 print("dA1 = " + str(grads['dA1']))

in L_model_backward(AL, Y, caches)
     66     # YOUR CODE STARTS HERE
     67     current_cache = caches[l]
---> 68     dA_prev_temp, dW_temp, db_temp = linear_activation_backward(grads["dA" + str(l + 1)], current_cache, "relu")
     69     grads["dA" + str(l)] = dA_prev_temp + str(l)
     70     grads["dW" + str(l + 1)] = dW_temp + str(l + 1)

in linear_activation_backward(dA, cache, activation)
     21     # dA_prev, dW, db = ...
     22     # YOUR CODE STARTS HERE
---> 23     dZ = relu_backward(dA, activation_cache)
     24     dA_prev, dW, db = linear_backward(dZ, linear_cache)
     25

~/work/release/W4A1/dnn_utils.py in relu_backward(dA, cache)
     54
     55     # When z <= 0, you should set dz to 0 as well.
---> 56     dZ[Z <= 0] = 0
     57
     58     assert (dZ.shape == Z.shape)

IndexError: too many indices for array
I have tried every reasonable combination of cache and cache[y] but can't find the answer. I am out of ideas for solving this problem. Can you help me?
Hi, as the error happens when calling the relu_backward function, I would suggest that you temporarily edit that function (which is defined in the file dnn_utils.py) to print dZ, dZ.shape and Z.shape in that context.
And I say temporarily because you are not expected to modify this file; it is correct as it is, so you only want to do this in order to debug the issue.
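The temporary debug version could look like this (a sketch only; np.shape is used instead of .shape so the print still works even if Z is not an array, and remember to undo the changes afterwards):

import numpy as np

def relu_backward(dA, cache):
    Z = cache
    dZ = np.array(dA, copy=True)
    print("type(Z):", type(Z))                      # temporary: is Z really an array?
    print("dZ:", np.shape(dZ), "Z:", np.shape(Z))   # temporary: the shapes should match
    dZ[Z <= 0] = 0
    assert (dZ.shape == Z.shape)
    return dZ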
I'm sorry, but I'm still not getting it. I've changed a lot of parameters; it's backward propagation, and I don't know what to put in the cache or how to set it. Please, any ideas? It has been a day already and I'm just stuck.
The above is assigning the full list of caches to current_cache, but how many layers do you have with sigmoid activation? Just one, so you have to assign the right index of caches to current_cache. Does that help?
OK, yes, I understand, but I think the problem is how to write that in Python. I'm doing nameofdelist[+ str(L)] for the sigmoid, because it is the third layer, and nameofdelist[+ str(l)] for the two ReLU layers; I assumed it is l because the for loop is doing the iteration. But I still get the error. How am I supposed to pick out just the respective layers in Python?
I think an example could help.
Let's assume there are 3 layers; the caches list would then have the indexes 0, 1, 2. So for the last layer, you should assign caches[2]. Obviously you don't need to hardcode the numbers; you have to use L. Note that the indexes of the list are integers.
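In code, the distinction looks like this (a sketch assuming L = len(caches) and the loop structure from the notebook template; str(...) is only for building dictionary keys like "dW" + str(L), never for indexing the list):

L = len(caches)                  # e.g. 3 layers
current_cache = caches[L - 1]    # integer index: the last (sigmoid) layer's cache
# not caches[str(L)] -- lists are indexed with integers, not strings

for l in reversed(range(L - 1)):
    current_cache = caches[l]    # the l-th (ReLU) layer's cache inside the loop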
@albertovilla I have no idea how to do this.
Why should I change the source code?
@lachainone, your error is taking place in the call to relu_backward.
That function is defined in the file W4A1/dnn_utils.py; in particular, the error you are getting is in the statement dZ[Z <= 0] = 0, but you don't know how dZ relates to your inputs to the function, because your parameters are dA and activation_cache.
In order to understand why the error is happening, I would suggest you open the dnn_utils.py file and add some print statements, so you can backtrace where the error is and then correct the problem in your code.
You can open that file by clicking on the Jupyter logo; you will see a folder release, and from there you can navigate to the file and open it.
Alternatively, you could skip editing this file and try to replicate the problem in your Jupyter notebook, by noticing how dZ is calculated in that function:
dZ = np.array(dA, copy=True)
The full function is:
def relu_backward(dA, cache):
    """
    Implement the backward propagation for a single RELU unit.

    Arguments:
    dA -- post-activation gradient, of any shape
    cache -- 'Z' where we store for computing backward propagation efficiently

    Returns:
    dZ -- Gradient of the cost with respect to Z
    """

    Z = cache
    dZ = np.array(dA, copy=True) # just converting dz to a correct object.

    # When z <= 0, you should set dz to 0 as well.
    dZ[Z <= 0] = 0

    assert (dZ.shape == Z.shape)

    return dZ
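Replicating it in the notebook could then look like this (a hypothetical debugging cell; it assumes current_cache and the dA value you pass to linear_activation_backward are available in scope):

# reproduce relu_backward's first steps by hand, without editing dnn_utils.py
linear_cache, activation_cache = current_cache  # fails here? current_cache has the wrong structure
Z = activation_cache
dZ = np.array(dA, copy=True)
print(np.shape(dZ), np.shape(Z))                # both should be the same 2-D shape
dZ[Z <= 0] = 0                                  # the failing statement from the traceback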
@albertovilla
The issue had nothing to do with the cache.
It’s solved now
@lachainone, how did you solve this problem? I've been stuck with it for 2 days now and have no idea how to solve it.
Did anyone have an error like the above?
I'm having the same error on my side. I think it's something related to the value of dA inside the loop. When I set it to dAL it gives me the same error, but I want it to be a variable that changes each time the loop moves to a new layer, and I don't know how to write it as a variable containing l. Did you solve it?
Hello team,
I'm facing a similar sort of problem while running the code for Exercise 9 (Week 4, Assignment 1). The traceback hits the most recent call and errors on dAL. I am not able to figure out what is wrong with the code in this case. Kindly help. Thanks and regards.
Hi @Rashmi, have a look at your input parameters to the linear_activation_backward function. It requires the gradient dAL which you initialized, not a lookup like grads["dAL"] into grads, which is the full dictionary of gradients.
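For reference, the output-layer call would then look like this (consistent with the working line in the first traceback above; the dAL formula is the cross-entropy gradient given in the notebook, and current_cache must be the last layer's cache):

dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))  # derivative of the cost with respect to AL
current_cache = caches[L - 1]
dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, activation="sigmoid")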