W4_A1_Ex-9_L_Model_Backward_KeyError: 'dA0'

I am really a little confused about how to begin the backprop chain. I know I have two layers. I know the caches structure FOR THE TEST EXAMPLES is a list of 2 nested tuples. I know I have to get the info in LIFO order for the linear-activation part. But I am stuck on the most basic step: how to get the sigmoid-linear derivative.
I don’t think that this is the proper start. It seems like the start for the linear-activation LOOP. But what do I do for the sigmoid?

### Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache". Outputs: "grads["dAL-1"], grads["dWL"], grads["dbL"] ###

current_cache = caches[L-1]   # this would be the LAST IN
dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, "sigmoid")
grads["dA" + str(L-1)] = dA_prev_temp
grads["dW" + str(L)] = dW_temp
grads["db" + str(L)] = db_temp

This gives me a KeyError:
KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>
      2 grads = L_model_backward(t_AL, t_Y_assess, t_caches)
      3
----> 4 print("dA0 = " + str(grads['dA0']))
      5 print("dA1 = " + str(grads['dA1']))
      6 print("dW1 = " + str(grads['dW1']))

KeyError: 'dA0'

Any help? Thanks.


Hi @Gian, a KeyError in Python basically means you are looking up something that does not exist in the dictionary, so it looks like dA0 does not exist. Your code seems to be okay, so most likely something is wrong in the initialization of the backprop, or maybe in the loop that you are calling. Did you check that?
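
For reference, here is a minimal sketch of what produces this kind of error (toy values, nothing from the assignment):

grads = {"dA1": 0.1, "dW1": 0.2, "db1": 0.3}
print(grads["dA1"])   # fine: the key exists
print(grads["dA0"])   # KeyError: 'dA0' because this key was never stored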


Hi sjfischer, thanks.
I solved that dA0 key error; it was due to leftover debugging code. My current issue is an index-out-of-bounds error in relu_backward(dA, cache) during the linear-ReLU backprop. Tell me what you think.

Here is what I know.
My loading of the sigmoid-linear level is described in my previous post above, so I assume that’s OK.
In the linear-ReLU portion of the code, the loop decrements over the layer indices (I checked l). There are two activation layers in caches. For linear-ReLU we have to unpack in that order. It’s a LIFO property, so index 1 corresponds to the last layer of the forward prop.
I thought that the index error would be referencing this:
grads["db" + str(l + 1)]
BUT that isn’t true. It seems that there may be a broadcast error in

relu_backward(dA, cache)

boolean index did not match indexed array along dimension 0; dimension is 1 but corresponding boolean dimension is 3

I am kinda lost as to how to solve this, and given that I have almost finished the course, I’d like to get this done.
Any help appreciated.

Here is the debug traceback, with the code around it:
IndexError                                Traceback (most recent call last)
<ipython-input> in <module>
      1 t_AL, t_Y_assess, t_caches = L_model_backward_test_case()
----> 2 grads = L_model_backward(t_AL, t_Y_assess, t_caches)
      3
      4 print("dA0 = " + str(grads['dA0']))
      5 print("dA1 = " + str(grads['dA1']))

<ipython-input> in L_model_backward(AL, Y, caches)
     71     # YOUR CODE STARTS HERE
     72     current_cache = caches[l]
---> 73     dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, "relu")
     74     grads["dA" + str(l)] = dA_prev_temp
     75     grads["dW" + str(l + 1)] = dW_temp

<ipython-input> in linear_activation_backward(dA, cache, activation)
     22     # dA_prev, dW, db = ...linear_cache
     23     # YOUR CODE STARTS HERE
---> 24     dZ = relu_backward(dA, activation_cache)
     25     # dA_prev, dW, db = linear_cache
     26     dA_prev, dW, db = linear_backward(dZ, linear_cache)

~/work/release/W4A1/dnn_utils.py in relu_backward(dA, cache)
     54
     55     # When z <= 0, you should set dz to 0 as well.
---> 56     dZ[Z <= 0] = 0
     57
     58     assert (dZ.shape == Z.shape)

IndexError: boolean index did not match indexed array along dimension 0; dimension is 1 but corresponding boolean dimension is 3
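
If I read the message right, the mask and the array have mismatched shapes. Here is a minimal numpy sketch (shapes made up to match the test case) that reproduces it: dAL from the 1-unit output layer is being masked by the hidden layer’s (3, m)-shaped Z.

import numpy as np

dA = np.random.randn(1, 2)   # dAL from the output layer: shape (1, 2)
Z = np.random.randn(3, 2)    # the hidden layer's cached Z: shape (3, 2)

dZ = np.array(dA, copy=True)
# dZ[Z <= 0] = 0   # IndexError: boolean dimension is 3, array dimension is 1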


I am having the same problem… can’t find a solution.

Riddhiman, what is your bug? Is it in the sigmoid part or the ReLU/loop part?

I am having the bug at the ReLU loop part.

I found my bug in the linear-ReLU backward portion, i.e. the loop. Everything was correct except the parameters for
linear_activation_backward(dA, current_cache, "relu")
If you notice, I copy-pasted the wrong parameter by passing the name dA, which is incorrect. linear_activation_backward should calculate the next gradient of A, so the dA argument should be the dA_prev_temp from the previous statement, i.e. grads["dA" + str(l + 1)].

If you think about it, it all makes sense. Also be very cautious about copy-paste; it set me back a lot of time.
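
Putting it together, the overall shape of the function is roughly this (my own sketch, assuming the course helpers such as linear_activation_backward; not copied from the official solution):

import numpy as np

def L_model_backward(AL, Y, caches):
    grads = {}
    L = len(caches)                 # number of layers
    Y = Y.reshape(AL.shape)

    # Derivative of the cross-entropy cost with respect to AL
    dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Lth layer (SIGMOID -> LINEAR): the last cache appended, so the first one out
    current_cache = caches[L - 1]
    dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, "sigmoid")
    grads["dA" + str(L - 1)] = dA_prev_temp
    grads["dW" + str(L)] = dW_temp
    grads["db" + str(L)] = db_temp

    # Hidden layers, from l = L-2 down to 0 (RELU -> LINEAR)
    for l in reversed(range(L - 1)):
        current_cache = caches[l]
        # NOT dAL here: feed in the gradient produced by the layer above
        dA_prev_temp, dW_temp, db_temp = linear_activation_backward(
            grads["dA" + str(l + 1)], current_cache, "relu")
        grads["dA" + str(l)] = dA_prev_temp
        grads["dW" + str(l + 1)] = dW_temp
        grads["db" + str(l + 1)] = db_temp

    return grads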


Ohhh, I got it… thanks!


NP. Yeah, what a time suck this was! OK, see you on the other side!


I have the same issue, still haven’t been able to get it done

You might be passing the wrong parameters to the linear_activation_backward function. Recheck that this time round the dA argument is the dA_prev, i.e. the dA output of the previous step.


I am confused about the index used in the caches list. When populating caches during forward propagation, indices from 1 to L are used. In backpropagation, why do we then start with caches[L-1]?

That is Python’s specification.

Think of a list with 3 values, like a = [10, 20, 30].
The length of this list is obviously 3. But Python indexing starts at 0. So, if you want to access the last value, you need to specify a[2], not a[3], which would cause a “list index out of range” error.
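
In code form:

a = [10, 20, 30]
print(len(a))   # 3
print(a[2])     # 30, the last value
# print(a[3])   # IndexError: list index out of range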

Thanks, yeah, I got confused with the forward propagation steps using range(1, L), but that is not related to the indices used in the caches list. The (1, L) counter is not used as an index into the caches list; it is used directly to index the per-layer parameter values in the parameters dictionary. But the loop in which caches is populated runs over range(1, L), i.e. 1 to L-1, so it fills caches[0] through caches[L-2], since the list index is one behind the loop counter. The last layer’s cache is appended separately outside the loop, because the activation function is different for the last layer. So, finally, caches has elements caches[0] (for layer 1), caches[1] (for layer 2), …, caches[L-1] (for the last layer).
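
In code form, roughly (my sketch of the forward pass, assuming the course helper linear_activation_forward):

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2                      # number of layers

    for l in range(1, L):                         # hidden layers 1 .. L-1
        A, cache = linear_activation_forward(
            A, parameters["W" + str(l)], parameters["b" + str(l)], "relu")
        caches.append(cache)                      # layer l lands at index l-1

    # Output layer L, appended separately because its activation is sigmoid
    AL, cache = linear_activation_forward(
        A, parameters["W" + str(L)], parameters["b" + str(L)], "sigmoid")
    caches.append(cache)                          # layer L lands at index L-1

    return AL, caches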

As you mentioned, the caches list index is governed by Python’s zero-based indexing. So, in backpropagation, we start from caches[L-1]. Associating the list index with the layer number is what confused me.


Hi, I am facing the same problem of KeyError: 'dA0'

I am confused here: why current_cache = caches[L-1] and not current_cache = caches[L], since the sigmoid function is the last layer in the network? Please explain it for me.

Hey, thanks so much for that tidbit. It makes so much sense that we have to start from the dA_prev_temp of the previous statement, since it is backward propagation. Thanks again; it saved me a lot of brain-racking to figure out what I was messing up!

Hi, Muhammad Usman.

Gaurav Malhotra has presented the case very well, just in the thread above. Go through that once, and if you are still not able to understand the logic, we can always discuss it here.

Help me, pls!!!

KeyError                                  Traceback (most recent call last)
<ipython-input> in <module>
      2 grads = L_model_backward(t_AL, t_Y_assess, t_caches)
      3
----> 4 print("dA0 = " + str(grads['dA0']))
      5 print("dA1 = " + str(grads['dA1']))
      6 print("dW1 = " + str(grads['dW1']))

KeyError: 'dA0'

Welcome to the community.

First of all, posting your code is not recommended. Please remove it.

What is the output of this L_model_backward()?

When we get dA^{[L]}, which is dAL in this assignment, the expected outputs are the gradients for the weights and bias of layer L, plus dA for layer L-1. So the subscript on dA is one less than the subscript on dW/db. As a result, you could not update dA0.
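
To make the off-by-one concrete, here is a quick check (my own, not part of the notebook) you could run right after grads = L_model_backward(t_AL, t_Y_assess, t_caches):

# For the 2-layer test case, each layer produces dW/db with its own index
# and dA with the index one below it:
#   sigmoid layer (layer 2): grads["dW2"], grads["db2"], grads["dA1"]
#   relu layer    (layer 1): grads["dW1"], grads["db1"], grads["dA0"]
expected_keys = {"dA0", "dW1", "db1", "dA1", "dW2", "db2"}
assert set(grads.keys()) == expected_keys, sorted(grads.keys())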
