The test code is obviously written to expect that your function returns a grads dictionary containing that key, so you need to figure out why your grads variable does not contain it. The test cells are not modifiable, but you can do “Insert → Cell Below” and print the keys of grads like this:
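print(grads.keys())

(That assumes the failing test cell has already run your function and left its return value in a variable called grads. If the key the test is asking for is not in that list, the problem is in how your code builds grads, not in the test.)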
So how can that happen? Note that at any given layer, you are computing dW and db for the current layer, but dA for the previous layer, right? That should be a clue that there are probably two things wrong with your logic:
1. Your logic must be wrong such that you are only doing one layer, not all of the layers.
2. Your logic at the given layer is also wrong, because you are not labelling the dA output correctly.
When you are doing back propagation, the input value of dA comes from the next (later) layer, not the previous layer, right? It looks like you must have started by just “copy pasting” the forward propagation logic and then patching it up, but the whole point is that you are going backwards: you do the output layer outside the loop, and then the loop walks backwards through the hidden layers.
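To make that structure concrete, here is a rough sketch of how it usually looks. This is not the official notebook code; it assumes a helper like linear_activation_backward(dA, cache, activation) from the earlier part of the assignment that returns (dA_prev, dW, db) for one layer, and the names are just for illustration:

import numpy as np

def L_model_backward_sketch(AL, Y, caches):
    # Rough sketch only; assumes linear_activation_backward(dA, cache, activation)
    # from the earlier exercise, returning (dA_prev, dW, db) for one layer.
    grads = {}
    L = len(caches)                      # number of layers
    Y = Y.reshape(AL.shape)

    # Starting point: derivative of the cost with respect to AL.
    dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

    # Output layer L is handled outside the loop.
    # Note the dA output is labelled for the previous layer: L - 1, not L.
    grads["dA" + str(L - 1)], grads["dW" + str(L)], grads["db" + str(L)] = \
        linear_activation_backward(dAL, caches[L - 1], "sigmoid")

    # Hidden layers: walk backwards, l = L-1, L-2, ..., 1.
    # Input is dA for layer l; outputs are dW and db for layer l, but dA for layer l - 1.
    for l in reversed(range(1, L)):
        dA_prev, dW, db = linear_activation_backward(grads["dA" + str(l)], caches[l - 1], "relu")
        grads["dA" + str(l - 1)] = dA_prev
        grads["dW" + str(l)] = dW
        grads["db" + str(l)] = db

    return grads

For the 2 layer test case, the step outside the loop produces dA1, dW2 and db2, and the single pass through the loop produces dA0, dW1 and db1.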
There is logic before the line that throws the error which “manually” computes dAL. Did you add that value to the grads dictionary with the appropriate key? You can print the keys of a Python dictionary by saying:
print(myDictionary.keys())
Note that the key value is not literally “dAL”: it is “dA2” or whatever the appropriate layer number is in this test case, right?
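In other words, keys like that are normally built from the layer number rather than hard coded, e.g. (purely illustrative):

print("dA" + str(2))    # prints: dA2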
Notice that the test cases here are 2 layer nets. The way L_model_backward works is that you do the output layer first, and that happens outside the loop over the hidden layers, right? So that is separate logic from the logic in the main loop, and it is the output layer logic that should be producing dA1. Your logic in the hidden layer loop is correct (it correctly produces dA0, not dA1), but your logic outside the loop is not.
For reference, for the 2 layer test case, printing grads.keys() should show six keys: dA0 and dA1, plus dW and db for both layer 1 and layer 2.
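A quick way to see which of those you are missing (illustrative only; assumes your function has run and its return value is in grads):

expected_keys = {"dA0", "dA1", "dW1", "db1", "dW2", "db2"}
print(expected_keys - set(grads.keys()))    # prints any keys your grads is missing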
Of course another interesting thing to check is whether the dA2 value you are inserting in the dictionary is just mislabeled and is really dA1 or whether it’s actually the value of dA2. Try printing the shape of it to see. You do actually compute dA2 as the very first step of back propagation, but you’re not supposed to put that value in the dictionary.
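For example (assuming the key you created really is "dA2"):

print(grads["dA2"].shape)    # does it match the shape of A1 (what dA1 should be) or the shape of AL?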
What are the shapes of the gradients you get? Maybe you just misnamed them. Notice what they tell you in the instructions and the comments in that section. At each layer l you get the dW and db values for layer l, but the dA value is for the previous layer l - 1, right?
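A quick way to see all of the shapes at once (assuming grads holds your function's return value):

for key in sorted(grads):
    print(key, grads[key].shape)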
Yes, for the output layer, the output dA is also for the previous layer. But the other thing to check is what happens in the loop over the hidden layers. There you have the same logic: you take the dA for the given layer as input, and you output dW and db for the given layer, but dA for the previous layer.
Maybe your loop didn’t execute enough times. Did you get dW1 and db1 in your grads dictionary?
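One way to check is to print the loop range itself. For example, if your loop happens to be written as “for l in reversed(range(1, L))” (a hypothetical form, yours may differ), then for a 2 layer net:

L = 2
print(list(reversed(range(1, L))))    # [1]: exactly one pass, for hidden layer 1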