Course 1, Week 4, Exercise 9: problem in relu_backward

The code block for testing L_model_backward fails at its first line, the call to L_model_backward_test_case().
Simply calling that function, independent of my code, fails with the following stack trace:

IndexError Traceback (most recent call last)
in
1 t_AL, t_Y_assess, t_caches = L_model_backward_test_case()
----> 2 grads = L_model_backward(t_AL, t_Y_assess, t_caches)
3
4 print("dA0 = " + str(grads[‘dA0’]))
5 print("dA1 = " + str(grads[‘dA1’]))

in L_model_backward(AL, Y, caches)
59 # YOUR CODE STARTS HERE
60 current_cache = caches[l]
---> 61 dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL, current_cache, "relu")
62 grads["dA" + str(l)] = dA_prev_temp
63 grads["dW" + str(l+1)] = dW_temp

in linear_activation_backward(dA, cache, activation)
22 # dA_prev, dW, db = …
23 # YOUR CODE STARTS HERE
---> 24 dZ = relu_backward(dA, activation_cache)
25 dA_prev, dW, db = linear_backward(dZ, linear_cache)
26

~/work/release/W4A1/dnn_utils.py in relu_backward(dA, cache)
54
55 # When z <= 0, you should set dz to 0 as well.
---> 56 dZ[Z <= 0] = 0
57
58 assert (dZ.shape == Z.shape)

IndexError: boolean index did not match indexed array along dimension 0; dimension is 1 but corresponding boolean dimension is 3

Thanks in advance for any help.

Just because the error gets thrown in a routine you did not write, that does not mean it’s not your fault. What it means is that your code passed incorrect arguments down to that provided routine. A perfectly correct subroutine will throw errors if you pass it mismatching arguments. Now you need to figure out how that happened. You can examine the logic of relu_backward to understand what it does and that may help you track backwards up the call stack to figure out where you went awry. Click “File → Open” and then open the appropriate “utility” file. You can deduce the name of the file by examining the “import” cell early in the notebook.
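
For reference, the traceback already shows the key lines of relu_backward, so here is a minimal sketch of that logic (my reconstruction from those visible lines, not the exact contents of dnn_utils.py), together with a tiny standalone demo of how mismatched arguments trigger exactly that IndexError:

import numpy as np

def relu_backward_sketch(dA, Z):
    # dZ starts as a copy of dA; wherever Z <= 0, dZ is set to 0 as well
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0
    assert (dZ.shape == Z.shape)
    return dZ

# Mismatched arguments reproduce the error from the traceback:
dA = np.random.randn(1, 2)   # gradient shaped like the output layer's activation
Z = np.random.randn(3, 2)    # Z cached by a hidden layer with 3 units
relu_backward_sketch(dA, Z)  # IndexError: boolean dimension is 3, array dimension is 1

The point is that dA and the cached Z must have the same shape; if they don't, the wrong dA or the wrong cache was passed down.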

Ok. I see what you mean. “relu_backward” is correct.
My problem now is that the test block includes printing dA0, but there is no dA0 (that’d be X), right?
It seems that there are only two layers, L=2. The loop starts with l=0, and that’s why l+1 is used for positioning the gradients, right? But there should only be one pass, and I seem to get two. So, is this a 2-layer network, with 1 hidden layer and 1 output layer?
Thanks

Yes, A0 is the equivalent of X. You can compute a gradient for A0, but it won’t do you any good since you can’t change X. But that’s just the way the loop works.

You’re worrying me a little here with your description of the loop. It doesn’t start with l = 0, right? The whole point of “backward propagation” is that it goes … wait for it … backwards. So you start with the ending value of l. But you also have to keep in mind that everything in python is “0 based”. Try running the following two loops and watch what happens:

for ii in range(1,4):
    print(f"ii = {ii}")

for ii in reversed(range(5)):
    print(f"ii = {ii}")

You’re right that the particular test case that they give you for L_model_backward is a two layer network. But it’s good to always keep in mind that we are writing general code here: it should work for any number of layers.

Thanks for getting back so quickly!
This is the for-loop provided in the template for L_model_backward, going through all the relu units.

Loop from l=L-2 to l=0

for l in reversed(range(L-1)):

With L=2, and python being 0-based, it follows, I suppose, that l=0 right from the start. I checked the value of l inside the loop, and it does start at 0. That makes sense, since L-1 is 1 and we have 0-based indexing to deal with.
The commentary prior to the loop, before the code block dealing with the sigmoid part, is:

grads["dA" + str(L-1)] = …

grads["dW" + str(L)] = …

grads["db" + str(L)] = …

The first additions to the grads dictionary would be dA1, dW2, and db2. That makes sense, since the output from a layer to the previous layer is dA[l-1].
For the next layer, handled inside the loop and starting with l=0, we’d expect to get dA0, dW1, and db1. And that would be the end. So there should only be 1 pass through the loop, with those gradients. I print the value of l inside the loop, first thing, but for some reason I see two passes through the loop, with l=0 printed twice. That may be because there’s an error of some sort, and python somehow re-executes the loop with l=0.
In any event, it’s still not working :frowning:

Well, there must be something wrong with your logic in the loop. Are you sure you didn’t modify the “for” statement? Also note that indentation is a critical part of the syntax in python. The way you recognize the end of the loop is by the indentation, right?
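
As a generic illustration of what I mean about indentation (a toy example, not the notebook code):

for l in reversed(range(2)):
    print(f"inside the loop, l = {l}")   # indented: runs on every pass
print("after the loop")                  # dedented: runs once, after the loop ends

Anything indented under the “for” is part of the loop body; the first dedented statement marks where the loop ends.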

The basic sequence is that you handle the output layer first before the loop starts. In this case, that would be layer 2. So you get dA1, dW2 and db2 as you say. Then you enter the loop and execute it once generating dA0, dW1 and db1.
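
Here’s a hedged sketch of that bookkeeping, just printing which gradient keys get filled in rather than computing anything (L here is assumed to be the total number of layers, as in the test case):

L = 2  # the two-layer test case from L_model_backward_test_case()

# output (sigmoid) layer, handled before the loop
print(f"before loop: dA{L-1}, dW{L}, db{L}")

# hidden (relu) layers, handled inside the loop
for l in reversed(range(L - 1)):
    print(f"loop pass l={l}: dA{l}, dW{l+1}, db{l+1}")

With L = 2 that prints “before loop: dA1, dW2, db2” and then a single “loop pass l=0: dA0, dW1, db1”, i.e. exactly one pass through the loop.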

I can’t see your code, so it’s up to you to figure out why the body of the loop is getting executed twice. If you think that you may have modified some of the template code that should not have been changed, there is a procedure documented on the FAQ Thread for getting a clean copy for comparison purposes.

Thanks!
I went over the for-loop, looking for anything amiss. I had a statement just below the for-loop, at proper indentation (aligned with the “for” statement), on the next line. I inserted a blank line above this statement, and voila! A successful run, passing the test. Gotta admit, python’s a bit unforgiving, but certainly powerful.

It is great to hear that you found the solution. You had me worried there for a while. Onward! :nerd_face: