# W4_A1_Ex-9_L_Model_Backward_Function

Hello everyone, I’m currently working on the Programming Assignment for Week 4, which involves building a deep neural network step by step. However, I’m having some trouble with Exercise 9 - L_model_backward. Despite my efforts, I couldn’t find the right code for this exercise. Could someone please help me with this? Thank you in advance for your assistance.

Hello @Mohammad_ferdosian! I hope you are doing well.

Let’s break down the instructions for Exercise 9.

The notebook says:

To compute `dAL`, use this formula (derived using calculus which, again, you don’t need in-depth knowledge of!):

``````
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL)) # derivative of cost with respect to AL
``````
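Here is where that formula comes from. For one example with prediction $a$ and label $y$, the cross-entropy loss is $\mathcal{L}(a, y) = -\big(y \log a + (1 - y)\log(1 - a)\big)$, and its derivative with respect to $a$ is $-\big(\frac{y}{a} - \frac{1 - y}{1 - a}\big)$. The sketch below uses made-up `AL` and `Y` values. It computes `dAL` with the notebook's formula and compares it against a central finite-difference estimate of the same derivative.

``````python
import numpy as np

# Hypothetical example: sigmoid outputs AL and 0/1 labels Y for 3 examples.
AL = np.array([[0.8, 0.3, 0.6]])
Y = np.array([[1, 0, 1]])

# Gradient of the per-example cross-entropy loss with respect to AL (notebook formula).
dAL = - (np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))

def loss(a, y):
    """Per-example cross-entropy loss, computed elementwise."""
    return -(y * np.log(a) + (1 - y) * np.log(1 - a))

# Central finite-difference estimate of the same derivative.
eps = 1e-6
dAL_numeric = (loss(AL + eps, Y) - loss(AL - eps, Y)) / (2 * eps)

print(dAL)                            # same shape as AL: (1, 3)
print(np.allclose(dAL, dAL_numeric))  # expected: True
``````

`dAL` always has the same shape as `AL`, one entry per example.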

So, do you have any difficulty implementing that?

Next, you need to compute these:

``````
# Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache". Outputs: grads["dAL-1"], grads["dWL"], grads["dbL"]
#(approx. 5 lines)
# current_cache = ...
# dA_prev_temp, dW_temp, db_temp = ...
# grads["dA" + str(L-1)] = ...
# grads["dW" + str(L)] = ...
# grads["db" + str(L)] = ...
``````

The instructions state clearly what `current_cache` is for the sigmoid layer. See the figure below:

Next, call the backward function as instructed in the figure below:

If you’re not sure which backward function to call, here is a hint:
You know that the outputs are `dA_prev_temp, dW_temp, db_temp`. So which function returns these? Check the previous exercise. You need to call that function with the correct inputs. And you know what the activation for the output layer is, right?

After that, use equation 15 to define these:

``````
# grads["dA" + str(L-1)] = ...
# grads["dW" + str(L)] = ...
# grads["db" + str(L)] = ...
``````

So, that’s all for the output layer.

Can you work out the steps for the hidden layers? You have to use a for loop. The hint is in the figure below:
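To get a feel for the bookkeeping of that loop, the sketch below lists the layers in the order the backward pass visits them. It assumes the assignment's `[LINEAR->RELU]*(L-1) -> LINEAR->SIGMOID` architecture and that `caches[k]` holds the cache of layer `k+1`. For each layer it shows which cache entry is used and which `grads` keys are written. `backward_layer_order` is a hypothetical helper for illustration only. It does not compute any gradients.

``````python
def backward_layer_order(L):
    """Hypothetical helper: (layer, activation, cache index, grads keys written),
    in the order L_model_backward visits the layers.

    Assumes layers 1..L-1 are ReLU, layer L is sigmoid, and caches[k] is layer k+1's cache.
    """
    # The output layer comes first.
    order = [(L, "sigmoid", L - 1, ("dA" + str(L - 1), "dW" + str(L), "db" + str(L)))]
    # Then the hidden layers, walked backwards: l = L-2, ..., 0.
    for l in reversed(range(L - 1)):
        order.append((l + 1, "relu", l, ("dA" + str(l), "dW" + str(l + 1), "db" + str(l + 1))))
    return order

for layer, activation, cache_index, keys in backward_layer_order(3):
    print(f"layer {layer} ({activation}): caches[{cache_index}] -> {keys}")
``````

For `L = 3`, this visits layer 3 (sigmoid), then layer 2, then layer 1. The last iteration writes `dA0`, `dW1` and `db1`.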

Let me know if you need any further assistance.

Best,
Saif.


These tips were helpful, but I’m still confused about a few things. My main point of confusion is the parameters for both `linear_activation_forward()` calls in my implementation, one for sigmoid and one for relu.

The parameters I’m confused about are mainly `W` and `b`. First off, I’m confident I correctly set the parameter `A_prev=dAL`. For the first `current_cache` variable in my code, I use `activation="sigmoid"`, and for the second, `"relu"`. However, I don’t know how to assign the variables `W` and `b`. I think they are stored in the variable `cache`, but even so I don’t know how to get them out.

After this I call linear_activation_backward() with parameters `dAL`, `current_cache`, and activation= `"sigmoid"` or `"relu"`.

Can you point me in the right direction?

First, which exercise are you doing? Can you share the error you are getting?

Why are you doing this?

In `initialize_parameters_deep`, we were given this:

``````
parameters['W' + str(l)] =
parameters['b' + str(l)] =
``````

Use this intuition to understand how we get `W` and `b` from the dictionary named `parameters`.
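As a small illustration of that string-key pattern, the sketch below builds and reads a `parameters` dictionary. The layer sizes are hypothetical, and the values are zero placeholders rather than the assignment's initialization. Note that `L_model_backward` does not need this lookup itself, because the caches already carry `W` and `b`.

``````python
import numpy as np

layer_dims = [4, 3, 1]  # hypothetical: 4 inputs, 3 hidden units, 1 output unit

# Store each layer's weights under string keys "W1", "b1", "W2", "b2".
# Zeros are placeholders only; the assignment initializes W randomly.
parameters = {}
for l in range(1, len(layer_dims)):
    parameters["W" + str(l)] = np.zeros((layer_dims[l], layer_dims[l - 1]))
    parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))

# Retrieve a layer's W and b with the same key construction.
l = 2
W = parameters["W" + str(l)]
b = parameters["b" + str(l)]
print(sorted(parameters))  # ['W1', 'W2', 'b1', 'b2']
print(W.shape, b.shape)    # (1, 3) (1, 1)
``````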

But you wrote the code that created the caches in the forward propagation section, right? So you know where to look to understand what is in them. At each layer, the cache entry looks like this:

`((A, W, b), Z)`

So it is a “2-tuple”. Its first element is a 3-tuple (the “linear cache”) and its second element is Z (the “activation cache”). Note that the A value there is the layer’s input, not its output. So at a given layer $l$ the 3-tuple is $(A^{[l-1]}, W^{[l]}, b^{[l]})$.

Also note that they give you the logic in the template code to extract the values from the cache entries.
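As a minimal sketch of that structure, the example below builds one layer's cache entry by hand with hypothetical shapes. It then unpacks the entry roughly the way the template code in `linear_activation_backward` and `linear_backward` does. The point is that `L_model_backward` only hands one whole entry down, and the splitting happens inside those helpers.

``````python
import numpy as np

# Hypothetical shapes: layer l has 1 unit, the previous layer has 3 units, 2 examples.
A_prev = np.ones((3, 2))     # input to layer l, i.e. A^[l-1]
W = np.full((1, 3), 0.5)     # W^[l]
b = np.zeros((1, 1))         # b^[l]
Z = W @ A_prev + b           # Z^[l]

# One layer's cache entry: ((A_prev, W, b), Z)
cache = ((A_prev, W, b), Z)

# Split into the linear cache and the activation cache...
linear_cache, activation_cache = cache
# ...then unpack the linear cache into its three parts.
A_prev_c, W_c, b_c = linear_cache

print(len(cache), len(linear_cache))  # 2 3
print(activation_cache.shape)         # (1, 2)
``````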

Oh okay, so I can just index that tuple. That makes so much sense.

Thanks for your help. Now I’m just getting the wrong values for all my variables. I’ll play around with it and let you know if I can’t find the issue.

For Exercise 9 of Week 4, I can’t figure out why I’m getting the wrong values (shown below). Do you have any suggestions? I must be passing the wrong parameters somewhere.

For the sigmoid layer, I’m passing A from `caches[L-1][0][x]` with `x=0`, W from the same entry with `x=1`, b with `x=2`, and `"sigmoid"` to `linear_activation_forward`. Then, also for the sigmoid layer, I’m passing `dAL` and `current_cache[1]` into `linear_activation_backward`.

For the relu layers, I’m passing similar parameters: for `linear_activation_forward` I’m using `caches[l][0][x]`, and for `linear_activation_backward` I’m using `dA_prev_temp`.

You’re worrying me a bit with your description of how you handle the caches. At the level of `L_model_backward`, all you need to do is extract one of the “layer” cache entries and pass it down to `linear_activation_backward` each time you call it, right? You should not need to “subindex” it at that level: that happens inside `linear_activation_backward` and `linear_backward` and they gave you that logic in the “template” code, right?

But if you were using the wrong cache entry values, you’d get shape mismatch errors being thrown, not simply incorrect output values.

The key is you start with the output layer and then walk backwards through the hidden layers. Are you sure you passed “relu” as the activation for the hidden layers?


Yes, I pass `'relu'` as the activation for the hidden layers and `'sigmoid'` as the activation for the first calculation in backprop. To get the `current_cache` to pass into `linear_activation_backward`, I call `linear_activation_forward`, which has the parameters `(A_prev, W, b, activation)`. Therefore, I indexed the last entry in `caches` to get these values. Is that incorrect?

This is wrong. To get `current_cache`, you don’t need to call any function. What does the highlighted part in the image below say? It explicitly gives you the answer for `sigmoid`. Check the `relu` case too.

Below, I added some more comments for you. Use the same intuition for `relu` too.

``````
# Lth layer (SIGMOID -> LINEAR) gradients. Inputs: "dAL, current_cache". Outputs: grads["dAL-1"], grads["dWL"], grads["dbL"]
#(approx. 5 lines)
current_cache = ...                   # the sigmoid layer's cache (check the highlighted part of the image above)
dA_prev_temp, dW_temp, db_temp = ...  # call the linear activation backward function with the correct arguments
grads["dA" + str(L-1)] = ...          # dA of the previous layer (the temp value)
grads["dW" + str(L)] = ...            # dW of the current layer (the temp value)
grads["db" + str(L)] = ...            # db of the current layer (the temp value)

``````

Best,
Saif.


Oh thanks,
I read the highlighted part wrong. I thought it was asking us to call that function rather than just take the associated cache. Thanks for the clarification.

Thanks,
Jeb

Thanks for your explanation. I just ran into a problem: for the relu layers I used `linear_activation_backward(dAL, current_cache, 'relu')`, but I am getting this error:

``````
IndexError                                Traceback (most recent call last)
in
      1 t_AL, t_Y_assess, t_caches = L_model_backward_test_case()
----> 2 grads = L_model_backward(t_AL, t_Y_assess, t_caches)
      3 
      4 print("dA0 = " + str(grads['dA0']))
      5 print("dA1 = " + str(grads['dA1']))

in L_model_backward(AL, Y, caches)
     49         #(approx. 5 lines)
     50         current_cache = caches[l]
---> 51         dA_prev_temp, dW_temp, db_temp = linear_activation_backward(dAL,current_cache,'relu')
     52         grads["dA" + str(l)] = dA_prev_temp
     53         grads["dW" + str(l + 1)] = dW_temp

in linear_activation_backward(dA, cache, activation)
     19     if activation == "relu":
     20         #(≈ 2 lines of code)
---> 21         dZ = relu_backward(dA, activation_cache)
     22         dA_prev, dW, db = linear_backward(dZ,linear_cache)
     23         # YOUR CODE STARTS HERE

~/work/release/W4A1/dnn_utils.py in relu_backward(dA, cache)
     54 
     55     # When z <= 0, you should set dz to 0 as well.
---> 56     dZ[Z <= 0] = 0
     57 
     58     assert (dZ.shape == Z.shape)

IndexError: boolean index did not match indexed array along dimension 0; dimension is 1 but corresponding boolean dimension is 3
``````

The point of the error is that the values you passed down to `relu_backward` don’t match, right? The dA is a different shape than the Z value in the cache. So how could that happen? Notice that you are always passing dAL for the dA argument when you call `linear_activation_backward` from `L_model_backward`. That only works at the output layer, right?
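The sketch below reproduces the mismatch in isolation. It re-creates a ReLU backward step modeled on the lines visible in the traceback, assuming it copies `dA` into `dZ` and then zeroes entries where `Z <= 0`. The real `dnn_utils.py` may differ in detail. The shapes are hypothetical and chosen to match the traceback: a hidden layer with 3 units and a 1-unit output layer. Passing that layer's own gradient works, but passing `dAL` makes the boolean mask (shape of `Z`) disagree with `dZ` (shape of `dA`), and NumPy raises `IndexError`.

``````python
import numpy as np

def relu_backward(dA, cache):
    """Sketch of a ReLU backward step: pass dA through where Z > 0, zero elsewhere."""
    Z = cache
    dZ = np.array(dA, copy=True)
    dZ[Z <= 0] = 0  # boolean mask has Z's shape, so dZ (i.e. dA) must have it too
    assert dZ.shape == Z.shape
    return dZ

rng = np.random.default_rng(0)
Z_hidden = rng.standard_normal((3, 2))   # hidden layer: 3 units, 2 examples
dA_hidden = rng.standard_normal((3, 2))  # gradient w.r.t. that layer's activation
dAL = rng.standard_normal((1, 2))        # gradient w.r.t. the 1-unit output layer

print(relu_backward(dA_hidden, Z_hidden).shape)  # (3, 2)

caught = None
try:
    relu_backward(dAL, Z_hidden)  # dAL belongs to the output layer, not this one
except IndexError as err:
    caught = err
    print("IndexError:", err)
``````

The exact wording of the `IndexError` message varies between NumPy versions, but the cause is the same shape mismatch.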

This is a classic example of how debugging works: the error is thrown two levels down the call stack in a routine that was just given to you, so you can assume it’s correct. So how could that happen? You must have passed bad arguments, so you start by seeing what the error means and then you track backwards up the call stack to figure out where the real problem is. Where did the bad value come from?