W4 A1 | Ex 5 | Wrong shape and output

I’ve been struggling with this for hours and can’t figure out what I’m doing wrong. I’m not really sure what I’m supposed to be doing in the first place, so I don’t even know where to start. Here’s my output:
[screenshot of the failing output]

AL is coming out wrong, and everything else is going wrong, too. Any guidance would be appreciated.

6 Likes

This is L_model_forward, which manages the forward propagation logic through all the layers of the network. So it’s pretty important to have a clear picture of what the overall process is before you start writing the code. One way to get a fairly concrete idea in a particular case like this is to work through what is called the “dimensional analysis” of what happens through all the layers. The way to do that is to start by writing down the shapes of all the input objects that we are given by the test case. In this case that is the routine

L_model_forward_test_case_2hidden

You can find that by clicking “File → Open” and then opening the file testCases.py. You can do that for yourself, but I’ll save you the trouble:

import numpy as np   # imported at the top of testCases.py

def L_model_forward_test_case_2hidden():
    np.random.seed(6)
    X = np.random.randn(5,4)
    W1 = np.random.randn(4,5)
    b1 = np.random.randn(4,1)
    W2 = np.random.randn(3,4)
    b2 = np.random.randn(3,1)
    W3 = np.random.randn(1,3)
    b3 = np.random.randn(1,1)
  
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    
    return X, parameters

So from that, we have the following important shapes:

X is 5 x 4, so we have 5 input features and 4 samples.
W1 is 4 x 5 and b1 is 4 x 1, so layer 1 has 4 output neurons.
W2 is 3 x 4 and b2 is 3 x 1, so layer 2 has 3 output neurons.
W3 is 1 x 3 and b3 is 1 x 1, so layer 3 has 1 output neuron.

Now consider what happens when you do the “linear activation” at layer 1. Here is the formula:

Z1 = W1 · X + b1

So that dot product is 4 x 5 dot 5 x 4, which gives Z1 as 4 x 4. Adding b1 won’t change the shape. A1 is the output of the layer 1 activation (relu) applied to Z1. The activation functions are always applied “elementwise”, meaning that the shape doesn’t change. So A1 is 4 x 4.

Now we do layer 2:

Z2 = W2 · A1 + b2

We get 3 x 4 dotted with 4 x 4, which gives a 3 x 4 output. So both Z2 and A2 will be 3 x 4.

Then at layer 3 we have:

Z3 = W3 · A2 + b3

which will be 1 x 3 dotted with 3 x 4, giving a 1 x 4 output for both Z3 and A3.

Ok, that now gives us the complete picture of the dimensions that should occur at each step. Now compare that to what you get: your AL value (which should be A3) is 3 x 4. So how could that happen? Note that 3 x 4 is the shape of A2. So one possibility is that you skipped the processing for layer 3.

But at least now you have something concrete to compare with your results, which is why “dimensional analysis” is always a recommended way to start debugging in a situation like this.
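
If you want to run that dimensional analysis yourself, here is a minimal numpy sketch. It just reproduces the test case above and hard-codes the three layers; it is not the assignment code, only a shape check:

import numpy as np

np.random.seed(6)
X = np.random.randn(5, 4)                      # 5 features, 4 samples
W1, b1 = np.random.randn(4, 5), np.random.randn(4, 1)
W2, b2 = np.random.randn(3, 4), np.random.randn(3, 1)
W3, b3 = np.random.randn(1, 3), np.random.randn(1, 1)

A1 = np.maximum(0, np.dot(W1, X) + b1)         # relu at layer 1
A2 = np.maximum(0, np.dot(W2, A1) + b2)        # relu at layer 2
A3 = 1 / (1 + np.exp(-(np.dot(W3, A2) + b3)))  # sigmoid at layer 3
print(A1.shape, A2.shape, A3.shape)            # (4, 4) (3, 4) (1, 4)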

18 Likes

Thank you, that helped me confirm what I was doing. It seems like maybe I’m understanding the matrix calculations alright (surprisingly!), but I can’t get the code to operate correctly.

The only way I can get the final result to come out is to change the first line of the for loop, which is outside the editing area. I changed:

for l in range(1, L):

to

for l in range(1, L+1):

After that, I told the loop to print some extra info so I could see what it was doing. Here is what I got:

W1.A0+b1  has shape:  (4, 4) 
 [[0.         3.18040136 0.4074501  0.        ]
 [0.         0.         3.18141623 0.        ]
 [4.18500916 0.         0.         2.72141638]
 [5.05850802 0.         0.         3.82321852]]
W2.A1+b2  has shape:  (3, 4) 
 [[ 2.2644603   1.09971298  0.          1.54036335]
 [ 6.33722569  0.          0.          4.48582383]
 [10.37508342  0.          1.63635185  8.17870169]]
W3.A2+b3  has shape:  (1, 4) 
 [[0.         0.87117055 0.         0.        ]]

 Using this with Sigmoid: 
 [[ 2.2644603   1.09971298  0.          1.54036335]
 [ 6.33722569  0.          0.          4.48582383]
 [10.37508342  0.          1.63635185  8.17870169]] 
Sigmoid on last layer results in shape:  (1, 4) 3 
 [[[0.03921668 0.70498921 0.19734387 0.04728177]]

AL = [[0.03921668 0.70498921 0.19734387 0.04728177]]

It shouldn’t be running the last W3 and b3 through the relu loop, but if I don’t let that happen, then I get a list index out of range error. I have no idea what I’m doing wrong. I don’t know if it’s something about how I’m appending the cache, or how I’m calling the final A value, or what.

It is a mistake to change the loop limit on the “for” loop. What is the value of L here? It is 3 for this particular test case, right? So there are two hidden layers (layer 1 and layer 2), and then there is the output layer (layer 3). The two hidden layers use ReLU and the output layer uses sigmoid. If you are not sure how loop indexes work in python, remember that range does not include its upper limit. Run the following loop and watch what happens:

for ii in range(1,5):
    print(f"ii = {ii}")

So within the “for” loop, it only processes the hidden layers with the “relu” activation. You execute that for l = 1 and l = 2 and then you fall out of the loop and run one more layer with sigmoid as the activation. Note that the AL values you show in the sigmoid case look correct. You need to figure out what is causing the list index error as the next step.
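
To make that concrete with this test case, where L = 3, here is a plain python check (nothing from the assignment, just the range behaviour):

L = 3
print(list(range(1, L)))      # [1, 2]    -> the two hidden (relu) layers
print(list(range(1, L + 1)))  # [1, 2, 3] -> would wrongly push the output layer through relu

Because the upper limit of range is excluded, range(1, L) already covers exactly the hidden layers, and the sigmoid step for layer L belongs after the loop.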

2 Likes

I’ve been trying for days to figure out what I’m doing wrong. I realize that looping through 3 times is wrong, but doing so is the only way it comes out with the “right answer”, which obviously means there’s something wrong in my code. But I don’t know what.

My most recent investigations have revealed that the “piece” of the caches list that I need to use in my final layer appears to be produced by the third pass through the loop, which is why I can’t retrieve it until the loop runs 3 times. I don’t understand what’s going on.

After I run my code, I print t_caches and get:

[((array([[-0.31178367,  0.72900392,  0.21782079, -0.8990918 ],
          [-2.48678065,  0.91325152,  1.12706373, -1.51409323],
          [ 1.63929108, -0.4298936 ,  2.63128056,  0.60182225],
          [-0.33588161,  1.23773784,  0.11112817,  0.12915125],
          [ 0.07612761, -0.15512816,  0.63422534,  0.810655  ]]),
   array([[ 0.35480861,  1.81259031, -1.3564758 , -0.46363197,  0.82465384],
          [-1.17643148,  1.56448966,  0.71270509, -0.1810066 ,  0.53419953],
          [-0.58661296, -1.48185327,  0.85724762,  0.94309899,  0.11444143],
          [-0.02195668, -2.12714455, -0.83440747, -0.46550831,  0.23371059]]),
   array([[ 1.38503523],
          [-0.51962709],
          [-0.78015214],
          [ 0.95560959]])),
  array([[-5.23825714,  3.18040136,  0.4074501 , -1.88612721],
         [-2.77358234, -0.56177316,  3.18141623, -0.99209432],
         [ 4.18500916, -1.78006909, -0.14502619,  2.72141638],
         [ 5.05850802, -1.25674082, -3.54566654,  3.82321852]])),
 ((array([[0.        , 3.18040136, 0.4074501 , 0.        ],
          [0.        , 0.        , 3.18141623, 0.        ],
          [4.18500916, 0.        , 0.        , 2.72141638],
          [5.05850802, 0.        , 0.        , 3.82321852]]),
   array([[-0.12673638, -1.36861282,  1.21848065, -0.85750144],
          [-0.56147088, -1.0335199 ,  0.35877096,  1.07368134],
          [-0.37550472,  0.39636757, -0.47144628,  2.33660781]]),
   array([[ 1.50278553],
          [-0.59545972],
          [ 0.52834106]])),
  array([[ 2.2644603 ,  1.09971298, -2.90298027,  1.54036335],
         [ 6.33722569, -2.38116246, -4.11228806,  4.48582383],
         [10.37508342, -0.66591468,  1.63635185,  8.17870169]])),
 ((array([[ 2.2644603 ,  1.09971298,  0.        ,  1.54036335],
          [ 6.33722569,  0.        ,  0.        ,  4.48582383],
          [10.37508342,  0.        ,  1.63635185,  8.17870169]]),
   array([[ 0.9398248 ,  0.42628539, -0.75815703]]),
   array([[-0.16236698]])),
  array([[-3.19864676,  0.87117055, -1.40297864, -3.00319435]])),
 ((array([[ 2.2644603 ,  1.09971298,  0.        ,  1.54036335],
          [ 6.33722569,  0.        ,  0.        ,  4.48582383],
          [10.37508342,  0.        ,  1.63635185,  8.17870169]]),
   array([[ 0.9398248 ,  0.42628539, -0.75815703]]),
   array([[-0.16236698]])),
  array([[-3.19864676,  0.87117055, -1.40297864, -3.00319435]]))]

Unless I’m interpreting it incorrectly, the array piece that looks like this:

((array([[ 2.2644603 ,  1.09971298,  0.        ,  1.54036335],
          [ 6.33722569,  0.        ,  0.        ,  4.48582383],
          [10.37508342,  0.        ,  1.63635185,  8.17870169]]),

…is the one I need to use in my final sigmoid function. But that part of the list doesn’t get appended until the third loop happens. So how do I get the output of the third loop without running the third loop?

2 Likes

This is forward propagation, right? We don’t use the caches for anything here: we simply build them and then they get used in back propagation.

1 Like

But then how do I get A for the sigmoid part?

1 Like

It is the output of layer 2.

1 Like

You have it sitting in a variable already. Just note that it is not A_prev at that point.

2 Likes

But isn’t it “stuck” in the loop? The relu loop doesn’t return anything; it just appends its results to caches. I feel like I’m missing something really fundamental here.

1 Like

Yes, I just gave you the answer: it is the value of the variable that got the return value on the last iteration of the loop.

1 Like

In caches, right? But that’s what I’m confused about. The value I need in order to get the correct final answer doesn’t get appended until the 3rd time through the loop.

1 Like

The scoping model in python is at the function level: there are no finer-level “blocks”, so a for loop does not create its own scope.

1 Like

I’m so sorry, but I don’t know what any of your last message means. Scoping model? Function level? Blocks?

2 Likes

You do not need to get anything from the caches here. That is just wrong. Drop that idea and look at the logic in the for loop. The return value of linear_activation_forward includes the A value, so that is assigned to some variable in the loop. That variable still holds the same value when you fall out of the loop, right?

3 Likes

So before the loop starts, A is defined outside the loop. Then the relu loop starts and A is copied into A_prev. Then I start my part of the code, which results in A and cache. cache gets stored in caches by appending it. A gets recycled through the loop but is not output anywhere. Then the loop is finished and I move on to sigmoid, where I need A from the loop, but it wasn’t stored anywhere. How do I get it? Am I storing the wrong thing?

1 Like

What do you mean “it wasn’t stored anywhere”? It is in the variable A, right?

1 Like

That was my point about scope. A is a local variable in the whole function. It does not get destroyed when you exit the for loop, right?
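
Here is a tiny standalone example of that point (nothing to do with the assignment itself):

for i in range(3):
    a = i * 10        # 'a' is assigned inside the loop body

print(a)              # prints 20: 'a' keeps its last value after the loop ends

The same is true of the variable that receives A inside your for loop: when the loop finishes, it still holds the layer 2 activation, ready to be fed into the sigmoid step.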

2 Likes

Oh, ok, wait, I see: it is printing it here. I told it to print A after each iteration, and this is it. This is what I need:

W1.A0+b1  has shape:  (4, 4) 
 [[0.         3.18040136 0.4074501  0.        ]
 [0.         0.         3.18141623 0.        ]
 [4.18500916 0.         0.         2.72141638]
 [5.05850802 0.         0.         3.82321852]]
W2.A1+b2  has shape:  (3, 4) 
 [[ 2.2644603   1.09971298  0.          1.54036335]
 [ 6.33722569  0.          0.          4.48582383]
 [10.37508342  0.          1.63635185  8.17870169]]

Ok, I’m going to play with that for a bit. It’s not working when I just plug it in, so I need to try some things. Thank you for your patience with me. I know I’m kind of a dummy with this stuff!

2 Likes

Huzzah! I got it! Thank you so much for your help. Turns out it was a lower case l versus an upper case L that got me. After that error, I went down a really long rabbit hole I should never have visited and got very lost.

10 Likes