W4 A1 Exercise 9 L_model_forward shapes not aligned

I am working on L_model_forward and have two different calls depending on whether the activation is in the hidden layers or at the output layer. In the hidden layers I am using ReLU and in the output layer I am using a sigmoid activation function. I pull the ‘W’ and ‘b’ parameters from the parameters dictionary using the l index from the for loop, but in the sigmoid activation I use L because it is the last layer. Additionally, I use A_prev for my activations in the hidden layers but A for the output layer.
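Roughly, the lookup pattern I mean looks like this (a toy sketch of just the dictionary keys, not my actual code):

# keys used in the hidden-layer loop (index l) vs. the output layer (index L)
l, L = 1, 3
hidden_keys = ['W' + str(l), 'b' + str(l)]     # e.g. ['W1', 'b1'], used with ReLU
output_keys = ['W' + str(L), 'b' + str(L)]     # ['W3', 'b3'], used with sigmoid
print(hidden_keys, output_keys)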

I am receiving an error that my shapes do not match the expected output. You can see I am very close to the expected shape, but I cannot tell where the mismatch is coming from. I thought I might be appending incorrectly, but from other documentation it looks OK.

It appears the error is stemming from the sigmoid activation part of the loop. Might it be a combination of wrong A, W, b, or L values? Please advise:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-96-10fc901e800a> in <module>
      1 t_X, t_parameters = L_model_forward_test_case_2hidden()
----> 2 t_AL, t_caches = L_model_forward(t_X, t_parameters)
      3 
      4 print("AL = " + str(t_AL))
      5 

<ipython-input-95-b5091027b90c> in L_model_forward(X, parameters)
     43                                               parameters['W' + str(L)],
     44                                               parameters['b' + str(L)],
---> 45                                               activation = "sigmoid")
     46         caches = caches.append(AL)
     47 

<ipython-input-9-5c560d8b0806> in linear_activation_forward(A_prev, W, b, activation)
     23         # YOUR CODE STARTS HERE
     24 
---> 25         Z, linear_cache = linear_forward(A_prev,W,b)
     26         A, activation_cache = sigmoid(Z)
     27 

<ipython-input-7-ff417d082cca> in linear_forward(A, W, b)
     18     # Z = ...
     19     # YOUR CODE STARTS HERE
---> 20     Z = np.dot(W,A)+b
     21 
     22     # YOUR CODE ENDS HERE

<__array_function__ internals> in dot(*args, **kwargs)

ValueError: shapes (1,3) and (4,4) not aligned: 3 (dim 1) != 4 (dim 0)

Expected output

AL = [[0.03921668 0.70498921 0.19734387 0.04728177]]
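For reference, the failing dot product can be reproduced on its own with the shapes from the traceback (zero arrays here are just placeholders):

import numpy as np

W = np.zeros((1, 3))     # output-layer W, as shown in the traceback
A = np.zeros((4, 4))     # the A_prev that is actually being passed in
try:
    np.dot(W, A)
except ValueError as e:
    print(e)             # shapes (1,3) and (4,4) not aligned: 3 (dim 1) != 4 (dim 0)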

It looks like the A_prev value that you passed to linear_activation_forward for the output layer is not what you are expecting. Please take a careful look at what the variables will hold when you fall out of the “for” loop over the hidden layers.

The other good thing to do here is to first go through the “dimensional analysis”, so that you know what should be happening at each layer. Here’s a thread which takes you through that for this particular test case.
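As a rough illustration of what that dimensional analysis looks like for this 2-hidden-layer test case (layer sizes 5 → 4 → 3 → 1 with m = 4 examples; the random values below are placeholders, not the actual test parameters):

import numpy as np

X = np.random.randn(5, 4)                        # 5 input features, m = 4 examples
W1, b1 = np.random.randn(4, 5), np.zeros((4, 1))
W2, b2 = np.random.randn(3, 4), np.zeros((3, 1))
W3, b3 = np.random.randn(1, 3), np.zeros((1, 1))

A1 = np.maximum(0, W1 @ X + b1)                  # ReLU, shape (4, 4)
A2 = np.maximum(0, W2 @ A1 + b2)                 # ReLU, shape (3, 4)
AL = 1 / (1 + np.exp(-(W3 @ A2 + b3)))           # sigmoid, shape (1, 4)
print(A1.shape, A2.shape, AL.shape)              # (4, 4) (3, 4) (1, 4)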

Thank you Paul, I think I am getting close to a solution after doing some dimensional analysis. I spotted an indentation error: my output layer activation was part of the for loop, and after bringing it out one level I can reason much better about how this works. I am getting the right shapes now, and even the expected AL output for the given test case and random seed.
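In other words, one dedent changes whether the output-layer call runs on every pass or only once after the loop; schematically (illustrating the control flow only, not my real code):

L = 3
for l in range(1, L):
    print("ReLU call for hidden layer", l)
    # a sigmoid call indented at this level would run on every pass of the loop
print("sigmoid call for output layer", L)   # dedented one level: runs once, after the loop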

However, when I try to append AL to caches I am unable to, because caches is now NoneType instead of a list. It is still a list by the time I finish the loop; it only becomes NoneType at the step after the loop where AL is assigned and appended to caches. How can this happen?

______

for hidden layer number:1
layer 1 A_prev shape: (5, 4)
layer 1 W shape: (4, 5)
layer 1 b shape: (4, 1)
layer 1 A shape: (4, 4)
______

caches is a: <class 'list'>
caches has: 1 items
A is a: <class 'numpy.ndarray'>
______

for hidden layer number:2
layer 2 A_prev shape: (4, 4)
layer 2 W shape: (3, 4)
layer 2 b shape: (3, 1)
layer 2 A shape: (3, 4)
______

caches is a: <class 'list'>
caches has: 2 items
A is a: <class 'numpy.ndarray'>
______
Beginning linear_activation_forward_sigmoid; leaving hidden layers
output layer 3 A_prev shape: (3, 4)
output layer 3 W shape: (1, 3)
output layer b shape: (1, 1)
______


output AL shape: (1, 4)
AL is a <class 'numpy.ndarray'>
AL: [[0.03921668 0.70498921 0.19734387 0.04728177]]
caches is a: <class 'NoneType'>

One way I can think of is that using the “append” method on a list is not an assignment statement, right? If I have a list called myList and want to append a new element, the correct syntax is:

myList.append(newElement)

As mentioned, that is not an assignment statement. myList is an object, which has an append method, which is a function that you are invoking. If you say this, then myList ends up being None:

myList = myList.append(newElement)

That is one mistake I can think of that would cause the syndrome you describe, but I’m sure there are others as well. :nerd_face:
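You can see the same thing in isolation with a throwaway list:

myList = [1, 2]
result = myList.append(3)    # append mutates the list in place...
print(myList)                # [1, 2, 3]
print(result)                # None -- append itself returns nothing
myList = myList.append(4)    # so this assignment clobbers the list
print(myList)                # None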

Of course we can see from your nice debugging output that appending to caches looks like it is being handled correctly in the “for” loop. So whatever is causing the issue must happen after you fall out of the loop.

Correct again. I was assigning the result back to the variable instead of just calling the .append method on it.

With that out of the way, I am now getting the right shapes when I do a manual run, but ‘L_model_forward_test()’ asserts that my shapes and data types are still wrong.

AL = [[0.03921668 0.70498921 0.19734387 0.04728177]]
Error: The function should return a numpy array. in variable 0. Got type: <class 'numpy.ndarray'>  but expected type <class 'tuple'>
Error: The function should return a numpy array. in variable 1. Got type: <class 'numpy.ndarray'>  but expected type <class 'tuple'>
Error: The function should return a numpy array. in variable 1. Got type: <class 'tuple'>  but expected type <class 'list'>
Error: wrong (shape,output) for variable (0,1,2) [my edit for abbreviation]
 0  Tests passed
 3  Tests failed

Below is my test output:

______

for hidden layer number:1
layer 1 A_prev shape: (5, 4)
layer 1 W shape: (4, 5)
layer 1 b shape: (4, 1)
layer 1 A shape: (4, 4)
______

caches is a: <class 'list'>
caches has: 1 items
A is a: <class 'numpy.ndarray'>
______

for hidden layer number:2
layer 2 A_prev shape: (4, 4)
layer 2 W shape: (3, 4)
layer 2 b shape: (3, 1)
layer 2 A shape: (3, 4)
______

caches is a: <class 'list'>
caches has: 2 items
A is a: <class 'numpy.ndarray'>
______
Beginning linear_activation_forward_sigmoid; leaving hidden layers
output layer 3 A_prev shape: (3, 4)
output layer 3 W shape: (1, 3)
output layer b shape: (1, 1)
______


output AL shape: (1, 4)
AL is a <class 'numpy.ndarray'>
AL: [[0.03921668 0.70498921 0.19734387 0.04728177]]
caches is a: <class 'list'>
caches has: 3 items
returning AL, caches

====================================================
t_AL = [[0.03921668 0.70498921 0.19734387 0.04728177]]
t_caches = 
    [array([[0.        , 3.18040136, 0.4074501 , 0.        ],
       [0.        , 0.        , 3.18141623, 0.        ],
       [4.18500916, 0.        , 0.        , 2.72141638],
       [5.05850802, 0.        , 0.        , 3.82321852]]), 
    array([[ 2.2644603 ,  1.09971298,  0.        ,  1.54036335],
       [ 6.33722569,  0.        ,  0.        ,  4.48582383],
       [10.37508342,  0.        ,  1.63635185,  8.17870169]]), 
    array([[0.03921668, 0.70498921, 0.19734387, 0.04728177]])]

Here is my output with a few debugging messages:

Inner loop l = 1, A.shape (4, 4)
Inner loop l = 2, A.shape (3, 4)
l = 3
A3 = [[0.03921668 0.70498921 0.19734387 0.04728177]]
A3.shape = (1, 4)
AL = [[0.03921668 0.70498921 0.19734387 0.04728177]]

So your AL value is correct. That means the problem is most likely in the cache values. If your previous functions linear_forward and linear_activation_forward are correct and pass their test cases, then there isn’t really anything tricky about how you handle the caches: you just take the second return value from linear_activation_forward and append it to caches for each layer, right?

I can confirm linear_forward() and linear_activation_forward() passed the required tests.

By second return value, do you mean the final assignment of A for a 3-layer network? That would be the last assignment of A at the end of range(1, L), when L = 3.

I would say that is what I am doing. My test has 3 layers, and on the second and last pass of the for loop, linear_activation_forward() assigns A its final value. I am taking that value and using it for my sigmoid pass.

I’m specifically talking about the cache value, not the A value, right? My interpretation of your error message is that the test is failing because of the caches value, not the AL value. linear_activation_forward returns two separate return values, right?

Actually I missed that you had shown your cache value in your output. It looks like it only had one layer’s worth of entries. Oh, actually, it looks like your caches value is just a list of the A^{[l]} values. That’s not what was intended, right?


This is the answer. The instructions say caches is a list of the caches produced, but I was appending the A values to it. What is intended is for the loop to append the cache returned by linear_activation_forward to caches.
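For anyone who hits the same thing later, the overall pattern ends up looking roughly like this (a sketch using a simplified stand-in for the notebook's linear_activation_forward, so the exact cache contents here are only an approximation of the real helper's):

import numpy as np

def linear_activation_forward(A_prev, W, b, activation):
    # simplified stand-in for the notebook helper: returns the activation AND a cache;
    # the cache structure below is illustrative, not the notebook's exact definition
    Z = W @ A_prev + b
    A = np.maximum(0, Z) if activation == "relu" else 1 / (1 + np.exp(-Z))
    cache = ((A_prev, W, b), Z)
    return A, cache

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2                    # number of layers
    for l in range(1, L):                       # hidden layers: ReLU
        A_prev = A
        A, cache = linear_activation_forward(A_prev,
                                             parameters['W' + str(l)],
                                             parameters['b' + str(l)],
                                             activation="relu")
        caches.append(cache)                    # append the cache, not A
    AL, cache = linear_activation_forward(A,    # output layer: sigmoid, outside the loop
                                          parameters['W' + str(L)],
                                          parameters['b' + str(L)],
                                          activation="sigmoid")
    caches.append(cache)                        # same here
    return AL, caches

# quick shape check with placeholder parameters for the 5 -> 4 -> 3 -> 1 case
params = {'W1': np.random.randn(4, 5), 'b1': np.zeros((4, 1)),
          'W2': np.random.randn(3, 4), 'b2': np.zeros((3, 1)),
          'W3': np.random.randn(1, 3), 'b3': np.zeros((1, 1))}
AL, caches = L_model_forward(np.random.randn(5, 4), params)
print(AL.shape, len(caches))                    # (1, 4) 3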
