Week 4, Exercise 5 - L_model_forward()

Hi, I am getting the ValueError: shapes are not aligned, even though my code had already passed all the tests for linear_forward. Here’s the screenshot of the error:


Any help will be appreciated :slight_smile:

3 Likes

Hi there, are you sure that the first input to the linear function is A_prev?
As far as I remember, the first one was “W”, the second one was A_prev, and finally b.

1 Like

Yeah… the definition of the linear function is linear_forward(A, W, b)…

1 Like

The problem is solved now. During sigmoid activation, I had to send ‘A’ as an argument, not ‘A_prev’.

2 Likes

Which function do you pass it to? I’m confused because for linear_activation_forward (which includes linear_forward and the activation function), don’t we still pass A_prev?

1 Like

Just to save others’ time, try this link and you will understand how the for l in range(1, L) loop works.
It took me more than an hour just because of this number :slight_smile: Python Tryit Editor v1.0

2 Likes

Hi,

I’m still quite confused about this part of the assignment. I would appreciate it if someone could talk me through it. I’m not sure why parameters["W" + str(L)] was used to look up the value of W, etc.

Thank you.

1 Like

parameters is a dictionary whose keys are named "W1", "W2", … Hence, parameters["W" + str(L)] is used.
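
For illustration, here is a minimal sketch (the layer sizes below are made up, not the assignment’s) of how those string keys are built and looked up:

import numpy as np

# Hypothetical parameters dictionary: one (W, b) pair per layer, keyed by "W1", "b1", "W2", "b2", ...
parameters = {
    "W1": np.random.randn(4, 3),
    "b1": np.zeros((4, 1)),
    "W2": np.random.randn(1, 4),
    "b2": np.zeros((1, 1)),
}

L = len(parameters) // 2       # two entries per layer, so L = 2 here
l = 1
key = "W" + str(l)             # str(l) turns the integer layer index into the key suffix
print(key)                     # "W1"
print(parameters[key].shape)   # (4, 3)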

1 Like

Hi,

Thank you for replying. I was wondering why ’ + str(L)’ was used.

1 Like

I am really confused about why we use a for-loop that goes from 1 to L.

We are building a model that computes LINEAR->RELU for layers 1 to L-1 and then computes LINEAR->SIGMOID outside the loop.

# L-1 iterations of relu
for l in range(1, L-1):
    A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], "relu")
    caches.append(cache)

# last layer
AL, cache = linear_activation_forward(A, parameters['W' + str(L-1)], parameters['b' + str(L-1)], "sigmoid")
caches.append(cache)

I tested it, and when I write the for-loop as range(1, L), the result is what the test expects, but I wonder if it is correct.
If we make the for-loop range(1, L), aren’t we applying Linear->relu->Linear->sigmoid at the last layer? In fact, we’d end up with caches of length L+1, wouldn’t we?

2 Likes

I wonder if some of the confusion described in the posts above, about looping over L or L-1 or L+1 layers and whether a sigmoid or ReLU activation should be applied at a given layer, is due to the slightly non-intuitive nature of Python’s range command, which is used to define the index values for the loop:

for l in range(1, L):

Originally I read this to mean that the range command would produce the series of values (1, 2, 3). However, this didn’t match my understanding that ReLU is applied to only the first two layers of the network and not to the third layer, which uses a sigmoid activation function.

I used the link posted by [Maitha_Shehab_Khanji] above to test out the range command in real time, which really helped me to identify the issues I was having.
https://www.w3schools.com/python/trypython.asp?filename=demo_for_range2

Back to our example where we have three layers (L = 3), i.e. two ReLU layers and then a single sigmoid layer. Using the link above to test out the code, we get:

range(3) = 0,1,2
range(1,3) = 1,2

Reviewing the syntax for the range command shows:
https://docs.python.org/3/library/stdtypes.html#range

Syntax: range(start, stop[, step])

The key thing we find out if we read a bit further down the help page is that the range command never produces the stop value itself. The last integer it produces is (stop - 1), i.e. 2. And since a start value has been specified, the range command outputs the values 1, 2. All good. We can now see that the for loop will happily loop over just the first two layers of the network, as we would expect.
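
To make that concrete, here is a small sketch (using the L = 3 example from above) showing which layers the loop visits and which layer is left for the sigmoid step:

L = 3                              # two ReLU layers followed by one sigmoid layer

relu_layers = list(range(1, L))    # range stops before L, so this is [1, 2]
sigmoid_layer = L                  # the final layer, handled outside the loop

print(relu_layers)                 # [1, 2] -> LINEAR->RELU applied to layers 1 and 2
print(sigmoid_layer)               # 3      -> LINEAR->SIGMOID applied to layer 3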

Hopefully this helps someone else.

2 Likes

I was having the same thought, but the for loop from 1 to L actually runs from 1 to L-1, not to L.
Concretely, if you run this code alone:

for l in range(1, 5):
    print(l)

the output is:

1
2
3
4

There is no 5.
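
Applying the same observation to the assignment’s loop structure, a rough count (with L = 3 and dummy strings standing in for the real caches) gives exactly L caches, not L+1:

L = 3
caches = []

for l in range(1, L):                             # l takes the values 1 and 2 only
    caches.append("relu cache, layer " + str(l))

caches.append("sigmoid cache, layer " + str(L))   # the single call outside the loop
print(len(caches))                                # 3 == L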

1 Like

A is also the output variable of the linear activation function, and each iteration of the loop begins by updating A_prev to it. This means that after the last iteration of the loop A_prev is not updated, so you should use the output A directly when calculating AL in the sigmoid section of the code…
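
As a rough sketch of that pattern (not the official solution; the linear_activation_forward below is a hypothetical stand-in for the course’s helper, included only so the snippet runs on its own):

import numpy as np

def linear_activation_forward(A_prev, W, b, activation):
    # Hypothetical stand-in: linear step plus activation; the cache is simply the inputs.
    Z = W @ A_prev + b
    A = np.maximum(0, Z) if activation == "relu" else 1 / (1 + np.exp(-Z))
    return A, (A_prev, W, b)

np.random.seed(0)
parameters = {"W1": np.random.randn(4, 3), "b1": np.zeros((4, 1)),
              "W2": np.random.randn(1, 4), "b2": np.zeros((1, 1))}
X = np.random.randn(3, 5)
L = len(parameters) // 2
caches = []

A = X
for l in range(1, L):
    A_prev = A                                  # each iteration starts by updating A_prev
    A, cache = linear_activation_forward(A_prev, parameters["W" + str(l)],
                                         parameters["b" + str(l)], "relu")
    caches.append(cache)

# After the loop, A (not A_prev) holds the activation of layer L-1,
# so A is what gets fed into the final sigmoid layer.
AL, cache = linear_activation_forward(A, parameters["W" + str(L)],
                                      parameters["b" + str(L)], "sigmoid")
caches.append(cache)
print(AL.shape, len(caches))                    # (1, 5) 2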

1 Like

I’m still stuck on this assignment; I keep getting a ValueError that has something to do with the shapes.

1 Like

Hello Aminu Musa,

Welcome to the community.

For the matrix multiplication, the inner dimensions always need to agree with each other, and (3,4) & (3,4) do not. Please check the shapes you are passing in. Thanks.
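
As a quick illustration (using just the shapes mentioned above), NumPy raises exactly this kind of error when the inner dimensions do not line up:

import numpy as np

W = np.random.randn(3, 4)
A_bad = np.random.randn(3, 4)   # inner dimensions: 4 vs 3 -> mismatch
A_ok = np.random.randn(4, 5)    # inner dimensions: 4 vs 4 -> fine

print(np.dot(W, A_ok).shape)    # (3, 5)

try:
    np.dot(W, A_bad)
except ValueError as e:
    print(e)                    # shapes (3,4) and (3,4) not aligned: 4 (dim 1) != 3 (dim 0)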

1 Like

I got the correct answer to this problem after struggling through it, with the help of this thread. However, I think my understanding is still a bit shaky. Hoping someone can clarify for me.

I understand the relu portion. However, my question is on the sigmoid portion of the code.

The sigmoid is applied to the last layer of the neural network, right? So that would be layer L. Shouldn’t we still be passing in A_prev? Or do we use A since it was the most recent activation created by the loop and it resides in memory? And does ‘parameters’ passed into the function have W and b for l = 1 to L (all layers of the network)? Thanks in advance.

1 Like

Hi @Matt_Samelson ,

The purpose of training a network is to find a set of weights and biases where the cost is at its minimum. You can take a look at the code for the L_model_forward() function to see how the network traverses the different layers, and how the weights and biases are used.

From the menu bar at the top of the notebook, click:
file->open->dnn_app_utils_v3.py

2 Likes