Week 4, Exercise 5 - L_model_forward()

Hi, I am getting the ValueError: shapes are not aligned, even though my code had already passed all the tests for linear_forward. Here’s a screenshot of the error:


Any help will be appreciated :slight_smile:

3 Likes

Hi there, are you sure that the first input to the linear function is A_prev?
As far as I remember, the first one was “W”, the second one was A_prev, and finally b.

1 Like

Yeah… the definition of the linear function is linear_forward(A, W, b)…

1 Like

The problem is solved now. During sigmoid activation, I had to send ‘A’ as an argument, not ‘A_prev’.
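
For reference, the corrected sigmoid step ends up looking roughly like this (a sketch only, using the assignment’s names, where A is the output of the last loop iteration):

AL, cache = linear_activation_forward(A, parameters['W' + str(L)], parameters['b' + str(L)], "sigmoid")
caches.append(cache)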

2 Likes

Which function do you pass it to? I’m confused because for linear_activation_forward (which includes linear_forward and the activation function), don’t we still pass A_prev?

1 Like

Just to save others’ time: try this link and you will understand how the for l in range(1, L) loop works.
It took me more than an hour just because of this number :slight_smile: Python Tryit Editor v1.0 (https://www.w3schools.com/python/trypython.asp?filename=demo_for_range2)

2 Likes

Hi,

I’m still quite confused about this part of the assignment. I would appreciate it if someone could talk me through it. I’m not sure why parameters[“W” + str(L)] was used to look up the value of W, etc.

Thank you.

1 Like

parameters is a dictionary whose keys are “W1”, “W2”, … (and “b1”, “b2”, …). Hence, parameters[“W” + str(L)] is used to look up the weights of layer L.
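
As a small illustration (a made-up 2-layer example, not the assignment code), the keys are just strings built by concatenation:

import numpy as np

# Hypothetical parameters dictionary for a 2-layer network (shapes invented for this example)
parameters = {
    "W1": np.random.randn(4, 3), "b1": np.zeros((4, 1)),
    "W2": np.random.randn(1, 4), "b2": np.zeros((1, 1)),
}

L = len(parameters) // 2                 # 4 entries, one W and one b per layer, so L = 2
print("W" + str(L))                      # "W2" -- concatenation builds the dictionary key
print(parameters["W" + str(L)].shape)    # (1, 4), the weights of the last layer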

1 Like

Hi,

Thank you for replying. I was wondering why ’ + str(L)’ was used.

1 Like

I am really confused about why we use a for-loop that goes from 1 to L.

We are building a model that computes LINEAR->RELU for layers 1 to L-1 and then calculates LINEAR->SIGMOID outside the loop.

# L-1 iterations of relu
for l in range(1, L-1):
    A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], "relu")
    caches.append(cache)

# last layer
AL, cache = linear_activation_forward(A, parameters['W' + str(L-1)], parameters['b' + str(L-1)], "sigmoid")
caches.append(cache)

I tested it, and when I use a for-loop that iterates from 1 to L, the result is what the test expects, but I wonder if it is correct.
If we make a for-loop from 1 to L, aren’t we applying LINEAR->RELU->LINEAR->SIGMOID to the last layer? In fact, wouldn’t we end up with caches of length L+1?

2 Likes

I wonder if some of the confusion described in the posts above, about looping over L or L-1 or L+1 layers and whether a sigmoid or ReLU activation should be applied at a given layer, is due to the slightly non-intuitive nature of the Python range command, which is used to define the index values for the loop:

for l in range(1, L):

Originally I read this to mean that the range command would produce the series of values (1, 2, 3). However, this didn’t match my understanding that ReLU is applied to only the first two layers of the network and not to the third layer, which uses a sigmoid activation function.

I used the link posted by @Maitha_Shehab_Khanji above to test out the range command in real time, which really helped me to identify the issues I was having.
https://www.w3schools.com/python/trypython.asp?filename=demo_for_range2

Back to our example where we have three layers (L = 3), i.e. two ReLU layers and then a single sigmoid layer. Using the link above to test out the code, we get:

range(3) = 0,1,2
range(1,3) = 1,2

Reviewing the syntax for the range command shows:
https://docs.python.org/3/library/stdtypes.html#range

Syntax: range(start, stop[, step])

The key thing we find out, if we read a bit further down the help page for the range command, is that range never produces the stop value. The last integer it produces has the value (stop - 1), i.e. 2. And since a start value has been specified, the range command outputs the values 1, 2. All good. We can now see that the for loop will happily loop over just the first two layers of the network, as we would expect.
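
To make that concrete, here is a tiny illustration for L = 3 (just a sketch, not assignment code) of which layer indices the loop visits and which layer is left for the sigmoid step:

L = 3                      # e.g. two ReLU layers plus one sigmoid output layer
for l in range(1, L):      # visits l = 1 and l = 2 only
    print("layer", l, "-> ReLU")
print("layer", L, "-> sigmoid (handled outside the loop)")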

Hopefully this helps someone else.

2 Likes

I was having the same thought, but the for loop from 1 to L runs from 1 to L-1, not to L.
Concretely, if you run this code on its own:

for l in range(1, 5):
    print(l)

the output is:

1
2
3
4

There is no 5.

1 Like

A is also the output variable from linear_activation_forward, and each iteration of the loop begins by updating A_prev to it. This means that after the last iteration of the loop, A_prev is not updated again, so you should use the output A directly when calculating AL in the sigmoid section of the code…
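
In other words, the body of the loop looks roughly like this (a sketch using the assignment’s variable names, not the full graded code):

for l in range(1, L):
    A_prev = A                       # the previous layer's output becomes this layer's input
    A, cache = linear_activation_forward(A_prev,
                                         parameters['W' + str(l)],
                                         parameters['b' + str(l)],
                                         activation="relu")
    caches.append(cache)
# After the loop, A holds the activations of layer L-1 and is fed directly into the sigmoid step.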

1 Like

I’m still stuck on this assignment. I keep getting a ValueError, something to do with the shapes.

1 Like

Hello Aminu Musa,

Welcome to the community.

For a matrix product, the inner dimensions always need to agree: a (3,4) matrix cannot be multiplied by another (3,4) matrix, because 4 ≠ 3. Please check which arrays you are passing so that the shapes line up. Thanks.
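
To see what the error message is complaining about, here is a toy example (shapes made up purely for illustration):

import numpy as np

W = np.random.randn(3, 4)        # a layer with 4 inputs and 3 units
A_prev = np.random.randn(4, 5)   # 4 features from the previous layer, 5 examples
b = np.zeros((3, 1))

Z = np.dot(W, A_prev) + b        # inner dimensions (4 and 4) agree, so Z has shape (3, 5)
print(Z.shape)

# np.dot(W, np.random.randn(3, 4)) would instead raise
# "ValueError: shapes (3,4) and (3,4) not aligned: 4 (dim 1) != 3 (dim 0)"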

1 Like

I got the correct answer to this problem after struggling through and with the help of this thread. However, I think my understanding is still a bit shaky. Hoping someone can clarify for me.

I understand the relu portion. However, my question is on the sigmoid portion of the code.

The sigmoid is applied to the last layer of the neural network, right? So that would be layer L. Shouldn’t we still be passing in A_prev? Or do we use A, since it was the most recent activation created by the loop and it resides in memory? And does the parameters argument passed into the function contain W and b for layers 1 to L (all layers of the network)? Thanks in advance.

1 Like

Hi @Matt_Samelson ,

The purpose of training a network is to find a set of weights and biases where the cost is at its minimum. You can take a look at the code of the L_model_forward() function to see how the network traverses the different layers, and how the weights and biases are used.

From menu bar at the top of the notebook, click:
file->open->dnn_app_utils_v3.py

2 Likes

Assignment 4 Exercise 5 Walkthrough

Overview

When the function is called:

Caches: Stores a list of caches from linear_activation_forward().

A: Holds the input data X.

L: Represents the number of layers in the neural network (not counting the input layer). The length of parameters is divided by two because it contains one weight matrix and one bias vector per layer.

For Loop Explanation

The for loop runs from 1 to L-1 (that is, for l in range(1, L)).

The output layer is not included because it uses a different activation function (sigmoid) compared to the ReLU activation function used in other layers.

First Iteration

A_prev is assigned the value of A, which holds the input data X.

linear_activation_forward is called with A_prev and other parameters to compute the linear and ReLU activation for the first layer.

The returned A and cache are stored in their respective variables.

The cache is appended to the list of caches.

Subsequent Iterations

A from the previous layer is assigned to A_prev.

linear_activation_forward is called with A_prev and other parameters to compute the layer’s activations.

The returned A and cache are stored in their respective variables.

These steps repeat until the last-but-one layer is executed.

Output Layer

The for loop does not handle the output layer since it requires a sigmoid activation function.

For the output layer, the activations from the last hidden layer (stored in A) are used directly.

linear_activation_forward is called with A instead of A_prev.
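
Putting the walkthrough together, the skeleton of the function ends up looking roughly like this (a sketch that assumes the assignment’s linear_activation_forward helper; double-check it against your own notebook rather than treating it as the official solution):

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2              # one W and one b per layer

    # [LINEAR -> RELU] for layers 1 .. L-1
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev,
                                             parameters['W' + str(l)],
                                             parameters['b' + str(l)],
                                             activation="relu")
        caches.append(cache)

    # LINEAR -> SIGMOID for the output layer L, using A (not A_prev)
    AL, cache = linear_activation_forward(A,
                                          parameters['W' + str(L)],
                                          parameters['b' + str(L)],
                                          activation="sigmoid")
    caches.append(cache)

    return AL, caches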

1 Like

We have to define it before we call it later, but I ran into an issue with this assignment as well.

For anyone who encounters the same problem: make sure that you write the LINEAR -> SIGMOID step outside the for loop and apply it only to the last layer, with A, not A_prev.

1 Like