Hi, I am getting the ValueError: shapes are not aligned, even though it had already passed all the tests of linear_forward. Here's a screenshot of the error:
Any help will be appreciated

Hi there, are you sure that the first input to the linear function is A_prev?
As far as I remember, the first one was "W", the second one was A_prev, and finally b.
Yeah… the definition of the linear function is linear_forward(A, W, b)…
The problem is solved now. During sigmoid activation, I had to send ‘A’ as an argument, not ‘A_prev’.
Which function do you pass it to? I'm confused because for linear_activation_forward (which includes linear_forward and the activation function), don't we still pass A_prev?
Just to save others' time, try this link (the Python Tryit Editor) and you will understand how the for l in range(1, L) loop works:
https://www.w3schools.com/python/trypython.asp?filename=demo_for_range2
It took me more than an hour just because of this.
Hi,
I'm still quite confused by this part of the assignment. I would appreciate it if someone could talk me through it. I'm not sure why parameters["W" + str(L)] was used to look up the value of W, etc.
Thank you.
parameters is a dictionary whose keys are "W1", "W2", …, "b1", "b2", … Hence, parameters["W" + str(L)] is used.
Hi,
Thank you for replying. I was wondering why ’ + str(L)’ was used.
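To make this concrete: str(L) just converts the layer number into text so it can be glued onto "W" to form the dictionary key. Here is a small standalone example (the layer shapes are made up, not the assignment's):

import numpy as np

# A toy parameters dictionary with the same naming scheme as the assignment
# (the shapes here are invented, purely for illustration).
parameters = {
    "W1": np.random.randn(4, 3), "b1": np.zeros((4, 1)),
    "W2": np.random.randn(1, 4), "b2": np.zeros((1, 1)),
}

L = len(parameters) // 2      # 4 entries / 2 = 2 layers (one W and one b per layer)

for l in range(1, L + 1):
    key = "W" + str(l)        # str(l) turns the integer 1, 2, ... into "1", "2", ...
    print(key, parameters[key].shape)   # W1 (4, 3)  /  W2 (1, 4)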
I am really confused about why we use a for loop that goes from 1 to L.
We are building a model that computes Linear->ReLU for layers 1 to L-1 and then computes Linear->Sigmoid outside the loop.
# L-1 iterations of relu
for l in range(1, L-1):
    A, cache = linear_activation_forward(A_prev, parameters['W' + str(l)], parameters['b' + str(l)], "relu")
    caches.append(cache)
# last
AL, cache = linear_activation_forward(A, parameters['W' + str(L-1)], parameters['b' + str(L-1)], "sigmoid")
caches.append(cache)
I tested it, and when I use a for loop that iterates from 1 to L, the result is what the test expects, but I wonder if it is correct.
If we make the for loop go from 1 to L, aren't we applying Linear->relu->Linear->sigmoid to the last layer? In fact, wouldn't we end up with caches of length L+1?
I wonder if some of the confusion described in the posts above about looping over L, L-1, or L+1 layers, and whether a sigmoid or ReLU activation should be applied at a given layer, is due to the slightly non-intuitive nature of the Python range command, which is used to define the index values for the loop:
for l in range(1, L):
Originally I read this to mean that the range command would produce the series of values (1, 2, 3). However, this didn't match my understanding of applying ReLU to only the first two layers of the network and not to the third layer, which uses a sigmoid activation function.
I used the link posted by [Maitha_Shehab_Khanji] above to test out the range command in real time, which really helped me to identify the issues I was having:
https://www.w3schools.com/python/trypython.asp?filename=demo_for_range2
Back to our example where we have three layers (L = 3), i.e. two ReLU layers and then a single sigmoid layer. Using the link above to test out the code we get:
range(3) = 0,1,2
range(1,3) = 1,2
Reviewing the syntax for the range command shows:
https://docs.python.org/3/library/stdtypes.html#range
Syntax: range(start, stop[, step])
The key thing we find out if we read a bit further down the help page is that the range command never produces the stop value. The last integer it produces has the value stop - 1, i.e. 2. And since a start value has been specified, the range command outputs the values 1, 2. All good. We can now see that the for loop code will happily loop over just the first two layers in the network, as we would expect.
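If you want the same check without the online editor, you can paste this snippet into any Python interpreter:

# Quick sanity check of what range produces when L = 3:
L = 3
print(list(range(L)))       # [0, 1, 2]
print(list(range(1, L)))    # [1, 2]  (the stop value L is never produced)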
Hopefully this helps someone else.
I was having the same thought,
but the for loop from 1 to L runs from 1 to L-1, not to L.
Concretely, if you run this code alone:
for l in range(1, 5):
    print(l)
the output is:
1
2
3
4
there is no 5
A is also the output variable from the linear activation method, and each iteration of the loop begins by copying it into A_prev. This means that after the last iteration of the loop A_prev is not updated again, so you should use the output A directly when calculating AL in the sigmoid section of the code…
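To see that hand-off concretely, here is a tiny self-contained sketch; fake_layer is just a made-up stand-in for linear_activation_forward so the snippet runs on its own:

import numpy as np

def fake_layer(A_prev):
    # Stand-in for linear_activation_forward: returns something derived from its input, plus a cache.
    return A_prev + 1, ("cache for", A_prev)

A = np.zeros((2, 1))          # pretend this is X
caches = []
L = 3

for l in range(1, L):         # hidden layers 1 .. L-1
    A_prev = A                # A_prev picks up the output of the previous layer
    A, cache = fake_layer(A_prev)
    caches.append(cache)

# After the loop the freshest activations live in A (A_prev is one layer behind),
# so the output layer uses A, not A_prev.
AL, cache = fake_layer(A)
caches.append(cache)
print(len(caches))            # 3 == L, one cache per layer

The loop appends L-1 caches and the sigmoid step appends one more, so you end up with exactly L caches, not L+1.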
I'm still stuck on this assignment. I keep getting a ValueError that has something to do with the shapes.
Hello Aminu Musa,
Welcome to the community.
For the matrix multiplication to work, the inner dimensions always need to agree, and for shapes (3,4) and (3,4) they don't (4 vs. 3). Please correct the shapes you are passing. Thanks.
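If it helps, this is the kind of mismatch numpy is complaining about (the shapes below are just examples), and what a compatible shape looks like:

import numpy as np

W = np.random.randn(3, 4)      # (3, 4)
A = np.random.randn(3, 4)      # (3, 4)

# np.dot(W, A) raises a "shapes ... not aligned" ValueError because the
# inner dimensions (4 and 3) differ.
try:
    np.dot(W, A)
except ValueError as e:
    print(e)

# With A of shape (4, 5) the inner dimensions match and the product has shape (3, 5).
A_ok = np.random.randn(4, 5)
print(np.dot(W, A_ok).shape)   # (3, 5)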
I got the correct answer to this problem after struggling through and with the help of this thread. However, I think my understanding is still a bit shaky. Hoping someone can clarify for me.
I understand the relu portion. However, my question is on the sigmoid portion of the code.
The sigmoid is applied to the last layer of the neural network, right? So that would be layer L. Shouldn't we still be passing in A_prev? Or do we use A, since it was the most recent activation created by the loop and it resides in memory? And does the 'parameters' dictionary passed into the function have W and b for layers 1 to L (all layers of the network)? Thanks in advance.
Hi @Matt_Samelson ,
The purpose of training a network is to find a set of weights and biases where the cost is at its minimum. You can take a look at the code for the L_model_forward() function to see how the network traverses the different layers, and how the weights and biases are used.
From menu bar at the top of the notebook, click:
file->open->dnn_app_utils_v3.py
Assignment 4 Exercise 5 Walkthrough
Overview
When the function is called:
Caches: Stores a list of caches from linear_activation_forward().
A: Holds the input data X.
L: Represents the number of layers in the neural network. The length of parameters is divided by two because it contains weights and biases for each layer.
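In code, that setup portion looks roughly like this (the input and parameter shapes below are made up, just for illustration):

import numpy as np

X = np.random.randn(5, 4)             # example input: 5 features, 4 examples (invented shape)
parameters = {
    "W1": np.random.randn(4, 5), "b1": np.zeros((4, 1)),
    "W2": np.random.randn(3, 4), "b2": np.zeros((3, 1)),
    "W3": np.random.randn(1, 3), "b3": np.zeros((1, 1)),
}

caches = []                           # will collect one cache per layer
A = X                                 # the "activation" feeding layer 1 is just the input data
L = len(parameters) // 2              # 6 entries / 2 = 3 layers (one W and one b per layer)
print(L)                              # 3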
For Loop Explanation
The for loop runs from 1 to L-1.
The output layer is not included because it uses a different activation function (sigmoid) compared to the ReLU activation function used in other layers.
First Iteration
A_prev is assigned the value of A, which holds the input data X.
linear_activation_forward is called with A_prev and other parameters to compute the linear and ReLU activation for the first layer.
The returned A and cache are stored in their respective variables.
The cache is appended to the list of caches.
Subsequent Iterations
A from the previous layer is assigned to A_prev.
linear_activation_forward is called with A_prev and other parameters to compute the layer’s activations.
The returned A and cache are stored in their respective variables.
These steps repeat until the last-but-one layer is executed.
Output Layer
The for loop does not handle the output layer since it requires a sigmoid activation function.
For the output layer, the activations from the last hidden layer (stored in A) are used directly.
linear_activation_forward is called with A instead of A_prev.
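Putting the steps above together, here is a sketch of the overall structure. The linear_activation_forward below is a simplified stand-in (the real notebook version builds its caches differently), and the shapes in the smoke test are made up; it is only meant to illustrate the flow, not to be copied into the assignment:

import numpy as np

def relu(Z):
    # Element-wise ReLU.
    return np.maximum(0, Z)

def sigmoid(Z):
    # Element-wise sigmoid.
    return 1 / (1 + np.exp(-Z))

def linear_activation_forward(A_prev, W, b, activation):
    # Simplified stand-in: Z = W A_prev + b, then the chosen activation.
    Z = np.dot(W, A_prev) + b
    A = relu(Z) if activation == "relu" else sigmoid(Z)
    cache = (A_prev, W, b, Z)          # everything backprop would need later
    return A, cache

def L_model_forward(X, parameters):
    caches = []
    A = X
    L = len(parameters) // 2           # number of layers

    # Layers 1 .. L-1: LINEAR -> RELU, inside the loop
    for l in range(1, L):
        A_prev = A
        A, cache = linear_activation_forward(A_prev,
                                             parameters["W" + str(l)],
                                             parameters["b" + str(l)],
                                             "relu")
        caches.append(cache)

    # Layer L: LINEAR -> SIGMOID, outside the loop, fed with A (not A_prev)
    AL, cache = linear_activation_forward(A,
                                          parameters["W" + str(L)],
                                          parameters["b" + str(L)],
                                          "sigmoid")
    caches.append(cache)
    return AL, caches

# Tiny smoke test with made-up shapes:
np.random.seed(1)
X = np.random.randn(5, 4)
parameters = {
    "W1": np.random.randn(4, 5), "b1": np.zeros((4, 1)),
    "W2": np.random.randn(3, 4), "b2": np.zeros((3, 1)),
    "W3": np.random.randn(1, 3), "b3": np.zeros((1, 1)),
}
AL, caches = L_model_forward(X, parameters)
print(AL.shape, len(caches))   # (1, 4) 3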
We have to define it before we call it later, but I had my own issue with this assignment as well.
For anyone who encounters the same problem:
make sure that you write the LINEAR->SIGMOID step outside the for loop and apply it just to the last layer, with A, not A_prev.