Hello professionals,
I may need some extra help with the DLS C1 W4 programming assignment. It’s about creating an L-layer model for a deep neural network, and I can’t figure out the exact cause of the error.
Did I miss a transpose somewhere? Also, would you be kind enough to teach me how to debug an error like this in general?
Yes, as you say, the issue is probably with transposing. In your case the line is Z = np.dot(W, A) + b, and the error tells you that the shapes for the multiplication are not right.
Generally speaking, two matrices can be multiplied if their shapes are m x n and n x k, i.e. the number of columns in the first matrix equals the number of rows in the second matrix.
Normally the error message will give you some insight into the problem, but you might have to dig deeper to find the cause. That is the general approach!
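For example (hypothetical shapes, not the assignment’s actual test case), you can check that rule directly in numpy:

import numpy as np

W = np.random.randn(3, 5)   # (units in this layer, units in the previous layer)
A = np.random.randn(5, 4)   # (units in the previous layer, number of examples)
b = np.random.randn(3, 1)   # one bias per unit, broadcast across the examples

Z = np.dot(W, A) + b        # inner dimensions match: (3,5) dot (5,4) -> (3,4)
print(Z.shape)              # (3, 4)

# Swapping the operands breaks the rule and produces the familiar error:
# np.dot(A, W)  -> ValueError: shapes (5,4) and (3,5) not aligned: 4 (dim 1) != 3 (dim 0)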
My guess is that the problem has nothing to do with transposes. I think the right question to ask is “what is the value of l (lower case ell) after you fall out of the for loop over the hidden layers?” It’s not what you are assuming.
Try this and watch what happens:
for ii in range(1, 4):
    print(f"ii = {ii}")
print(f"After loop ii = {ii}")
The best way to approach debugging a situation like this is to look at the “dimensional analysis”. Here’s a thread that walks you through what should be happening in this exercise.
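As a quick illustration of what that dimensional analysis looks like (hypothetical layer sizes, purely to show how the shapes chain together):

import numpy as np

layer_dims = [5, 4, 3, 1]   # hypothetical: 5 inputs, two hidden layers, 1 output unit
m = 7                       # hypothetical number of examples

A = np.random.randn(layer_dims[0], m)
for l in range(1, len(layer_dims)):
    W = np.random.randn(layer_dims[l], layer_dims[l - 1])
    b = np.zeros((layer_dims[l], 1))
    Z = np.dot(W, A) + b
    print(f"layer {l}: W {W.shape} dot A_prev {A.shape} -> Z {Z.shape}")
    A = Z                   # the activation function doesn't change the shapes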
Thank you!
Yes, I found out that I was dealing with the Lth layer, which isn’t handled inside the loop over layers 1 to L-1. Reading the thread was indeed very helpful. Much appreciated!
Hello,
I am trying to implement the linear_backward function in this assignment, but I’m running into a broadcasting error when calculating dW. Any help would be appreciated. Thank you!
I added some print statements to the linear_backward code to show the shapes of the inputs, and here’s what I see:
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dA_prev: [[-1.15171336 0.06718465 -0.3204696 2.09812712]
[ 0.60345879 -3.72508701 5.81700741 -3.84326836]
[-0.4319552 -1.30987417 1.72354705 0.05070578]
[-0.38981415 0.60811244 -1.25938424 1.47191593]
[-2.52214926 2.67882552 -0.67947465 1.48119548]]
dW: [[ 0.07313866 -0.0976715 -0.87585828 0.73763362 0.00785716]
[ 0.85508818 0.37530413 -0.59912655 0.71278189 -0.58931808]
[ 0.97913304 -0.24376494 -0.08839671 0.55151192 -0.10290907]]
db: [[-0.14713786]
[-0.11313155]
[-0.13209101]]
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
All tests passed.
So it looks like you did the transpose on the A_prev value, but the clue to the mistake is that there is no broadcasting involved in np.dot. My bet is that you used * or np.multiply as the operation there. The key point is that you need to understand the notational convention that Prof Ng uses in mathematical expressions: if he means elementwise multiply, he will always explicitly use * as the operator. But if he writes the operands adjacent to each other with no explicit operator, then he means “real” matrix multiply (dot product style). You can see from the formulas that Saif shows that we need np.dot in the case of calculating dW.
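For reference, here is a minimal sketch of linear_backward using those formulas (the cache layout and variable names are assumptions here, not necessarily the assignment’s exact code), with the shapes from the printout above noted in the comments:

import numpy as np

def linear_backward(dZ, cache):
    # cache is assumed to hold (A_prev, W, b) from the forward pass
    A_prev, W, b = cache
    m = A_prev.shape[1]

    # Real matrix multiplies (np.dot), not elementwise * / np.multiply:
    dW = (1 / m) * np.dot(dZ, A_prev.T)               # (3,4) dot (4,5) -> (3,5)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)  # (3,1)
    dA_prev = np.dot(W.T, dZ)                         # (5,3) dot (3,4) -> (5,4)

    return dA_prev, dW, db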
The point about the notational conventions is discussed in more detail on this thread.