I added some print statements to the linear_backward
code to show the shapes of the inputs and here’s what I see:
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dA_prev: [[-1.15171336 0.06718465 -0.3204696 2.09812712]
[ 0.60345879 -3.72508701 5.81700741 -3.84326836]
[-0.4319552 -1.30987417 1.72354705 0.05070578]
[-0.38981415 0.60811244 -1.25938424 1.47191593]
[-2.52214926 2.67882552 -0.67947465 1.48119548]]
dW: [[ 0.07313866 -0.0976715 -0.87585828 0.73763362 0.00785716]
[ 0.85508818 0.37530413 -0.59912655 0.71278189 -0.58931808]
[ 0.97913304 -0.24376494 -0.08839671 0.55151192 -0.10290907]]
db: [[-0.14713786]
[-0.11313155]
[-0.13209101]]
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
All tests passed.
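(For reference, those prints are just shape dumps at the top of linear_backward; here is a minimal sketch, assuming the standard gradient formulas from the notebook and that numpy is imported as np:)

```python
import numpy as np

def linear_backward(dZ, cache):
    """Backward step for one linear layer, with debug prints added."""
    A_prev, W, b = cache
    m = A_prev.shape[1]

    # Debug prints that produce the shape output shown above
    print("dZ.shape", dZ.shape)
    print("A_prev.shape", A_prev.shape)
    print("W.shape", W.shape)

    # Standard gradient formulas: note the matrix multiplies (np.dot), not elementwise *
    dW = (1 / m) * np.dot(dZ, A_prev.T)
    db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)
    dA_prev = np.dot(W.T, dZ)

    return dA_prev, dW, db
```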
So it looks like you did the transpose on the A_prev value, but the clue to the mistake is that no broadcasting is involved in np.dot. My bet is that you used * or np.multiply as the operation there. The key point is that you need to understand the notational convention that Prof Ng uses in his mathematical expressions: if he means elementwise multiply, he will always explicitly write * as the operator. But if he writes the operands adjacent to each other with no explicit operator, then he means "real" matrix multiply (dot product style). You can see from the formulas that Saif shows that we need np.dot when calculating dW.
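To see the difference concretely with the shapes printed above, here is a quick sketch with random data, purely for illustration:

```python
import numpy as np

# Shapes from the output above: dZ is (3, 4), A_prev is (5, 4), W is (3, 5)
m = 4
dZ = np.random.randn(3, 4)
A_prev = np.random.randn(5, 4)

# Matrix multiply: (3, 4) dot (4, 5) -> (3, 5), which matches W's shape
dW = (1 / m) * np.dot(dZ, A_prev.T)
print(dW.shape)   # (3, 5)

# Elementwise multiply: (3, 4) * (4, 5) triggers broadcasting, which fails here
try:
    dZ * A_prev.T
except ValueError as err:
    print(err)    # operands could not be broadcast together with shapes (3,4) (4,5)
```

The broadcasting error message is the giveaway: np.dot never broadcasts, so if the error mentions broadcasting, the operation on that line was elementwise.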
The point about the notational conventions is discussed in more detail on this thread.