Building_your_Deep_Neural_Network_Step_by_Step error debugging

I added some print statements to the linear_backward code to show the shapes of the inputs and here’s what I see:

dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dA_prev: [[-1.15171336  0.06718465 -0.3204696   2.09812712]
 [ 0.60345879 -3.72508701  5.81700741 -3.84326836]
 [-0.4319552  -1.30987417  1.72354705  0.05070578]
 [-0.38981415  0.60811244 -1.25938424  1.47191593]
 [-2.52214926  2.67882552 -0.67947465  1.48119548]]
dW: [[ 0.07313866 -0.0976715  -0.87585828  0.73763362  0.00785716]
 [ 0.85508818  0.37530413 -0.59912655  0.71278189 -0.58931808]
 [ 0.97913304 -0.24376494 -0.08839671  0.55151192 -0.10290907]]
db: [[-0.14713786]
 [-0.11313155]
 [-0.13209101]]
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
dZ.shape (3, 4)
A_prev.shape (5, 4)
W.shape (3, 5)
 All tests passed.
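
For reference, here is a minimal sketch that just checks the dimension bookkeeping with dummy arrays of the shapes printed above (this is not the assignment code, only a shape check using the standard course formulas):

```python
import numpy as np

# Dummy arrays with the shapes shown in the debug output above
dZ = np.random.randn(3, 4)
A_prev = np.random.randn(5, 4)
W = np.random.randn(3, 5)
m = A_prev.shape[1]

dW = (1 / m) * np.dot(dZ, A_prev.T)               # (3, 4) @ (4, 5) -> (3, 5)
db = (1 / m) * np.sum(dZ, axis=1, keepdims=True)  # -> (3, 1)
dA_prev = np.dot(W.T, dZ)                          # (5, 3) @ (3, 4) -> (5, 4)

print(dW.shape, db.shape, dA_prev.shape)           # (3, 5) (3, 1) (5, 4)
```

Those output shapes match the dA_prev, dW and db arrays printed in the test run.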

So it looks like you did the transpose on the A_prev value, but the clue to the mistake is that there is no broadcasting involved in np.dot. My bet is that you used * or np.multiply as the operation there. The key point here is that you need to understand the notational convention that Prof Ng uses in mathematical expressions: if he means elementwise multiply, he will always explicitly write * as the operator. But if he writes the operands adjacent to each other with no explicit operator, then he means “real” matrix multiplication (dot product style). You can see from the formulas that Saif shows that we need np.dot in the case of calculating dW.
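
You can see the difference directly with arrays of the shapes printed above: np.dot only requires the inner dimensions to agree, while * tries to broadcast the two shapes elementwise and fails (this snippet just illustrates the error you would hit, it is not the assignment solution):

```python
import numpy as np

dZ = np.random.randn(3, 4)
A_prev = np.random.randn(5, 4)

# Matrix multiply: inner dimensions (4 and 4) agree, result has shape (3, 5)
dW_ok = np.dot(dZ, A_prev.T)

# Elementwise multiply: NumPy tries to broadcast (3, 4) against (4, 5) and fails
try:
    dW_bad = dZ * A_prev.T
except ValueError as e:
    print(e)  # operands could not be broadcast together with shapes (3,4) (4,5)
```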

The point about the notational conventions is discussed in more detail on this thread.
