Exercise 7 - linear_backward

Hi,
I am getting this error in Exercise 7:

ValueError Traceback (most recent call last)
in
1 t_dZ, t_linear_cache = linear_backward_test_case()
----> 2 t_dA_prev, t_dW, t_db = linear_backward(t_dZ, t_linear_cache)
3
4 print("dA_prev: " + str(t_dA_prev))
5 print("dW: " + str(t_dW))

in linear_backward(dZ, cache)
22 # dA_prev = …
23 # YOUR CODE STARTS HERE
----> 24 dW = 1/m * np.multiply(A_prev,dZ)
25 db = 1/m * np.sum(dZ, axis=0, keepdims=True)
26 dA_prev = np.multiply(dZ,W.T)

ValueError: operands could not be broadcast together with shapes (5,4) (3,4)

Hi @Boubacar_Diallo ,

“Trust the error”

ValueError: operands could not be broadcast together with shapes (5,4) (3,4)

When you multiply matrices, you have to make sure that the inner dimensions are the same:

(m,n) x (n,p)

Notice how both shapes share the 'n' dimension, once on the columns of the first matrix and once on the rows of the second.

The error is telling you that (5,4) x (3,4) will not work.

You need to do something about it. What operation would ‘switch’ the dimensions on one of the matrices?
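To make the shapes concrete, here is a minimal NumPy sketch (the shapes are taken from the error message; the array contents are arbitrary):

```python
import numpy as np

A_prev = np.random.randn(5, 4)  # shape (5, 4), as in the error
dZ = np.random.randn(3, 4)      # shape (3, 4), as in the error

print(A_prev.shape)    # (5, 4)
print(A_prev.T.shape)  # (4, 5) -- the transpose 'switches' the dimensions

# (5, 4) x (3, 4) cannot work: the inner dimensions 4 and 3 differ.
# After a transpose, an (n, p) shape can be lined up against (m, n).
```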

And there’s another possible issue related to the order of the operands.

Try that and let me know how it goes.

Juan

Now I used the transpose on A_prev and switched the order of the operands. I got almost the same error

23 # YOUR CODE STARTS HERE
----> 24 dW = 1/m * np.multiply(A_prev.T, dZ)
25 db = 1/m * np.sum(dZ, axis=0, keepdims=True)
26 dA_prev = np.multiply(dZ,W.T)

ValueError: operands could not be broadcast together with shapes (4,5) (3,4)

The transpose is applied properly. Now try switching the order of the matrices; I still see A_prev first.

And on second review, there may also be an issue with the function you are using. We want a dot product, not an element-wise product, so that is another thing to fix.

Which numpy function will do a dot product?
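For readers following along, here is a small sketch of the difference between the two NumPy functions (toy shapes, just for illustration):

```python
import numpy as np

x = np.array([[1., 2.],
              [3., 4.]])
y = np.array([[10., 20.],
              [30., 40.]])

# Element-wise product: shapes must be broadcastable.
print(np.multiply(x, y))  # [[ 10.  40.]
                          #  [ 90. 160.]]

# Matrix (dot) product: inner dimensions must match.
print(np.dot(x, y))       # [[ 70. 100.]
                          #  [150. 220.]]
```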

Now I used np.dot.

It works well.

Thank you for your guidance :grin:
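For completeness, the corrected lines would look something like this; this is a sketch consistent with the shapes in the thread, not necessarily the assignment's reference solution (in particular, the axis used for db assumes the convention that examples are stacked as columns):

```python
import numpy as np

# Shapes from the thread: dZ is (3, 4), A_prev is (5, 4), W is (3, 5).
dZ = np.random.randn(3, 4)
A_prev = np.random.randn(5, 4)
W = np.random.randn(3, 5)
m = A_prev.shape[1]  # number of examples (columns)

dW = 1 / m * np.dot(dZ, A_prev.T)               # (3, 4) x (4, 5) -> (3, 5)
db = 1 / m * np.sum(dZ, axis=1, keepdims=True)  # (3, 1)
dA_prev = np.dot(W.T, dZ)                       # (5, 3) x (3, 4) -> (5, 4)
```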


Great! I am glad it works now. But most important is that you understand what was going on, and how reading the error guided the whole resolution.

Now it is time to move on with your course.

To your success,

Juan

Here’s a thread from a while back that is useful for explaining Prof Ng’s notation for element-wise versus dot-product multiplication.

In your example, it’s also important to review what the mathematical formula is telling you. Here’s what it gave in the instructions:

dW^{[l]} = \frac{\partial \mathcal{J} }{\partial W^{[l]}} = \frac{1}{m} dZ^{[l]} A^{[l-1] T}

So based on the thread that I linked, we know that the operation between dZ and A^T is the dot product. And of course the dot product is not commutative, so the order is crucial as well. There are some mathematical identities like the following:

(A \cdot B)^T = B^T \cdot A^T

but you can’t just change the order of the operands arbitrarily.
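A quick numeric check of that identity, and of the fact that the order matters (a minimal sketch with arbitrary shapes):

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 5))
B = rng.standard_normal((5, 4))

# (A . B)^T == B^T . A^T holds:
print(np.allclose(np.dot(A, B).T, np.dot(B.T, A.T)))  # True

# But simply swapping the operands does not work:
# np.dot(B, A) would be (5, 4) x (3, 5), which raises a ValueError.
```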