Course 1 Week 3 A1 Exercise 6: dW2 operands could not be broadcast

I’m getting this error when trying to calculate the value of dW2.

ValueError                                Traceback (most recent call last)
<ipython-input-84-a06d396e2b09> in <module>
      1 parameters, cache, t_X, t_Y = backward_propagation_test_case()
----> 3 grads = backward_propagation(parameters, cache, t_X, t_Y)
      4 print ("dW1 = "+ str(grads["dW1"]))
      5 print ("db1 = "+ str(grads["db1"]))

<ipython-input-83-32e5e48c7f51> in backward_propagation(parameters, cache, X, Y)
     48     dZ2 = A2 - Y
---> 49     dW2 = (1/m) * dZ2 * A1.T
     50     db2 = (1/m) * np.sum(dZ2, axis=1, keepdims=True)
     51     dZ1 = W2.T * (dZ2) * (1 - np.power(A1, 2))

ValueError: operands could not be broadcast together with shapes (1,3) (3,4)

The formula for it is:

dW^{[2]} = \frac{1}{m} dZ^{[2]} {A^{[1]}}^T
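Here is the formula dW^{[2]} = \frac{1}{m} dZ^{[2]} {A^{[1]}}^T annotated with the shapes from this test case (m = 3 examples, 4 hidden units, 1 output unit), together with its entry-by-entry form. The index names k, j, i are chosen here only for illustration.

```latex
\underbrace{dW^{[2]}}_{(1,\,4)} = \frac{1}{m}\,\underbrace{dZ^{[2]}}_{(1,\,3)}\,\underbrace{{A^{[1]}}^T}_{(3,\,4)},
\qquad
dW^{[2]}_{kj} = \frac{1}{m}\sum_{i=1}^{m} dZ^{[2]}_{ki}\,A^{[1]}_{ji}
```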

Note: everything below assumes I’m also multiplying by 1/m, e.g. (1/m) * dZ2 * A1.

Writing the calculation as dZ2 * A1.T gives me the error above. Writing it as dZ2 * A1 and transposing the result runs, but the result has the wrong shape. It comes out as

dW2 = [[ 0.00102716  0.00870451  0.00334799 -0.00358614]
 [ 0.00034367  0.00454148  0.00061431 -0.00230807]
 [-0.00058242  0.0044083  -0.00480396 -0.00433106]]

whereas it should be

dW2 = [[ 0.00078841  0.01765429 -0.00084166 -0.01022527]]

This is wrong, but I don’t know how to change the code without deviating drastically from the formula, which is dW[2] = 1/m * dZ[2] * A[1].T.

I don’t think the problem is my calculation of dZ2, as that is simply A2 - Y.

My lab ID is hqifnokw, if you want to take a look.

I’ve solved this, but I don’t understand why it works.

Taking a dot product with a transposed matrix sounds like it would multiply both matrices with both of them transposed, so the transposes would cancel each other out.

Thinking about it further, there’s a sum inside .dot, so is it adding the values up along the second (y?) axis so that the result becomes a (1, 4) matrix?

dW^{[2]} = \frac{1}{m} dZ^{[2]}{A^{[1]}}^T uses a “dot product” (matrix product) of dZ^{[2]} and {A^{[1]}}^T. Your implementation above uses the “Hadamard product”, i.e. element-wise multiplication (`*` in NumPy), which requires both matrices to have identical or broadcastable shapes.
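The difference between the two products is easiest to see on small matrices. The 2x2 values below are hypothetical, chosen only for illustration: `*` multiplies matching entries, while `np.dot` (equivalently the `@` operator for 2-D arrays) takes row-by-column dot products.

```python
import numpy as np

# Hypothetical 2x2 matrices, used only to illustrate the two products
P = np.array([[1, 2],
              [3, 4]])
Q = np.array([[5, 6],
              [7, 8]])

# Hadamard (element-wise) product: P[r, c] * Q[r, c] for every entry
hadamard = P * Q        # values [[5, 12], [21, 32]]

# Matrix (dot) product: row r of P dotted with column c of Q
matmul = np.dot(P, Q)   # values [[19, 22], [43, 50]]

# For 2-D arrays, the @ operator gives the same result as np.dot
matmul_op = P @ Q

print(hadamard)
print(matmul)
```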

In this particular test case, dZ^{[2]}.shape = (1,3) and A^{[1]}.shape = (4,3).
So dZ^{[2]} is broadcastable against A^{[1]}, which sometimes leads learners to write incorrect code that runs without an error.
If you calculate dZ^{[2]}*A^{[1]} (or {dZ^{[2]}}^{T}*{A^{[1]}}^{T}), dZ^{[2]} is broadcast to (4,3), so the output is (4,3) (or (3,4) if you transpose both). As you are aware, this is not the expected value.
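The broadcasting behaviour can be reproduced with random stand-in arrays that have the test case’s shapes. The values are arbitrary and are not the course’s test data. The sketch shows which element-wise combinations silently broadcast and which one raises the error from the traceback.

```python
import numpy as np

# Random stand-ins with the test case's shapes (arbitrary values, not the
# course's test data): m = 3 examples, 4 hidden units, 1 output unit
rng = np.random.default_rng(0)
m = 3
dZ2 = rng.standard_normal((1, m))   # shape (1, 3)
A1 = rng.standard_normal((4, m))    # shape (4, 3)

# Element-wise product: dZ2's single row is broadcast across A1's 4 rows
wrong = (1 / m) * dZ2 * A1
print(wrong.shape)      # (4, 3)
print(wrong.T.shape)    # (3, 4), the shape seen after transposing

# Transposing both operands also broadcasts: (3, 1) * (3, 4) -> (3, 4)
both_T = dZ2.T * A1.T
print(both_T.shape)     # (3, 4)

# The line from the traceback: the trailing dimensions 3 and 4 differ and
# neither is 1, so NumPy cannot broadcast (1, 3) against (3, 4)
raised = False
try:
    (1 / m) * dZ2 * A1.T
except ValueError as err:
    raised = True
    print("ValueError:", err)
```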
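If we take the dot product of dZ^{[2]} and {A^{[1]}}^T, i.e. (1,3) and (3,4), the output is (1,4), as you wrote. The sum inside the dot product runs over the shared dimension of size m = 3 (the training examples), so each of the 4 entries of dW^{[2]} adds up one product per example. This is what is required.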
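Below is a minimal sketch of the corrected line, using the same kind of random stand-in arrays (arbitrary values, test-case shapes). It also includes two equivalent ways of writing the sum that `np.dot` performs, to make the “sum inside .dot” concrete.

```python
import numpy as np

# Random stand-ins with the test case's shapes (arbitrary values)
rng = np.random.default_rng(1)
m = 3
dZ2 = rng.standard_normal((1, m))   # (1, 3)
A1 = rng.standard_normal((4, m))    # (4, 3)

# Matrix product: (1, 3) . (3, 4) -> (1, 4), the same shape as W2
dW2 = (1 / m) * np.dot(dZ2, A1.T)
print(dW2.shape)   # (1, 4)

# Explicit version of the sum hidden inside np.dot:
# dW2[0, j] = (1/m) * sum over examples i of dZ2[0, i] * A1[j, i]
dW2_loop = np.zeros((1, 4))
for j in range(4):
    for i in range(m):
        dW2_loop[0, j] += dZ2[0, i] * A1[j, i]
dW2_loop /= m

# Same result from the element-wise product, summed over the example axis
# (axis=1) and transposed from (4, 1) to (1, 4)
dW2_sum = ((1 / m) * np.sum(dZ2 * A1, axis=1, keepdims=True)).T

print(np.allclose(dW2, dW2_loop), np.allclose(dW2, dW2_sum))   # True True
```

The `np.sum(dZ2 * A1, ...)` line shows that the element-wise attempt already produced every per-example product. What it was missing is the sum over the examples, which `np.dot` performs automatically.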