When writing the vectorized form for dw, can we rewrite it as dw = (1/m) * np.sum(np.dot(X, dz.T))?
Sorry, but that is not equivalent to the formula as written. Note that your formula will produce a scalar value. We need dw to be a vector with the same dimensions as w.
Also note that matrix multiplication involves multiplication followed by addition, so there is already an implied addition in the np.dot.
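To make the implied addition concrete, here is a small sketch that compares an explicit loop over the samples with the np.dot version. The sizes and random values are made up purely for illustration; only the shapes matter.

```python
import numpy as np

# Illustrative sizes and random data (assumptions, not from the course notebook).
rng = np.random.default_rng(0)
n_x, m = 3, 5
X = rng.standard_normal((n_x, m))   # features x samples
dZ = rng.standard_normal((1, m))    # one gradient value per sample

# Explicit version: add up each sample's contribution x^(i) * dz^(i).
dw_loop = np.zeros((n_x, 1))
for i in range(m):
    dw_loop += X[:, i:i + 1] * dZ[0, i]
dw_loop /= m

# Vectorized version: np.dot performs the same sum over samples internally.
dw_vec = np.dot(X, dZ.T) / m

print(np.allclose(dw_loop, dw_vec))
```

The two results agree, so the sum over the m samples is already done inside np.dot and no extra np.sum is needed.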
You can also gain some insight by doing the “dimensional analysis”. What are the shapes of the objects involved? We have:
X is n_x x m where n_x is the number of features and m is the number of samples.
Z is 1 x m, so dZ is also 1 x m. You can see the same thing from the fact that both A and Y are 1 x m.
So what will happen when we compute X \cdot dZ^T? It will be n_x x m dotted with m x 1 (because of the transpose), so the result will be n_x x 1. That is correct, since w is n_x x 1.
If we write out the formula in python/numpy, it is:
dw = 1./m * np.dot(X, dZ.T)
If you then take the sum of that, you’ll end up with a scalar value, which is not what we want.
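You can check the dimensional analysis directly by printing the shapes. This is only a sketch: the sizes and random data are invented, and it assumes dZ = A - Y as in logistic regression.

```python
import numpy as np

# Illustrative sizes and random data (assumptions for the demo only).
rng = np.random.default_rng(1)
n_x, m = 4, 6
X = rng.standard_normal((n_x, m))    # n_x x m
A = rng.uniform(size=(1, m))         # activations, 1 x m
Y = rng.integers(0, 2, size=(1, m))  # labels, 1 x m
dZ = A - Y                           # 1 x m (assumed logistic regression gradient)

# The formula as written: (n_x x m) dot (m x 1) gives n_x x 1.
dw = 1. / m * np.dot(X, dZ.T)

# The proposed variant: np.sum collapses every element into one number.
dw_summed = 1. / m * np.sum(np.dot(X, dZ.T))

print(dw.shape)            # (4, 1), the same shape as w
print(np.ndim(dw_summed))  # 0, i.e. a scalar
```

Since w is n_x x 1, only the first version can be used in the update w = w - learning_rate * dw without losing the per-feature gradients.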