In the week 2 exercise, X has shape (12288, 209), and since A and Y are row vectors, their shape is (1, 209). So the shape of (A-Y).T is (209, 1). Multiplying a (12288, 209) matrix by a (209, 1) matrix gives a matrix of shape (12288, 1).
So np.dot(X, (A-Y).T) has shape (12288, 1), and we then divide it by m, which is 209. How does that make sense? I am also getting an error about the shape of dw.
Dividing by 209 is just because we are averaging the gradients computed on the individual samples: the gradient of the average is the average of the gradients, right? But that doesn’t involve any shape issues, since $\frac{1}{209}$ is a scalar.
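Here is a quick numerical check of both points, using random made-up arrays with the exercise's shapes (not the assignment's actual dataset). Dividing by the scalar m leaves the (12288, 1) shape unchanged, and the vectorized formula $\frac{1}{m} X (A-Y)^T$ matches the average of the per-sample gradients $x^{(i)}(a^{(i)} - y^{(i)})$:

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, m = 12288, 209                             # shapes from the week 2 exercise
X = rng.standard_normal((n_x, m))               # made-up inputs, one column per sample
A = rng.random((1, m))                          # made-up activations, row vector
Y = (rng.random((1, m)) > 0.5).astype(float)    # made-up 0/1 labels, row vector

# Vectorized gradient: (n_x, m) @ (m, 1) -> (n_x, 1), then scale by the scalar 1/m
dw = np.dot(X, (A - Y).T) / m
print(dw.shape)  # (12288, 1)

# Same result as averaging the per-sample gradients x_i * (a_i - y_i), each (n_x, 1)
per_sample = [X[:, i:i + 1] * (A[0, i] - Y[0, i]) for i in range(m)]
dw_loop = sum(per_sample) / m
print(np.allclose(dw, dw_loop))  # True
```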
So what is the shape issue you are getting? Please show us the actual error trace. Without seeing anything, my guess is that you’ve made one of the common mistakes: referencing global variables instead of the function's arguments, or assuming that every input has the same dimensions as the training set. The test cases here may very well not use 12288 features, for example. We always strive to write general code that works with any sizes of inputs.
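A minimal sketch of what "general code" means here, assuming a logistic-regression gradient helper with inputs shaped as in the exercise. The name `logistic_grads` and the sample values are hypothetical and only for illustration. Every size, including m, is read from the arguments rather than from globals, so the same code works for a 2-feature test case and for the 12288-feature images:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_grads(w, b, X, Y):
    """Hypothetical helper: gradients of the mean cross-entropy cost.

    w: (n_x, 1), b: scalar, X: (n_x, m), Y: (1, m).
    All sizes come from the arguments, never from global variables.
    """
    m = X.shape[1]                       # number of samples in *this* call
    A = sigmoid(np.dot(w.T, X) + b)      # (1, m)
    dw = np.dot(X, (A - Y).T) / m        # (n_x, 1)
    db = np.sum(A - Y) / m               # scalar
    return {"dw": dw, "db": db}

# Made-up 2-feature, 3-sample case
w = np.array([[1.0], [2.0]])
b = 2.0
X = np.array([[1.0, 2.0, -1.0], [3.0, 4.0, -3.2]])
Y = np.array([[1, 0, 1]])
grads = logistic_grads(w, b, X, Y)
print(grads["dw"].shape)  # (2, 1)
```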
Your understanding of the matrix shapes is correct. What is the error related to the shape of `dw`?
If you get an error related to the shape of `dw`, it may be because the test case code expects `dw` to be a column vector of shape (2, 1), `assert grads["dw"].shape == (2, 1)`, whereas your `dw` is a 1-D array of shape (2,) (one gradient per weight, but without the second axis).
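One possible way to end up with (2,) instead of (2, 1) is an elementwise product summed over `axis=1`, which drops that axis. The sketch below uses made-up values (`diff` stands in for `A - Y`) and shows that `keepdims=True` or a `reshape` restores the column shape that `np.dot` gives directly:

```python
import numpy as np

X = np.array([[1.0, 2.0, -1.0], [3.0, 4.0, -3.2]])  # (2, 3), made-up values
diff = np.array([[0.1, -0.2, 0.3]])                 # stands in for A - Y, shape (1, 3)
m = X.shape[1]

# Matrix product keeps the column shape the test expects
dw_col = np.dot(X, diff.T) / m
print(dw_col.shape)  # (2, 1)

# Broadcasting, then summing over axis 1, drops that axis: 1-D array
dw_flat = np.sum(X * diff, axis=1) / m
print(dw_flat.shape)  # (2,)

# Keep the summed axis, or reshape afterwards, to get a column vector again
dw_keep = np.sum(X * diff, axis=1, keepdims=True) / m
print(dw_keep.shape)  # (2, 1)
print(np.allclose(dw_col, dw_keep))                 # True
print(np.allclose(dw_col, dw_flat.reshape(-1, 1)))  # True
```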