I was taking notes for the W2 "Gradient Descent for Neural Networks" lecture and I'm still not able to understand how the matrix multiplication would result in an (n^{[1]}, m) matrix. Could someone guide me through this?
Well, what are the dimensions of the objects in question?
W^{[2]} is n^{[2]} x n^{[1]}
and
dZ^{[2]} is n^{[2]} x m
So when you transpose W^{[2]}, the dot product is:
n^{[1]} x n^{[2]} dotted with n^{[2]} x m
Which has the result n^{[1]} x m, right?
Of course you then do the elementwise multiply with g^{[1]'}(Z^{[1]}), but that works because the latter has the same shape as Z^{[1]}, which is also n^{[1]} x m.
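You can sanity-check the shape arithmetic with a quick NumPy sketch. The layer sizes and batch size below are made up for illustration, and tanh is just an example choice of g^{[1]}:

```python
import numpy as np

# Hypothetical sizes for illustration: n^{[1]} = 4, n^{[2]} = 3, m = 5 examples
n1, n2, m = 4, 3, 5

W2 = np.random.randn(n2, n1)   # W^{[2]}: (n^{[2]}, n^{[1]})
dZ2 = np.random.randn(n2, m)   # dZ^{[2]}: (n^{[2]}, m)
Z1 = np.random.randn(n1, m)    # Z^{[1]}: (n^{[1]}, m)

# dZ^{[1]} = W^{[2]T} . dZ^{[2]} * g^{[1]'}(Z^{[1]})
g1_prime = 1.0 - np.tanh(Z1) ** 2   # derivative of tanh, same shape as Z^{[1]}

# (n^{[1]}, n^{[2]}) @ (n^{[2]}, m) -> (n^{[1]}, m), then elementwise * (n^{[1]}, m)
dZ1 = (W2.T @ dZ2) * g1_prime

print(dZ1.shape)  # (4, 5), i.e. (n^{[1]}, m)
```

The dot product handles the inner n^{[2]} dimension, and the elementwise multiply is legal precisely because both operands are already (n^{[1]}, m).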