Shouldn’t W[1] have the shape (4,3), and W[1].T have the shape (3,4)?
The reasoning behind this is that we treat each w as a column vector and then stack multiple columns for multiple neurons. Also, here X has the shape (4,m), and to get the dot product of W and X we would need W[1].T . X, i.e. (3,4) . (4,m).
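Here is a small NumPy sketch of what I mean, with 4 features and 3 neurons; m = 5 is just an assumed batch size for illustration:

```python
import numpy as np

m = 5                          # assumed number of examples, just for illustration
X = np.random.randn(4, m)      # 4 input features, m examples -> shape (4, m)

# Each neuron's weights as a column vector w of shape (4, 1),
# stacked side by side for 3 neurons -> W of shape (4, 3)
W = np.hstack([np.random.randn(4, 1) for _ in range(3)])

Z = np.dot(W.T, X)             # (3, 4) . (4, m) -> (3, m)
print(W.shape, Z.shape)        # (4, 3) (3, 5)
```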
In this specialization, the shape of W is (number of neurons in the current layer, number of neurons in the previous layer, or input features for the first layer). So, how many neurons and features do we have in this example?
Based on what you’ve said, W[1]’s shape should be (3,4). But that doesn’t make sense to me, because when we take the dot product of W[1].T and X, the shapes do not follow the rules of matrix multiplication.
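For example, a quick NumPy check of that claim (again with m assumed to be 5):

```python
import numpy as np

m = 5                          # assumed number of examples
X = np.random.randn(4, m)      # (4, m)
W1 = np.random.randn(3, 4)     # course convention: (neurons in layer 1, input features)

try:
    np.dot(W1.T, X)            # (4, 3) . (4, m): inner dimensions 3 and 4 don't match
except ValueError as err:
    print(err)                 # NumPy reports the shape mismatch
```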
@saifkhanengr, what @Ammar_Jawed is talking about is taking the transpose when doing the calculation / writing code.
@Ammar_Jawed, you have to understand that the shape of W[k] is (neurons in the current layer, neurons in the previous layer / inputs). When you write W[k].T, you are taking the transpose of W[k]; W[k] itself still has the same shape.
Another thing to note here: the question is not about taking a dot product or how the calculations are done, it is simply about the shapes of W and b.
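To make the shapes concrete, here is a minimal NumPy sketch using the layer sizes from this example (m = 5 is an assumed number of examples):

```python
import numpy as np

m = 5                            # assumed number of training examples
n_x, n_1 = 4, 3                  # 4 input features, 3 neurons in layer 1

X  = np.random.randn(n_x, m)     # (4, m)
W1 = np.random.randn(n_1, n_x)   # (3, 4): (neurons in current layer, neurons/features in previous layer)
b1 = np.zeros((n_1, 1))          # (3, 1): one bias per neuron in the current layer

Z1 = np.dot(W1, X) + b1          # (3, 4) . (4, m) broadcast with (3, 1) -> (3, m); no transpose needed
print(W1.shape, b1.shape, Z1.shape)
```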
Never mind. As Prof. Andrew said, to become good at anything, the first step is to suc* at it. If you’ve succeeded at suc*ing at AI – congratulations, you’re on your way!