Hi,
Can someone kindly explain why, in the attached image from the lecture, W^{[1]} has shape (4,3) while its input a^{[0]} (= x) has shape (3,1), yet W^{[2]} has shape (1,4) while its input a^{[1]} has shape (4,1)?
If I understand correctly, W^{[1]} represents 4 nodes, and each node connects to 3 input features, which gives it the shape (4,3).
By the same logic, W^{[2]} represents only one node, but how do we arrive at (1,4)? What happened to the 3 input features? At layer 2, do we count the nodes of layer 1 instead of the features?
As you can see, the network presented here has two layers: layer 1, the hidden layer with 4 nodes, and layer 2, the output layer with 1 node. The input counts as layer 0. The weight matrix of a layer is organised as (number of nodes in the current layer, number of nodes in the previous layer). So W^{[1]} has shape (4,3), because layer 1 has 4 nodes and layer 0 has 3 features. For layer 2 there is just one node while layer 1 has 4 nodes, so W^{[2]} has shape (1,4).
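To make the dimension check explicit, here are the forward-propagation formulas from the lecture with the shapes written next to them:

z^{[1]} = W^{[1]} a^{[0]} + b^{[1]}  →  (4,3)·(3,1) + (4,1) = (4,1)
z^{[2]} = W^{[2]} a^{[1]} + b^{[2]}  →  (1,4)·(4,1) + (1,1) = (1,1)

In general, W^{[l]} has shape (n^{[l]}, n^{[l-1]}), where n^{[l]} is the number of nodes in layer l and n^{[0]} is the number of input features.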
Layer 2 takes the output of layer 1 as its input, so the original 3 features have already been absorbed into the network: each layer only ever sees the activations of the layer before it.
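If it helps, here is a minimal NumPy sketch (my own illustration, not code from the lecture; the layer sizes 3, 4, 1 match the slide) that builds the weight matrices with the (n^{[l]}, n^{[l-1]}) rule and runs one forward pass so you can print the shapes:

```python
import numpy as np

# Layer sizes: 3 input features (layer 0), 4 hidden nodes (layer 1), 1 output node (layer 2)
layer_sizes = [3, 4, 1]

# W[l] has shape (nodes in layer l, nodes in layer l-1); b[l] has shape (nodes in layer l, 1)
rng = np.random.default_rng(0)
W = {l: rng.standard_normal((layer_sizes[l], layer_sizes[l - 1])) for l in (1, 2)}
b = {l: np.zeros((layer_sizes[l], 1)) for l in (1, 2)}

x = rng.standard_normal((3, 1))               # a[0]: one example with 3 features
a1 = np.tanh(W[1] @ x + b[1])                 # (4,3) @ (3,1) -> (4,1)
a2 = 1 / (1 + np.exp(-(W[2] @ a1 + b[2])))    # (1,4) @ (4,1) -> (1,1)

print(W[1].shape, a1.shape)  # (4, 3) (4, 1)
print(W[2].shape, a2.shape)  # (1, 4) (1, 1)
```

Notice that the 3 features only ever appear in the shape of W^{[1]}; by the time layer 2 runs, its input is the (4,1) activation vector a^{[1]}, which is why W^{[2]} is (1,4).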