I am not able to understand why
the weight matrix shape of W1 is (n_h, n_x) and
the weight matrix shape of W2 is (n_y, n_h)

Is the definition of n_h the parameter count of a given unit in that layer? (I think all units in a given layer have the same parameter count.)

Is the definition of the weights matrix this?
rows = parameter count in that layer
cols = parameter count of the previous layer?
Why do the columns of the current layer's weights matrix equal the parameter count of the previous layer?

I would have thought cols = count of units in the current layer?

As Tom says, a lot of this is just choices that Prof Ng has made. There are different ways to approach this, but he chooses to define the weight matrices that way, so that we can define the “linear” part of forward propagation by this formula:

Z1 = W1 \cdot X + b1

That is the formula for the first "hidden" layer. The key point is that the operation between W1 and X is a standard "dot product" style matrix multiplication. Think about how that works: the number of columns of the first operand must match the number of rows of the second operand, right? And the shape of X is n_x by m, where n_x is the number of features in each input sample and m is the number of input samples.

So if the shape of W1 is n_h by n_x, then what happens when you dot n_h by n_x with n_x by m? The result will have the shape n_h by m.
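You can check that shape arithmetic directly in NumPy. Here is a minimal sketch; the sizes n_x = 4, n_h = 5, m = 3 are just illustrative values, not anything from the course:

```python
import numpy as np

# Illustrative sizes: 4 input features, 5 hidden units, 3 samples.
n_x, n_h, m = 4, 5, 3

X = np.random.randn(n_x, m)     # inputs: n_x by m
W1 = np.random.randn(n_h, n_x)  # layer-1 weights: n_h by n_x
b1 = np.zeros((n_h, 1))         # bias; broadcasts across the m columns

Z1 = W1 @ X + b1                # (n_h, n_x) @ (n_x, m) -> (n_h, m)
print(Z1.shape)                 # (5, 3)
```

Note that b1 has shape (n_h, 1): NumPy broadcasting adds the same bias vector to every one of the m columns.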

Now think about what happens in the output layer: the number of input neurons is n_h and the number of outputs needs to be n_y. The formula is analogous:

Z2 = W2 \cdot A1 + b2

Where A1 is the result of feeding Z1 to the activation function for layer 1. Activation functions operate “elementwise”, meaning that the shape of the output is the same as the shape of the input. So by the same argument about matrix multiplication, the shape of W2 needs to be n_y by n_h.
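Putting both layers together, the whole forward pass can be sketched like this. The specific activations (tanh for the hidden layer, sigmoid for the output) are a common choice in this kind of 2-layer example, but they are my assumption here; the shape argument is the same for any elementwise activation:

```python
import numpy as np

def forward(X, W1, b1, W2, b2):
    """Forward propagation through a 2-layer network, shapes as in the post."""
    Z1 = W1 @ X + b1                 # (n_h, n_x) @ (n_x, m) -> (n_h, m)
    A1 = np.tanh(Z1)                 # elementwise, so still (n_h, m)
    Z2 = W2 @ A1 + b2                # (n_y, n_h) @ (n_h, m) -> (n_y, m)
    A2 = 1.0 / (1.0 + np.exp(-Z2))   # sigmoid output, still (n_y, m)
    return A2

# Illustrative sizes only: n_x = 4, n_h = 5, n_y = 1, m = 3.
n_x, n_h, n_y, m = 4, 5, 1, 3
X = np.random.randn(n_x, m)
W1 = np.random.randn(n_h, n_x); b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h); b2 = np.zeros((n_y, 1))
print(forward(X, W1, b1, W2, b2).shape)  # (1, 3)
```

If you swap the dimensions of either weight matrix, the `@` multiplication raises a shape-mismatch error, which is the quickest way to convince yourself why the "rows = this layer's units, cols = previous layer's units" convention is forced by the formula.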