Each weight matrix connects two adjacent layers.
-
So the first weight matrix has one dimension equal to the number of input features and the other equal to the number of hidden layer units.
-
The second weight matrix has one dimension equal to the number of hidden layer units and the other equal to the number of outputs.
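
To make those shapes concrete, here is a minimal NumPy sketch. It assumes a network with a single hidden layer, inputs stored as rows (one example per row), and no bias terms. The layer sizes, the tanh activation, and the variable names are made up for illustration and are not taken from the original. Some libraries store the transpose of these matrices, so the order of the two dimensions depends on the convention in use.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_features, n_hidden, n_outputs = 4, 3, 2
batch_size = 5

rng = np.random.default_rng(0)

# One example per row: shape (batch_size, n_features).
X = rng.normal(size=(batch_size, n_features))

# First weight matrix: connects the input layer to the hidden layer.
W1 = rng.normal(size=(n_features, n_hidden))

# Second weight matrix: connects the hidden layer to the output layer.
W2 = rng.normal(size=(n_hidden, n_outputs))

# Forward pass (biases omitted to keep the focus on shapes).
hidden = np.tanh(X @ W1)   # shape (batch_size, n_hidden)
output = hidden @ W2       # shape (batch_size, n_outputs)

print(W1.shape, W2.shape, output.shape)  # (4, 3) (3, 2) (5, 2)
```

The inner dimensions have to match for each product to be defined: the columns of X match the rows of W1, and the columns of the hidden activations match the rows of W2.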