Why does W for a single neuron become a vector with 1 row and 2 columns when previously we have been using just scalar quantities? When does W need to be a vector and how do we determine the dimensions of the vector and the values in the vector?
Each neuron has its own weight vector, for example w1. Its number of rows equals the number of inputs coming from the previous layer (the number of columns in a_in, i.e. X), so here w1 has 2 rows. It has 1 column because it holds the weights of a single neuron. The input here is 1 training example, so x has shape (1,2), and with w1 of shape (2,1) the matrix multiplication rules can be applied: (1,2) times (2,1) gives (1,1). If we concatenate the weight vectors of the neurons side by side, we get a matrix called the weight matrix (W). Its number of rows equals the number of inputs from the previous layer, and its number of columns equals the number of neurons in this layer. Here W is (2,3): 2 rows for the 2 inputs from the previous layer and 3 columns for the 3 neurons in this layer.
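To make the shapes concrete, here is a minimal NumPy sketch. The numeric values are made up purely for illustration (they are not from the slide), and the bias and activation are left out so that only the shapes matter.

```python
import numpy as np

# One training example with 2 features: shape (1, 2). Values are illustrative.
x = np.array([[1.0, 2.0]])

# Hypothetical weight vectors for the 3 neurons, each of shape (2, 1).
w1 = np.array([[1.0], [2.0]])
w2 = np.array([[-3.0], [4.0]])
w3 = np.array([[5.0], [-6.0]])

# A single neuron: (1, 2) @ (2, 1) -> (1, 1).
z1 = x @ w1

# Stack the weight vectors side by side to get W of shape (2, 3).
W = np.hstack([w1, w2, w3])

# The whole layer: (1, 2) @ (2, 3) -> (1, 3), one value per neuron.
z = x @ W

print(w1.shape, W.shape, z1.shape, z.shape)
```

The first column of z is the same number as z1, which is the sense in which W is just the three weight vectors placed next to each other.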
I am just wondering, you said:
W for a single neuron become a vector with 1 row and 2 columns
but there is no such vector with 1 row & 2 columns on the slide you shared. There are weight vectors w_1, w_2 and w_3 which all have 2 rows & 1 column, and there is a W matrix that has 2 rows and 3 columns (concatenating the three weight vectors as described by @AbdElRhaman_Fakhry). I guess you are asking about the weight vectors that have 2 rows & 1 column?
As @AbdElRhaman_Fakhry explained, having 3 weight vectors means that we have 3 neurons in the layer, and each weight vector having 2 elements means that each incoming sample has 2 features. So, to address the part of your question about when the weight is a scalar and when it is a vector:
We have a 1-element weight vector when the incoming sample has only one feature. We have only one weight vector when the layer has only one neuron. Combining the previous 2 statements, we have only a scalar weight when we have only one neuron and when the incoming sample has only one feature.
In conclusion, we effectively choose the shape of W by deciding how many neurons we want in a layer and how many features we feed into that layer.
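As a rough illustration of that rule, the sketch below builds W from the two choices. The helper name init_weights and the random initialisation are assumptions made for this example only, not something taken from the course material.

```python
import numpy as np

def init_weights(n_features, n_neurons, seed=0):
    """Hypothetical helper: one row per input feature, one column per neuron."""
    rng = np.random.default_rng(seed)
    return rng.normal(size=(n_features, n_neurons))

# 2 features, 3 neurons: a (2, 3) weight matrix.
print(init_weights(2, 3).shape)

# 2 features, 1 neuron: a single (2, 1) weight vector.
print(init_weights(2, 1).shape)

# 1 feature, 3 neurons: three 1-element weight vectors, shape (1, 3).
print(init_weights(1, 3).shape)

# 1 feature, 1 neuron: shape (1, 1), which is effectively a scalar weight.
print(init_weights(1, 1).shape)
```

The last case is the scalar situation from the earlier examples: it is the same rule with both numbers set to 1.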