In the video titled “Neural network layer” in week 1, the first layer has 3 units. The input X_vector is [197, 184, 136, 214]. Each of the three units in layer 1 takes this vector as input and performs a logistic regression (or a linear regression, depending on the activation function specified). The output of each unit should then also be a vector of 4 elements, since the input X_vector has 4 elements: each unit is just doing a logistic regression and predicting y (the binary class) from the input x. I also realised that W1_1, W1_2 and W1_3 (the weights of the first layer) are all scalars, because the input X_vector has only one variable.
However, the video shows each unit outputting just a scalar value, so the overall output of the first layer is A_vector with 3 elements ([0.3, 0.7, 0.2]). How is this possible?
The way it works is through matrix multiplication. For the first hidden layer with 3 units, W (the matrix of weights) will be a matrix of size $3 \times 4$, since the input has four features.
You then multiply W by the input. The input is a $1 \times 4$ row vector, so to make the dimensions compatible you transpose it into a $4 \times 1$ column vector. Multiplying W ($3 \times 4$) by this column vector ($4 \times 1$) gives Z, a $3 \times 1$ vector, which you pass through the activation function to get the final output, also of size $3 \times 1$.
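To make the dimension rules concrete, here is a minimal sketch in plain Python (lists of rows rather than NumPy, so it runs without dependencies). The weight values are hypothetical; only the shapes matter. The helper names `matmul`, `transpose` and `shape` are our own, not from the course.

```python
def shape(m):
    """(rows, columns) of a matrix stored as a list of rows."""
    return (len(m), len(m[0]))


def transpose(m):
    """Swap rows and columns, e.g. a 1x4 row vector becomes a 4x1 column vector."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices; the inner dimensions must match."""
    rows_a, cols_a = shape(a)
    rows_b, cols_b = shape(b)
    if cols_a != rows_b:
        raise ValueError(f"cannot multiply {rows_a}x{cols_a} by {rows_b}x{cols_b}")
    return [[sum(a[i][k] * b[k][j] for k in range(cols_a)) for j in range(cols_b)]
            for i in range(rows_a)]


# Input from the video: one example with 4 features, written as a 1x4 row vector.
x_row = [[197, 184, 136, 214]]

# Hypothetical weights for a 3-unit layer: one row per unit, one column per input feature.
W = [[0.01, -0.02, 0.03, 0.00],
     [0.02, 0.01, -0.01, 0.01],
     [-0.03, 0.00, 0.02, 0.01]]

x_col = transpose(x_row)   # 1x4 -> 4x1
z = matmul(W, x_col)       # (3x4) times (4x1) -> 3x1

print(shape(W), shape(x_col), shape(z))  # (3, 4) (4, 1) (3, 1)

try:
    matmul(W, x_row)       # (3x4) times (1x4): inner dimensions 4 and 1 differ
except ValueError as err:
    print(err)             # cannot multiply 3x4 by 1x4
```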
In a way, a neural network is just a chain of matrix multiplications with activation functions in between, which is why Andrew took the time to make sure we understood matrix dimensions and multiplication rules in the first course of the Deep Learning Specialization.
So, for example, the first column of the W matrix will contain W1_1, W1_2 and W1_3, and the remaining 3 columns will be copies of this column? Because there is only one W value for each unit of the first hidden layer.
Also, if the output from the first unit of the first hidden layer is a $3 \times 1$ vector (which makes sense), why does Andrew say in the video that this output will be a scalar (whose value in the given example is 0.3)?
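Regarding your second question: each unit in a hidden layer has a scalar output. That's correct. However, the hidden layer as a whole outputs a vector of 3 values, one per unit (a $1 \times 3$ vector if written as a row, or $3 \times 1$ if written as a column as in the matrix multiplication above). So the 3-element vector is the output of the whole layer, not of the first unit alone.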
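A minimal sketch of the unit-by-unit view, assuming sigmoid activations and hypothetical weights and biases: each unit takes the entire 4-element input, applies its own 4 weights and its own bias, and returns one number; the layer output is just those numbers collected together. The function names are illustrative, not from the course labs.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def unit_output(w, b, x):
    """One unit: its own weight vector dotted with the WHOLE input, plus its bias -> one scalar."""
    z = sum(w_k * x_k for w_k, x_k in zip(w, x)) + b
    return sigmoid(z)


def layer_output(W, b, x):
    """One layer: every unit sees the same input; collect their scalars into a vector."""
    return [unit_output(W[j], b[j], x) for j in range(len(W))]


x = [197, 184, 136, 214]          # one example with 4 features (from the video)
W = [[0.01, -0.02, 0.03, 0.00],   # hypothetical weights of unit 1 (4 of them)
     [0.02, 0.01, -0.01, 0.01],   # unit 2
     [-0.03, 0.00, 0.02, 0.01]]   # unit 3
b = [0.1, -0.2, 0.0]              # hypothetical biases, one per unit

a1 = unit_output(W[0], b[0], x)   # output of the first unit: a single number
a = layer_output(W, b, x)         # output of the layer: three numbers
print(a1)
print(len(a))  # 3
```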
You are partially correct about the weight matrix W: it has three rows (the number of units in the layer) and four columns (the number of units in the previous layer, in this case the input layer), making 12 weights in total:

$$
\mathbf{W} = \begin{bmatrix}
w_{11}^{[1]} & w_{12}^{[1]} & w_{13}^{[1]} & w_{14}^{[1]}\\
w_{21}^{[1]} & w_{22}^{[1]} & w_{23}^{[1]} & w_{24}^{[1]}\\
w_{31}^{[1]} & w_{32}^{[1]} & w_{33}^{[1]} & w_{34}^{[1]}
\end{bmatrix}
$$
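The superscript $[1]$ denotes the layer (in this case, the first hidden layer), and $w_{jk}^{[1]}$ is the weight that unit $j$ applies to input feature $k$. Each row therefore belongs to one unit, and all 12 entries are learned separately rather than being copies of one column. Notation varies between sources, so you may see this written differently.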
Overall, there are 15 trainable parameters: the 12 weights plus one bias for each of the 3 units, $\vec{\mathbf{b}} = [b_{1}^{[1]}, b_{2}^{[1]}, b_{3}^{[1]}]$.
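The parameter count follows directly from the matrix shape. A small sketch, assuming a fully connected layer with one bias per unit (the function name `dense_param_count` is hypothetical):

```python
def dense_param_count(n_inputs, n_units):
    """Weights form an n_units x n_inputs matrix; each unit also has one bias."""
    weights = n_units * n_inputs
    biases = n_units
    return weights, biases, weights + biases


print(dense_param_count(4, 3))  # (12, 3, 15): the layer discussed here
print(dense_param_count(1, 3))  # (3, 3, 6): same 3 units, but only 1 input feature
```

With only one input feature per example, a 3-unit layer would need just 3 weights, one per unit; with 4 features, each unit needs 4.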
So, the full equation is:

$$\vec{\mathbf{Z}} = \mathbf{W} \cdot \vec{\mathbf{x}} + \vec{\mathbf{b}}$$
I did not include the bias vector initially because it has no effect on the final shape. Z and A have the same shape: you pass Z through the sigmoid function (or whichever activation you use), element by element, to get the output A.
In the hidden and output layers, each unit outputs a scalar value: the weighted sum of all of the previous layer's outputs (each multiplied by its own weight) plus the unit's bias, passed through an activation function, which is non-linear in the hidden layers.
The output layer may or may not use a non-linear activation, depending on the kind of output: for example, a sigmoid for a binary (0/1) prediction, or a linear activation for a real-valued prediction.
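As a sketch of the output layer, assume a single unit receiving the three hidden-layer outputs [0.3, 0.7, 0.2] quoted in the video, with hypothetical weights and bias. The same weighted sum gives a probability with a sigmoid activation, or an unbounded real value with a linear (identity) activation:

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def output_unit(w, b, a_prev, activation):
    """Single output unit: weighted sum of previous-layer outputs plus bias, then activation."""
    z = sum(w_k * a_k for w_k, a_k in zip(w, a_prev)) + b
    return activation(z)


a_hidden = [0.3, 0.7, 0.2]   # hidden-layer output quoted in the video
w_out = [1.5, -2.0, 0.5]     # hypothetical weights: one per hidden unit (a 1 x 3 matrix)
b_out = 0.4                  # hypothetical bias

p = output_unit(w_out, b_out, a_hidden, sigmoid)          # binary prediction: probability in (0, 1)
y_hat = output_unit(w_out, b_out, a_hidden, lambda z: z)  # real-valued prediction: linear activation
print(p, y_hat)
```

Either way the output layer has one unit, so it produces one number per example.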
If each unit of each layer outputs a scalar value, then the final layer (which has only one unit) should also output a scalar value (a number between 0 and 1 in the case of a logistic activation function). However, since my X vector has 4 elements, I would expect a prediction for each of the four values of X, so I would expect the final output to also be a vector with 4 elements. But in the video the final output is only a single scalar. What am I missing here?
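One way to see where the single output comes from, assuming the 4 numbers in X are the 4 features of a single example (which is why W has 4 columns): the network returns one prediction per example, and predicting for several examples means feeding several rows. A sketch in plain Python with hypothetical parameters; only the first row of X comes from the video, the other two rows are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(W, b, a_in):
    """One layer: unit j outputs sigmoid(W[j] . a_in + b[j]), a single number."""
    return [sigmoid(sum(w * a for w, a in zip(W[j], a_in)) + b[j]) for j in range(len(W))]


def predict(x):
    """Two-layer forward pass for ONE example with 4 features -> one prediction."""
    a1 = dense(W1, b1, x)    # 3 numbers, one per hidden unit
    a2 = dense(W2, b2, a1)   # 1 number (the single output unit), wrapped in a list
    return a2[0]


# Hypothetical parameters: layer 1 is 3 units x 4 inputs, layer 2 is 1 unit x 3 inputs.
W1 = [[0.01, -0.02, 0.03, 0.00],
      [0.02, 0.01, -0.01, 0.01],
      [-0.03, 0.00, 0.02, 0.01]]
b1 = [0.1, -0.2, 0.0]
W2 = [[1.5, -2.0, 0.5]]
b2 = [0.4]

# Each ROW is one example with 4 features.
X = [[197, 184, 136, 214],
     [150, 160, 170, 180],
     [210, 190, 120, 230]]

predictions = [predict(x) for x in X]
print(len(predictions))  # 3: one prediction per example (row), not one per feature
```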
Thank you for taking the time to write a detailed answer. However, I still do not understand. For example, in the practice lab we had a layer with 3 units, and there we specified only three W values rather than a matrix like the one you described; that is, we specified one W value per unit. I'm still not sure what I am missing here.