Why 4 units in the hidden layer if we have 3 input features?

I’m watching the videos in week 3 and I’m having trouble understanding this:

If the input layer has x1, x2, and x3, how come the first hidden layer has w1, w2, w3, and w4?

What am I missing?

I was expecting something like:
z = x1*w1 + x2*w2 + x3*w3 + b

Where is this w4 coming from?

I haven’t watched these videos, but I think it is the bias. You can understand it better from here:

There is no necessary relationship between the number of input features and the number of neurons in any particular layer. Either number can be bigger. Generally speaking, in a feed-forward fully connected network (that’s the type Prof Ng is showing us here), the number of output neurons usually starts as a higher number and then decreases as you go through the layers, culminating in a single output neuron at the output layer that gives the “yes/no” answer if the network is doing “binary” classification.

The way to think about this conceptually is that the earlier layers in the network learn “low level” features like edges or curves or textures, and there could be lots of different such features in an image (assuming that is our canonical example). Then, as you go through the network, the later layers are “distilling” that large number of low level features into high level recognitions: e.g. two edges that meet at an angle between 30 and 60 degrees might be the tip of a cat’s ear. At the final layer, the network has to “put everything together” and predict whether it’s a cat or not.
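To make the “layers get narrower” idea concrete, here is a minimal NumPy sketch. The layer sizes 3 → 4 → 2 → 1 are made-up numbers for illustration (not from the lecture); the point is that each layer’s weight matrix is shaped by how many neurons it has and how many inputs it receives:

```python
import numpy as np

# Hypothetical layer sizes: 3 input features, then 4 -> 2 -> 1 neurons.
layer_dims = [3, 4, 2, 1]

rng = np.random.default_rng(0)

# Layer l has W with shape (neurons in layer l, neurons in layer l-1)
# and b with shape (neurons in layer l, 1).
params = {}
for l in range(1, len(layer_dims)):
    params["W" + str(l)] = rng.standard_normal((layer_dims[l], layer_dims[l - 1])) * 0.01
    params["b" + str(l)] = np.zeros((layer_dims[l], 1))

for l in range(1, len(layer_dims)):
    print(f"W{l}: {params['W' + str(l)].shape}, b{l}: {params['b' + str(l)].shape}")
# W1: (4, 3), b1: (4, 1)
# W2: (2, 4), b2: (2, 1)
# W3: (1, 2), b3: (1, 1)
```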

Prof Ng is just giving an example with small numbers of features and neurons here to make things easier to write and explain. But everything he is showing is completely “general” and works with any number of features or neurons.

If I’m not mistaken, the b constant is accounting for bias, right?

Yes, b is the bias value, which is a column vector at each layer. Sorry, maybe I didn’t read your initial question carefully enough; I was answering a different, more general question. At the first hidden layer, if we have 3 input features and 4 output neurons, then the shape of W^{[1]} will be 4 x 3. You’re right that the linear combination with the input features needs the same number of coefficients as there are features.
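Here is a minimal NumPy sketch of that first-layer computation (the random values are arbitrary, just to show the shapes): with 3 input features and 4 neurons, W1 has shape (4, 3), so the “w4” in the lecture is a fourth row of W1 (a fourth neuron’s weight vector), not a fourth coefficient on the inputs.

```python
import numpy as np

rng = np.random.default_rng(1)

x  = rng.standard_normal((3, 1))   # 3 input features, one example
W1 = rng.standard_normal((4, 3))   # 4 neurons, each with 3 weights
b1 = np.zeros((4, 1))              # one bias per neuron

z1 = W1 @ x + b1                   # one z value per neuron
print(z1.shape)                    # (4, 1)
```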

If you’re talking about this area of the lecture:

[screenshot of the lecture slide]
Note that the w_i are vectors and x is a vector with 3 elements. So each w_i vector also has three elements. There are 4 of them because there are 4 output neurons from this layer. They end up being the rows of the “weights” matrix, which you can see starting to take shape at the bottom of the picture there.
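As a sanity check on the “rows of the weights matrix” point, here is a small NumPy sketch (arbitrary random values, not the lecture’s numbers) that computes each neuron’s z = w_i · x + b_i by hand and confirms it matches the single matrix product:

```python
import numpy as np

rng = np.random.default_rng(2)

x  = rng.standard_normal((3, 1))
W1 = rng.standard_normal((4, 3))   # row i of W1 is the vector w_i
b1 = rng.standard_normal((4, 1))

# Each neuron separately: z_i = w_i . x + b_i
z_by_row = np.array([[np.dot(W1[i], x[:, 0]) + b1[i, 0]] for i in range(4)])

# All four neurons at once as one matrix product
z_vectorized = W1 @ x + b1

print(np.allclose(z_by_row, z_vectorized))  # True
```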