I’m watching the videos for week 3 and I’m having trouble understanding this:
if the input layer has x1, x2 and x3, how come the first hidden layer has w1, w2, w3 and w4?
What am I missing?
I was expecting something like:
z = x1*w1 + x2*w2 + x3*w3 + b
Where is this w4 coming from?
I haven’t watched these videos, but I think it is the bias. You can understand it better from here:
There is no necessary relationship between the number of input features and the number of neurons in any particular layer. Either number can be bigger. Generally speaking, in a feed forward fully connected network (that’s the type Prof Ng is showing us here), the number of neurons per layer usually starts high and then decreases as you go through the layers, culminating in a single neuron at the output layer that gives the “yes/no” answer, if the network is doing “binary” classification. The way to think about this conceptually is that the earlier layers in the network learn “low level” features like edges or curves or textures, and there could be lots of different such features in an image (assuming that is our canonical example). Then as you go through the network, the later layers are “distilling” that large number of low level features into high level recognitions: e.g. two edges that meet at an angle between 30 and 60 degrees might be the tip of a cat’s ear. Then at the final layer, it has to “put everything together” and predict whether it’s a cat or not.
Prof Ng is just giving an example with low numbers of features and neurons here just to make things easier to write and explain. But everything he is showing is completely “general” and works with any number of features or neurons.
If I’m not mistaken, the b constant is accounting for bias, right?
Yes, b is the bias value, which is a column vector at each layer. Sorry, maybe I didn’t read your initial question carefully enough. I was answering a different, more general question. At the first hidden layer, if we have 3 input features and 4 output neurons, then the shape of W^{[1]} will be 4 x 3. You’re right that the linear combination with the input features needs the same number of coefficients as there are features.
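To spell out the shapes for this case, here is a sketch assuming a single input example x written as a column vector with 3 features, and 4 neurons in the first hidden layer:

$$
z^{[1]} = W^{[1]} x + b^{[1]}, \qquad W^{[1]} \in \mathbb{R}^{4 \times 3},\quad x \in \mathbb{R}^{3 \times 1},\quad b^{[1]} \in \mathbb{R}^{4 \times 1},\quad z^{[1]} \in \mathbb{R}^{4 \times 1}
$$

Row i of W^{[1]} holds the three coefficients for neuron i, so each neuron computes exactly the kind of sum you were expecting:

$$
z^{[1]}_i = W^{[1]}_{i1} x_1 + W^{[1]}_{i2} x_2 + W^{[1]}_{i3} x_3 + b^{[1]}_i, \qquad i = 1, \dots, 4
$$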
If you’re talking about this area of the lecture:
Note that the w_i are vectors and x is a vector with 3 elements. So each w_i vector also has three elements. There are 4 of them because there are 4 output neurons from this layer. They end up being the rows of the “weights” matrix, which you can see starting to take shape at the bottom of the picture there.
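If it helps to see the shapes concretely, here is a minimal NumPy sketch of that idea. The numeric values are made up purely for illustration (they are not from the lecture), and the variable names are my own:

```python
import numpy as np

# Hypothetical input: one example with 3 features, as a column vector.
x = np.array([[1.0], [2.0], [3.0]])          # shape (3, 1)

# One weight vector per hidden neuron; each has 3 elements, one per feature.
# These values are arbitrary, chosen only to make the arithmetic easy to follow.
w1 = np.array([0.1, 0.2, 0.3])
w2 = np.array([0.0, 1.0, 0.0])
w3 = np.array([1.0, 1.0, 1.0])
w4 = np.array([-1.0, 0.0, 1.0])

# Stacking the four vectors as rows gives the weight matrix for the layer.
W = np.stack([w1, w2, w3, w4])               # shape (4, 3)

# One bias per neuron, as a column vector (zeros here for simplicity).
b = np.zeros((4, 1))                         # shape (4, 1)

# Linear step for the whole layer: one z value per neuron.
z = W @ x + b                                # shape (4, 1)

print(W.shape, x.shape, z.shape)
```

So w4 is not a fourth coefficient for a fourth feature. It is the whole weight vector of the fourth neuron, and it still has 3 elements to match x1, x2 and x3.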