Week 3: matrix W

From my understanding, the input layer takes the 3 features of one individual from X, so the input is just a single training example. These feature values are then fed into a linear model, wx + b, in layer 1, and then through the sigmoid function, like a logistic regression model. What I don't understand is: if the first node of layer 1 already has its weight vector w and computed the prediction for training example x given that vector (i.e., y hat), then why is there a need for the other 3 nodes in layer 1? We already calculated y hat. And from there, once layer 2 takes all of these y hats, how does it consolidate them into a single y hat?

Thank you

Hi @echerny
The idea behind multiple neurons in a DNN is that each neuron learns a different feature (or pattern) from the input. Each node in layer 1 has its own weight vector and bias, so the 4 nodes compute 4 different activations, not 4 copies of the same y hat. Layer 2 then applies another linear combination plus sigmoid to those 4 activations to consolidate them into a single final y hat. So the more neurons we have, the more features the model can learn. But this has a limit: depending on the network architecture, we may need to add dropout, which randomly disables some neurons during training, to avoid overfitting.
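Here is a minimal sketch of what that looks like in NumPy, for the shapes in your question (3 input features, 4 layer-1 nodes, 1 output node). The weight values are random placeholders, not trained values, and this is just an illustration of the forward pass, not the course's grader code:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = np.array([0.5, -1.2, 0.3])   # one individual's 3 features (made-up values)

# Layer 1: each of the 4 nodes has its OWN weight vector (a row of W1)
# and its own bias, so each node produces a different activation.
W1 = rng.normal(size=(4, 3))
b1 = np.zeros(4)
a1 = sigmoid(W1 @ x + b1)        # shape (4,): 4 different activations

# Layer 2: one more "logistic regression" over the 4 activations
# consolidates them into a single y hat.
W2 = rng.normal(size=(1, 4))
b2 = np.zeros(1)
y_hat = sigmoid(W2 @ a1 + b2)    # shape (1,): the final prediction

print(a1)      # four distinct values, one per layer-1 node
print(y_hat)   # a single number between 0 and 1
```

The key point: because each row of W1 is different, the four layer-1 outputs are four different numbers, and layer 2 learns how to weight them when producing the final prediction.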