So I’m looking at the week 1 videos for course #2, and I can’t figure out what the purpose of creating “neurons” and “layers” is. Why do this? Couldn’t you just use logistic regression like in the first course, with input vector X, weights vector W, and bias B: keep adjusting the weights, put the result through the sigmoid, compare the prediction (y-hat) with the actual y value, and continue to reduce the cost function, all like in the first course? Why is having a neural network with all these layers that feed into each other better?
A side question…does each neuron have all the input values (vector X) going into it, but its own single weight vector W, so that each neuron of the first layer has a unique W? So if I wanted 25 different weight vectors in a layer, I’d have 25 neurons? For example, in one of the first lectures we have 4 inputs called “price, marketing, shipping cost and perceived value” as input X. Does each of those inputs go into every neuron, or do only certain input values from X go into each neuron?
Logistic regression only lets the model use a linear combination of all of the input features. For more complex models, we need to also include non-linear combinations of the features.
This happens automatically in the hidden layer of a neural network. Since the hidden layer always applies some sort of non-linear function (such as ReLU or sigmoid), it can learn non-linear combinations of the inputs.
The knowledge learned by the hidden layer is then combined by the units in the output layer to give a much more complex model than logistic regression could achieve.
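Here is a minimal sketch (NumPy, not code from the course) of the classic XOR example: no single line, so no plain logistic regression, can separate these four points, but a tiny 1-hidden-layer network can. The layer sizes, learning rate, and step count are all made up for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)  # 2 features
y = np.array([[0], [1], [1], [0]], dtype=float)              # XOR labels

# Hidden layer: 4 units, each with its own weight vector (columns of W1)
W1 = rng.normal(size=(2, 4)); b1 = np.zeros((1, 4))
# Output layer: one unit that combines the 4 hidden activations
W2 = rng.normal(size=(4, 1)); b2 = np.zeros((1, 1))

lr = 1.0
for step in range(10000):
    # Forward pass: a1 = g(X W1 + b1), a2 = g(a1 W2 + b2)
    a1 = sigmoid(X @ W1 + b1)
    a2 = sigmoid(a1 @ W2 + b2)

    # Backward pass (cross-entropy loss + sigmoid gives the simple a2 - y term)
    d2 = a2 - y
    dW2 = a1.T @ d2 / len(X); db2 = d2.mean(axis=0, keepdims=True)
    d1 = (d2 @ W2.T) * a1 * (1 - a1)
    dW1 = X.T @ d1 / len(X); db1 = d1.mean(axis=0, keepdims=True)

    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

print(a2.round(2))  # should approach [[0], [1], [1], [0]]
```

The point for your question: each hidden unit is still basically a logistic regression, but the output unit works on the hidden activations instead of the raw features, and that is what lets the decision boundary bend.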
Thanks. What do you mean by “non-linear combinations of the features”? From the videos, it seems like all the neurons take in all the inputs from the original vector X and apply their own weights; then the next layer applies weights to those outputs, and so on. I still don’t quite understand how these intermediate hidden layers create value.
Each hidden layer unit is connected to every input feature through its own individual weight. The input features are multiplied by the weights, summed, and then the non-linear activation function for the hidden layer is applied.
This makes each hidden layer unit represent a different non-linear combination of all of the input features.
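For example, here is a tiny sketch using the 4 inputs from the lecture example mentioned earlier (the weight numbers are invented for illustration): every hidden unit receives the same full input vector x, and only the weights and bias differ per unit.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# price, marketing, shipping cost, perceived value (made-up values)
x = np.array([99.0, 0.8, 5.0, 0.7])

# 3 hidden units -> 3 different weight vectors (one row each) and 3 biases
W = np.array([[ 0.1, -0.5,  0.2,  1.0],   # unit 1's weights for all 4 inputs
              [-0.3,  0.9,  0.0,  0.4],   # unit 2's weights
              [ 0.7,  0.1, -0.8, -0.2]])  # unit 3's weights
b = np.array([0.5, -1.0, 0.0])

for j in range(3):
    # The same x goes into every unit; only the weights differ: a_j = g(w_j * x + b_j)
    a_j = sigmoid(W[j] @ x + b[j])
    print(f"hidden unit {j}: {a_j:.4f}")
```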
Think of a neural network as a function approximator. The combination of neurons, each neuron being a logistic regression unit, helps to create a piecewise linear approximation of the target function (exactly piecewise linear with ReLU activations; with sigmoid the pieces are smooth).
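To make the piecewise-linear idea concrete, here is a hand-built sketch (the weights are chosen by hand, not learned, and ReLU is used so each piece is exactly linear): four hidden units whose hinge functions add up to a piecewise linear curve matching f(x) = x² at the knot points 0, 0.5, 1, 1.5, 2.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def approx(x):
    # Each term is one hidden unit, relu(x + b); the output layer just
    # sums them with weights 0.5, 1, 1, 1 (the changes in slope)
    return (0.5 * relu(x - 0.0)
            + 1.0 * relu(x - 0.5)
            + 1.0 * relu(x - 1.0)
            + 1.0 * relu(x - 1.5))

for x in [0.0, 0.5, 1.0, 1.5, 2.0]:
    print(x, approx(x), x**2)  # matches x^2 exactly at each knot
```

A network trained by gradient descent finds the knot positions and slopes on its own; more hidden units means more pieces and a closer fit.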
If there are 3 inputs X and 4 neurons, does each input X go into each neuron, with each neuron learning weights that decide which inputs matter for that particular neuron and which don’t?
Or does one input (or a combination of fewer than 3) go into each neuron, so that each neuron gets a different slice of X (like 1 or 2 features instead of all 3)?
The value of each hidden and output unit is some non-linear function applied to the sum of the products of its weights and the unit values from the previous layer, plus the bias value. For the hidden layer, that equation is g(w*x + b). The ‘x’ values are the input features.
For the output layer, the equation has the same form, except its inputs are the hidden layer’s activations rather than the raw features.
The W1 weights are formed into a matrix, for easy calculation. The W2 weights are a vector, because there is only one output unit in this example.
The input layer values are simply the input features ‘x’.
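Here is what those shapes look like in a small NumPy sketch (sizes assumed for illustration: 3 inputs, 4 hidden units, 1 output unit; the random weights stand in for trained values):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

x = np.array([1.0, 2.0, 3.0])   # input layer: just the features 'x'

W1 = rng.normal(size=(4, 3))    # a matrix: one row of weights per hidden unit
b1 = np.zeros(4)
a1 = sigmoid(W1 @ x + b1)       # hidden layer: a1 = g(W1*x + b1), shape (4,)

W2 = rng.normal(size=4)         # a vector: there is only one output unit
b2 = 0.0
a2 = sigmoid(W2 @ a1 + b2)      # output layer: same equation, one number out

print(a1.shape, a2)             # (4,) and a single probability-like value
```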
In addition to what the other mentors said: in a neural network (NN), the first layers do roughly what logistic regression can do; they learn simple things, like edges or a little about each feature, and make small combinations of all the features. But the deeper layers make more complex combinations and more complex computations, so they learn more about the features, such as how strongly each feature affects the output, and many other things. So a NN, or a deep NN, is much more useful and powerful than plain logistic regression, and it can do many, many things logistic regression cannot. I encourage you, after you finish this specialization, to start the Deep Learning Specialization; you will learn much more about deep learning there.
Please feel free to ask any questions.
Thanks,
Abdelrahman
So this is a 1-hidden-layer neural network whose hidden layer has 3 neurons. You can now create a closed region that roughly separates the classes, but it is still not a good fit.
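To see how 3 neurons can carve out a closed region, here is a hand-built sketch (the weights are mine, chosen for illustration, not taken from the figure): each hidden unit is a steep sigmoid, which acts almost like one line in the plane, and the output unit fires only when a point is on the correct side of all three lines at once.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Three hidden units ~= three lines: x > 0, y > 0, and x + y < 2
# (the large weights make each sigmoid close to a hard threshold)
W1 = np.array([[ 10.0,   0.0],
               [  0.0,  10.0],
               [-10.0, -10.0]])
b1 = np.array([0.0, 0.0, 20.0])

# The output unit fires only when all three hidden units are ~1 (an AND)
W2 = np.array([10.0, 10.0, 10.0])
b2 = -25.0

def in_region(p):
    a1 = sigmoid(W1 @ p + b1)           # which side of each line?
    return sigmoid(W2 @ a1 + b2) > 0.5  # inside all three half-planes?

print(in_region(np.array([0.5, 0.5])))  # True: inside the triangular region
print(in_region(np.array([3.0, 3.0])))  # False: outside
```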
Now this is a 1-hidden-layer neural network whose hidden layer has n neurons:
With n logistic regressions combined, you have n lines available to separate the classes. If n is big, the lines enclosing the region start to look like a curve (I hope you have watched Flatland). Much better.
Now we have an additional layer with 2 neurons. This means you can create two closed regions, each defined as explained above, which allows the network to separate even more complex datasets:
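Extending the earlier sketch (again with hand-picked, illustrative weights): the first layer holds 6 lines, the second layer has 2 neurons that each AND their own 3 lines into a region, and the output unit ORs the two regions together.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Layer 1: 6 steep sigmoids = 6 lines, 3 per triangle (weights hand-picked)
W1 = np.array([[ 10.0,   0.0],   # x > 0       (triangle A)
               [  0.0,  10.0],   # y > 0       (triangle A)
               [-10.0, -10.0],   # x + y < 2   (triangle A)
               [ 10.0,   0.0],   # x > 4       (triangle B)
               [  0.0,  10.0],   # y > 0       (triangle B)
               [-10.0, -10.0]])  # x + y < 6   (triangle B)
b1 = np.array([0.0, 0.0, 20.0, -40.0, 0.0, 60.0])

# Layer 2: two "region detectors", each ANDing its own 3 lines
W2 = np.array([[10.0, 10.0, 10.0,  0.0,  0.0,  0.0],
               [ 0.0,  0.0,  0.0, 10.0, 10.0, 10.0]])
b2 = np.array([-25.0, -25.0])

# Output: fires if the point is in region A OR region B
W3 = np.array([10.0, 10.0])
b3 = -5.0

def predict(p):
    a1 = sigmoid(W1 @ p + b1)
    a2 = sigmoid(W2 @ a1 + b2)
    return sigmoid(W3 @ a2 + b3) > 0.5

print(predict(np.array([0.5, 0.5])))  # True: inside triangle A
print(predict(np.array([4.5, 0.5])))  # True: inside triangle B
print(predict(np.array([2.5, 0.5])))  # False: between the two regions
```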
Of course, there are some extra details that I did not mention here, but this is the basic intuition. You can play with neural networks here and see it for yourself: https://playground.tensorflow.org/