Why layers? What is the purpose?

So I’m looking at the week 1 videos for course #2, and I can’t figure out what the purpose of creating “neurons” and “layers” is. Why do this? Couldn’t you just use logistic regression like in the first course, with input vector X, weights vector W, and bias B: keep guessing the weights, put the result through the sigmoid, compare the prediction y-hat with the actual y value, and continue to reduce the cost function, all like in the first course? Why is a neural network with all these layers that feed into each other better?

A side question: does each neuron have all the input values (vector X) going into it, but only one set of vector W values, and does each neuron of the first layer have a unique vector of W values? So if I wanted 25 different weight vectors in a layer, I’d have 25 neurons? For example, in one of the first lectures we have 4 inputs called “price, marketing, shipping cost and perceived value” as input X. Does each of those inputs go into each neuron, or do only certain input values X go into each neuron?

Welcome to the community!

That’s a good question.

Logistic regression only allows the model to use a linear combination of all of the input features. For more complex models, we also need non-linear combinations of the features.

This happens automatically in the hidden layer of a neural network. Since the hidden layer always includes some sort of non-linear function (such as ReLU or sigmoid), it can learn useful non-linear combinations of the inputs.

The knowledge learned by the hidden layer is then combined via the units in the output layer to give a much more complex model than logistic regression could achieve.
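
To make that concrete, here is a minimal NumPy sketch with hand-picked weights (not learned, just for illustration) on the classic XOR problem. No single logistic regression unit can fit XOR, because no straight line splits the classes, but one hidden layer with two ReLU units can:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    return np.maximum(0.0, z)

# XOR: no straight line separates the two classes,
# so plain logistic regression cannot fit it.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
y = np.array([0, 1, 1, 0])

# Hand-picked weights for one hidden layer with 2 ReLU units.
W1 = np.array([[1.0, 1.0],     # unit 1 turns on when x1 + x2 > 0.5
               [1.0, 1.0]])    # unit 2 turns on when x1 + x2 > 1.5
b1 = np.array([-0.5, -1.5])
W2 = np.array([2.0, -6.0])     # output unit combines the two hidden units
b2 = -0.5

A1 = relu(X @ W1.T + b1)           # hidden layer: non-linear combinations
y_hat = sigmoid(A1 @ W2 + b2)      # output layer: a single logistic unit

print((y_hat > 0.5).astype(int))   # [0 1 1 0], matching y
```

The hidden layer turns the inputs into new features (here, roughly “at least one input is on” and “both inputs are on”) that the output unit *can* separate with a linear boundary.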

Thanks. What do you mean by “non-linear combinations of the features”? From the videos, it seems like all the neurons take in all the inputs from the original vector X and apply their own weights, then the next layer applies weights to those outputs, and so on. I still don’t quite understand how these intermediate hidden layers create value.

Each hidden layer unit is connected to each input feature, using an individual weight. The input features are multiplied by the weights, and the non-linear activation function for the hidden layer is applied.

This causes each hidden layer unit to present a different non-linear combination of all of the input features.

Think of a neural network as a function approximator. The combination of neurons, each neuron being a logistic regression unit, helps to create a piecewise linear approximation of the target function.
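
Here is a small sketch of that idea (again with hand-set weights rather than trained ones): four ReLU units, each contributing one linear “hinge”, sum to a piecewise linear curve that approximates y = x² on [0, 2]:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# Four hidden ReLU units with hand-set "knots" at 0, 0.5, 1.0, 1.5.
# Each unit switches on an extra slope at its knot, so the weighted
# sum of the units is a piecewise linear curve.
knots = np.array([0.0, 0.5, 1.0, 1.5])
c = np.array([0.5, 1.0, 1.0, 1.0])     # output-layer weights (slope increments)

x = np.linspace(0.0, 2.0, 101)
hidden = relu(x[:, None] - knots)      # shape (101, 4): one column per unit
y_hat = hidden @ c                     # the piecewise linear approximation

print(np.max(np.abs(y_hat - x**2)))    # max error is about 0.06
```

More hidden units mean more hinges, so the approximation can follow the target function more closely.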

If there are 3 inputs X and 4 neurons, does each input X go into each neuron, with each neuron learning weights that choose which inputs matter for that particular neuron and which don’t?

Or does one input (or a combination of fewer than 3) go into each neuron, so each neuron has a different vector of X going into it (like 1 or 2 features instead of all 3)?

If there are 3 input features and 4 hidden layer neurons, you connect all of the combinations, and you get 12 connections. Those are the weights.

I’ll try to draw a sketch and post it here.

The weights are organized as a size (4 x 3) matrix. This allows the hidden layer activations to be computed using a matrix product.

Here is a sketch. It ignores the bias values.

[Sketch: each of the 3 input features connects to all 4 hidden layer units, and the 4 hidden units connect to a single output unit.]

The value of each hidden and output unit is some non-linear function applied to the sum of the products of its weights and the unit values from the previous layer, plus the bias value. For the hidden layer, that equation is g(w·x + b), where the ‘x’ values are the input features.

For the output layer, the equation is similar.

The W1 weights are formed into a matrix, for easy calculation. The W2 weights are a vector, because there is only one output unit in this example.

The input layer values are simply the input features ‘x’.
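
Putting the sketch into code, here is a minimal forward pass for this exact shape: 3 input features, 4 hidden units, 1 output unit (random weights, just to show the shapes and the g(w·x + b) pattern):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

x = rng.normal(size=3)           # 3 input features
W1 = rng.normal(size=(4, 3))     # 12 weights: 4 hidden units x 3 inputs
b1 = np.zeros(4)                 # one bias per hidden unit
W2 = rng.normal(size=4)          # 4 weights into the single output unit
b2 = 0.0

a1 = sigmoid(W1 @ x + b1)        # hidden layer: g(W1·x + b1), shape (4,)
y_hat = sigmoid(W2 @ a1 + b2)    # output layer: same form, a scalar

print(a1.shape, y_hat)           # (4,) and a value in (0, 1)
```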

Cool, thanks! I’ll re-activate this thread if I have more questions.

Hi @Arya_Afrashteh

In addition to what the other mentors said: in a neural network (NN), the first layers learn simple things, like edges or small facts about each feature, and form simple combinations of the features. The deeper layers build more complex combinations and computations, learning things like how strongly each feature affects the output. So a NN, or a deep NN, is much more powerful than plain logistic regression and can do many things logistic regression cannot. I encourage you, after you finish this specialization, to start the Deep Learning Specialization; you will learn much more about deep learning there.

Please feel free to ask any questions,
Thanks,
Abdelrahman

This is what you do with logistic regression: a single line available to separate the classes:

Definitely not good enough.

This is what you do with 2 combined logistic regressions: 2 lines available to separate the classes.

So this is a 1 hidden layer neural network, whose hidden layer has 2 neurons. Still not good.

This is what you do with 3 combined logistic regressions: 3 lines available to separate the classes:

So this is a 1 hidden layer neural network, whose hidden layer has 3 neurons. You can now create a closed region to roughly separate the classes, but still not good.

Now this is a 1 hidden layer neural network, whose hidden layer has n neurons:

With n logistic regressions combined, you have n lines available to separate the classes. If n is big, the lines closing the region start looking like a curve (hope you have watched Flatland). Much better.

Now we have an additional layer, with 2 neurons. This means you can create two closed regions, each defined as explained above. This allows you to separate even more complex datasets:

Of course, there are some extra details that I did not mention here, but this is the basic intuition. You can play with neural networks here and see it for yourself: https://playground.tensorflow.org/
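
If you want to see the “lines closing a region” idea in code, here is a sketch with hand-set weights (not trained): three steep sigmoid units each approximate one line, and the output unit fires only when a point is on the correct side of all three (roughly an AND), which encloses a triangle:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

k = 20.0  # steepness: a large k makes each sigmoid act almost like a step

# Three hidden units, one per line: x1 > 0, x2 > 0, and x1 + x2 < 1.
W1 = k * np.array([[ 1.0,  0.0],
                   [ 0.0,  1.0],
                   [-1.0, -1.0]])
b1 = k * np.array([0.0, 0.0, 1.0])

# The output unit is ~1 only when all three hidden units are ~1 (an AND).
W2 = np.array([10.0, 10.0, 10.0])
b2 = -25.0

def predict(points):
    a1 = sigmoid(points @ W1.T + b1)    # hidden layer: one "line" per unit
    return sigmoid(a1 @ W2 + b2) > 0.5  # inside the triangle or not

print(predict(np.array([[ 0.2, 0.2],    # inside the triangle  -> True
                        [ 0.9, 0.9],    # outside: x1 + x2 > 1 -> False
                        [-0.1, 0.5]]))) # outside: x1 < 0      -> False
```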

Great illustrations! Thank you @Andre89!

Raymond
