Understanding how neural networks work

I want to know how a neural network actually works. It seems to me that it is doing feature engineering on the given features, but how does it do this? I also still don't understand why a linear activation function is not good in a hidden layer (I know that in the lecture Prof. Andrew said it would become a linear model), but I don't understand what that means and how it will affect the final output.

Hello @Ibrahim_Mustafa

Let's take a look at the slide from the lecture.

A key result here is that, if we use the linear activation function, then after a few steps of algebra, the final output (in red) has the same form as just one layer (one W and one b), right? Because they have the same form, the 2-layer neural network effectively reduces to a 1-layer neural network.

This is bad, because if a 2-layer NN is just equivalent to a 1-layer NN, this means

  1. we are wasting the resources allocated for 2 layers while only getting the performance of 1 layer, and
  2. we can't exploit the benefit of a deep neural network, because if 2 layers can be reduced to 1 layer, then 3 layers can also be reduced to 1 layer, and, similarly, a deep network with many layers will still be reduced to a 1-layer model.

Therefore, in order to exploit the benefit of a deep network, we don't want to use the linear activation function in the hidden layers.
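If it helps, here is a tiny NumPy check of that algebra (the layer sizes and random values are completely made up, purely for illustration): composing two linear layers gives exactly the same output as one combined linear layer with W = W^{[2]}W^{[1]} and b = W^{[2]}b^{[1]} + b^{[2]}.

```python
import numpy as np

rng = np.random.default_rng(0)

# a 2-layer "network" with a linear (identity) activation; sizes are arbitrary
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))   # layer 1: 3 inputs -> 4 units
W2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))   # layer 2: 4 units -> 1 output

x = rng.normal(size=(3, 1))         # one input example

# forward pass with the linear activation (a = z at every layer)
a1 = W1 @ x + b1
a2 = W2 @ a1 + b2

# the equivalent single layer: W = W2 W1, b = W2 b1 + b2
W, b = W2 @ W1, W2 @ b1 + b2
print(np.allclose(a2, W @ x + b))   # True: the 2 linear layers collapse into 1
```

Swap in any non-linear activation and this collapse no longer happens, which is exactly why we want one in the hidden layers.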

Cheers,
Raymond

OK, I understood this point, but how does the network do feature engineering on the inputs and determine the weights for each neuron in the home price prediction example? I feel that the hidden layers are just a black box function that is responsible for learning features and getting some good weights, but I don't understand how it does this.

Hi @Ibrahim_Mustafa ,

I agree with you: This feels like magic.

In addition to the response given by @rmwkwok , I’d like to add some more from my own understanding.

Warning: I may be telling you some things here that you will see in the next chapter, but I'll describe them in simple terms. If something isn't clear, please let me know.

Moving on…

As you advance in the DLS specialization, you will start to unveil this magic, to understand the inner gears that make neural networks work.

At this point I will give you a quick glimpse of what’s going on inside this magical box.

What are the ingredients?
The main ingredients of a neural network are matrices. Yes, that's it. There are other components, and as the models become more complex there are even more components, but the big ingredient is matrices.

And what happens with these matrices? We do some linear algebra and some basic calculus. By applying linear algebra and calculus we make the values in the matrices change, and this change is called 'learning'.

In its most basic expression, a neural network is a set of matrices organized as layers that are updated with some linear operations followed by a non-linear operation.

We take an input in the form of a vector, and apply this simple linear equation:

Z = WX + b

W is a matrix.
X is a vector with the inputs (be it the model's input, or the input for an inner layer).
b is a vector called the 'bias'.

After doing this, we use a non-linear function like sigmoid, and we apply it to the Z that we got previously:

A = sigmoid(Z)

And this A is the input for the next layer. And this cycle is repeated from the first layer until the last.

This is what’s called a ‘forward propagation’. It is ‘forward’ because we are like going, well, forward… from the beginning to the end.
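If you like code, here is a minimal NumPy sketch of that forward propagation, assuming a small made-up network (3 input features, then 4 hidden units, then 2 hidden units, then 1 output); the sizes and random initial values are only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# made-up layer sizes: 3 input features -> 4 hidden units -> 2 hidden units -> 1 output
layer_sizes = [3, 4, 2, 1]
params = [(rng.normal(size=(n_out, n_in)) * 0.1, np.zeros((n_out, 1)))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

def forward(X, params):
    A = X                        # the input is the "activation" of layer 0
    for W, b in params:
        Z = W @ A + b            # linear step:     Z = WX + b
        A = sigmoid(Z)           # non-linear step: A = sigmoid(Z)
    return A                     # the output of the last layer

X = rng.normal(size=(3, 5))      # 5 examples, 3 features each (in this sketch, one example per column)
print(forward(X, params).shape)  # (1, 5): one prediction per example
```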

Once we reach the end, we have to return to the beginning, and this 'reverse route' is called 'backward propagation'. Here we have some more calculations, mainly using derivatives. We calculate the loss, and from this loss we start going back to modify the W matrices of each layer.

When we reach the beginning of the network again, that is, when the backward propagation reaches the 1st hidden layer, we repeat the whole cycle with a new forward propagation that uses the new values in the weight matrices. This cycle repeats again and again; each full pass over the training data is called an 'epoch'.

After all ‘epochs’ are completed, we expect to have a neural network that has learned.
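And here is a compact, purely illustrative sketch of that repeated cycle, assuming a toy 1-hidden-layer network with sigmoid activations, a made-up dataset, the cross-entropy loss, and plain gradient descent. The exact derivative formulas are what you will study in the course; the point here is just the cycle: forward pass, backward pass, update, and repeat for many epochs.

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)

# toy dataset (completely made up): 2 features, 200 examples, labels from a simple rule
X = rng.normal(size=(2, 200))
Y = (X[0:1] + X[1:2] > 0).astype(float)

# one hidden layer with 4 units (sizes chosen arbitrarily)
W1, b1 = rng.normal(size=(4, 2)) * 0.5, np.zeros((4, 1))
W2, b2 = rng.normal(size=(1, 4)) * 0.5, np.zeros((1, 1))
learning_rate, m = 0.5, X.shape[1]

for epoch in range(1000):
    # forward propagation
    A1 = sigmoid(W1 @ X + b1)
    A2 = sigmoid(W2 @ A1 + b2)

    # backward propagation: derivatives of the cross-entropy loss w.r.t. each W and b
    dZ2 = A2 - Y
    dW2 = dZ2 @ A1.T / m
    db2 = dZ2.mean(axis=1, keepdims=True)
    dZ1 = (W2.T @ dZ2) * A1 * (1 - A1)
    dW1 = dZ1 @ X.T / m
    db1 = dZ1.mean(axis=1, keepdims=True)

    # gradient descent update: changing the matrices is the "learning"
    W1, b1 = W1 - learning_rate * dW1, b1 - learning_rate * db1
    W2, b2 = W2 - learning_rate * dW2, b2 - learning_rate * db2

print("training accuracy:", ((A2 > 0.5) == Y).mean())
```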

And yes, I agree with your recurring thought: What? It learned?! Yes. It learned. And even though we now understand in detail how this works, guess what: IT STILL FEELS LIKE MAGIC!

And that's why I have not been able to think about anything other than Machine Learning since I got bitten by this discipline. :slight_smile:

I hope this sheds some more light on your question. Please feel free to ask any additional questions!

Thanks for reading this far,

Juan

Hello @Ibrahim_Mustafa,

No problem. I left out this part because I wanted to make sure you were comfortable with the importance of non-linear activations, which enable us to build deeper networks.

A deeper network is the key to your feature engineering question.

Since @Juan_Olano has described the behind-the-scenes in detail, I will continue my answer from there by expressing it in a graph, and then extend my explanation:

As Juan has pointed out about the forward propagation, the input X gets transformed as many times as the number of hidden layers we have; in particular, for my graph there are 2 transformations because there are 2 hidden layers.

What we should note is the final outcome of that series of transformations, which is X^{transformed2}. If we zoom into this network and look at just the output layer, we are essentially seeing a 1-layer network which accepts X^{transformed2} as input and produces the prediction as output. This "zoomed view" effectively suggests the following:

  1. the hidden layers "engineered" the inputs, through black-box processes (as you described) or matrix operations (as Juan described), into some X^{transformed2}. X^{transformed2} is the engineered features.

  2. then we pass this X^{transformed2} to the output layer to perform a simple regression or a classification task.

Gradient descent (which Juan described) makes sure that X^{transformed2} is useful, even if it is not understandable to a layperson. This is why it looks like a black box but works.
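Here is a small sketch of that "zoomed view", assuming a made-up 2-hidden-layer network with random (untrained) weights, just to show where the engineered features X^{transformed2} sit:

```python
import numpy as np

sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
rng = np.random.default_rng(3)

# pretend these came out of training (they are random here, purely for illustration)
W1, b1 = rng.normal(size=(4, 3)), np.zeros((4, 1))   # hidden layer 1
W2, b2 = rng.normal(size=(3, 4)), np.zeros((3, 1))   # hidden layer 2
W3, b3 = rng.normal(size=(1, 3)), np.zeros((1, 1))   # output layer

X = rng.normal(size=(3, 10))                         # 10 examples, 3 raw features each

X_transformed1 = sigmoid(W1 @ X + b1)                # first transformation
X_transformed2 = sigmoid(W2 @ X_transformed1 + b2)   # the "engineered features"
prediction     = sigmoid(W3 @ X_transformed2 + b3)   # output layer = a 1-layer model on X_transformed2

print(X_transformed2.shape, prediction.shape)        # (3, 10) engineered features, (1, 10) predictions
```

The output layer never knows that X^{transformed2} came from hidden layers; it simply treats it as its input features.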

Cheers,
Raymond

Hello, I am trying to figure out the shapes of the matrices but I am failing at some detail. Could someone please explain this:
First, this is the vector W^{[1]}, i.e. the weight vector of layer 1.


From the drawing in the picture, I understood that W^{[1]} looks like this:
W^{[1]} = \begin{bmatrix} \leftarrow w^{[1]}_1 \rightarrow \\ \leftarrow w^{[1]}_2 \rightarrow \\ \leftarrow w^{[1]}_3 \rightarrow \\ \leftarrow w^{[1]}_4 \rightarrow \end{bmatrix}
So W^{[1]} has the shape (4, m), where m is the number of input features and 4 means 4 neurons in that layer. While W^{[1]} looks like a column vector of the w's, I don't understand why w^{[1]}_j is a column vector, when each w inside it has a horizontal shape, e.g. ←w^{[1]}_1→.

I memorized this:
In the vectorized representation, the shape of W^{[l]} should be (number of neurons in the current layer, number of neurons in the previous layer). And, for W^{[1]} (the first hidden layer), the shape of W^{[1]} should be (number of neurons in the current layer, number of input features).

I still don't understand the shape of each w for a given layer and neuron. For example, why is w^{[4]}_3 a column vector when W looks like a column vector? Shouldn't each w in W look like ←w→, which would make it a row vector?

In vectorized form, W is neither a column nor a row matrix. W is a matrix whose number of rows and number of columns can each be any positive number. And each w belongs to W. For example, w^{[2]}_1 is the part that belongs to the 1st neuron of the weight of the 2nd layer, W^{[2]}.
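If it helps, here is a small NumPy sketch of that shape rule, with made-up layer sizes; each row of W^{[l]} holds the weights of one neuron of layer l, laid out across the row:

```python
import numpy as np

# made-up example: 3 input features, then layers with 4, 5 and 1 neurons
sizes = [3, 4, 5, 1]

for l in range(1, len(sizes)):
    W = np.zeros((sizes[l], sizes[l - 1]))   # (neurons in layer l, neurons/features in layer l-1)
    b = np.zeros((sizes[l], 1))
    print(f"W[{l}]: {W.shape}   b[{l}]: {b.shape}")
# prints shapes W[1]: (4, 3), W[2]: (5, 4), W[3]: (1, 5)

# In W[1] (shape (4, 3)), row 0 holds the 3 weights that belong to neuron 1 of layer 1,
# laid out horizontally even though the neurons themselves are stacked vertically.
```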