W2C2 Why do we need activation functions?

Hello, I don't understand this topic.
If I use a linear function in the hidden layers and the sigmoid function in the output layer, the lecture says the model cannot do logistic regression. Why?

Hi!

Are you referring to the Lecture Video: “Why do we need activation functions?”

Yes! I am referring to this video. Thanks!

Hello @ugqzg, welcome to our community!

I don't think Andrew said the model cannot do logistic regression. Rather, the idea was that a big neural network with linear activations in all hidden layers and a sigmoid activation in the output layer is no different from a plain logistic regression model, or from a neural network with just one sigmoid output layer and no hidden layers at all.

We need a non-linear activation between two hidden layers for them to be meaningfully separate layers. Two hidden layers with a linear activation in between are effectively just one hidden layer. The math behind this is shown in the video at roughly the 3:00 timestamp.
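To see this numerically, here is a small NumPy sketch (the layer sizes are arbitrary, chosen just for illustration): composing two linear layers gives exactly the same output as one linear layer with combined weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hidden layers with a linear (identity) activation; shapes are arbitrary.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers with linear activations.
a1 = W1 @ x + b1
a2 = W2 @ a1 + b2

# The same computation collapses into a single linear layer:
# W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W = W2 @ W1          # combined weights
b = W2 @ b1 + b2     # combined bias
a2_collapsed = W @ x + b

print(np.allclose(a2, a2_collapsed))  # True
```

So stacking linear layers buys no extra expressive power; only a non-linear activation in between makes the second layer do something the first could not.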

Raymond

Hello, thanks for your answer!
Maybe the subtitles are wrong? In this video, at roughly the 4:20-4:30 timestamp, it says: " Or alternatively, if we were to still use a linear activation function for all the hidden layers, for these three hidden layers here, but we were to use a logistic activation function for the output layer, then it turns out you can show that this model becomes equivalent to logistic regression, and a4, in this case, can be expressed as 1 over 1 plus e to the negative wx plus b for some values of w and b. So this big neural network doesn’t do anything that you can’t also do with logistic regression."
That is what I don't understand. Thanks!

Hello @ugqzg,

The subtitle is correct. I am quoting from it:

it turns out you can show that this model becomes equivalent to logistic regression

this big neural network doesn’t do anything that you can’t also do with logistic regression.

These two lines make the same point: without non-linear activations in the hidden layers, a big neural network is nothing more than a simple logistic regression. Note the double negative in the second quote: “… doesn’t do anything that you can’t also do …”

Raymond

That means that I can also do that? Thanks!

It means a neural network with linear activations in all hidden layers and a sigmoid activation in the output layer is no different from a plain logistic regression model.

OK, Thank you very much!

Hi Raymond,

What you are saying here is that it defeats the purpose of using a neural network. Rather than using a neural network with multiple hidden layers (all with linear activations) and one output layer, we might as well have just used a logistic regression model, correct?

Thank you
Christina


Exactly!

Raymond

Hi Raymond,

Further to my previous question: in the video “Improved implementation of softmax”, shown in the screenshot below, even though the last layer now uses a linear activation (rather than softmax), the model is still a softmax (multi-class classification) model rather than a linear regression model because the hidden layers use ReLU. Am I understanding this correctly?

Or is the reason this is not linear regression that the prediction (last line of the code) still applies softmax(logits)?

Thank you
Christina

Hello @Christina_Fan

The difference ReLU makes is that it turns the model into a non-linear one. It has no bearing on whether the model is a multi-class classifier or not.

It is a multi-class classification model when softmax is applied (either by specifying ‘softmax’ in the output layer, OR by leaving the output layer linear and enabling it in the loss function).
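So your second guess is the right one: the output layer produces raw logits, and softmax turns them into class probabilities at prediction time. A small NumPy sketch of that last step (the logit values here are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: shift by the max logit before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs (logits) from a linear output layer.
logits = np.array([2.0, -1.0, 0.5])

# Apply softmax only at prediction time to get class probabilities.
probs = softmax(logits)

print(probs.sum())                                       # 1.0
print(int(np.argmax(probs)) == int(np.argmax(logits)))   # True
```

Note that because softmax is monotonic, the predicted class (the argmax) is the same whether you take it over the logits or over the probabilities; applying softmax only matters when you need actual probabilities.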

So, some modifications are needed to the following:

Cheers,
Raymond

Brilliant, thank you for the clarification, Raymond. It makes much more sense now.


You are welcome :wink:

Cheers.