W2C2 Why do we need activation functions?

Hello, I don't understand this topic.
If I use a linear function in the hidden layers and the sigmoid function in the output layer, the lecture says the model cannot do logistic regression. Why?

Hi!

Are you referring to the Lecture Video: “Why do we need activation functions?”

Yes! I am referring to this video. Thanks!

Hello @ugqzg, welcome to our community!

I don't think Andrew said the model cannot do logistic regression. Rather, the idea was that a big neural network with linear activations in all hidden layers and a sigmoid activation in the output layer is no different from a plain logistic regression model, or from a neural network with just one sigmoid output layer and no hidden layers at all.

We need a non-linear activation between two hidden layers for them to be meaningfully separate layers. Two hidden layers with a linear activation in between are effectively just one hidden layer. The math behind this is shown in the video at roughly the 3:00 timestamp.
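To see this numerically, here is a small NumPy sketch (the layer sizes are arbitrary, chosen just for illustration): composing two linear layers gives exactly the same output as one linear layer with combined weights.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two hidden layers with a linear (identity) activation; shapes are arbitrary.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers with linear activations.
a1 = W1 @ x + b1
a2 = W2 @ a1 + b2

# The same computation collapses into a single linear layer:
# W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2)
W = W2 @ W1          # combined weights
b = W2 @ b1 + b2     # combined bias
a2_collapsed = W @ x + b

print(np.allclose(a2, a2_collapsed))  # True
```

So stacking linear layers buys no extra expressive power; only a non-linear activation in between makes the second layer do something the first could not.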

Raymond

Hello, thanks for your answer!
Maybe the subtitles are wrong? In this video, at roughly the 4:20-4:30 timestamp, it says: " Or alternatively, if we were to still use a linear activation function for all the hidden layers, for these three hidden layers here, but we were to use a logistic activation function for the output layer, then it turns out you can show that this model becomes equivalent to logistic regression, and a4, in this case, can be expressed as 1 over 1 plus e to the negative wx plus b for some values of w and b. So this big neural network doesn’t do anything that you can’t also do with logistic regression."
That is what I don't understand. Thanks!

Hello @ugqzg,

The subtitle is correct. I am quoting from it:

it turns out you can show that this model becomes equivalent to logistic regression

this big neural network doesn’t do anything that you can’t also do with logistic regression.

These two lines make the same point: without non-linear activations in the hidden layers, a big neural network is nothing more than a simple logistic regression. Note the double negative in the second quote: “… doesn’t do anything that you can’t also do …”

Raymond

That means that I can also do that? Thanks!

It means a neural network with linear activations in all hidden layers and a sigmoid activation in the output layer is no different from a plain logistic regression model.

OK, Thank you very much!

Hi Raymond,

What you are saying here is that it defeats the purpose of using a neural network. Rather than using a neural network with multiple hidden layers (all with linear activations) and one output layer, we might as well have just used a logistic regression model, correct?

Thank you
Christina


Exactly!

Raymond

Hi Raymond,

Further to my previous question: in the video “Improved implementation of softmax”, shown in the screenshot below, even though the last layer now uses a linear activation (rather than softmax), the model is still a softmax (multi-class classification) model rather than a linear regression model because the hidden layers use ReLU. Am I understanding this correctly?

Or is the reason this is not linear regression that the prediction (last line of the code) still applies softmax(logits)?

Thank you
Christina

Hello @Christina_Fan

The difference ReLU makes is that it turns the model into a non-linear one. It has no bearing on whether the model is a multi-class classifier or not.

It is a multi-class classification model when softmax is applied (either by specifying ‘softmax’ in the output layer, OR by leaving the output layer linear and enabling it in the loss function).
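So your second guess is the right one: the output layer produces raw logits, and softmax turns them into class probabilities at prediction time. A small NumPy sketch of that last step (the logit values here are made up for illustration):

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax: shift by the max logit before exponentiating.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical raw outputs (logits) from a linear output layer.
logits = np.array([2.0, -1.0, 0.5])

# Apply softmax only at prediction time to get class probabilities.
probs = softmax(logits)

print(probs.sum())                                       # 1.0
print(int(np.argmax(probs)) == int(np.argmax(logits)))   # True
```

Note that because softmax is monotonic, the predicted class (the argmax) is the same whether you take it over the logits or over the probabilities; applying softmax only matters when you need actual probabilities.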

So, some modifications are needed to the following:

Cheers,
Raymond

Brilliant, thank you for the clarification, Raymond. It makes much more sense now.


You are welcome :wink:

Cheers.