Week 3, "Why do you need Non-Linear Activation Functions?"

Juheon_Chu · March 21, 2024, 6:05am

Hello. Around 2:47 of the Week 3 lecture titled above, Dr. Ng mentions that if we use a linear activation function on the first hidden layer and then sigmoid on the second layer, “the model is no more expressive than standard logistic regression without hidden layers.” May I ask why that is, and how you would prove it? Thank you for your patience.

Best regards,
Juheon

gent.spah · March 21, 2024, 6:55am

A linear activation produces a linear output: the layer simply passes a linear function of its input through, so it cannot fit the complex behaviour of systems. That's why he says the model is the same as logistic regression: the linear part is just a passthrough!

Juheon_Chu · March 21, 2024, 9:20am

Thank you so much for your clear explanation!

paulinpaloalto · March 21, 2024, 5:14pm

It is an easy-to-prove mathematical theorem that the composition of linear functions is still linear. So unless you include non-linearity at each layer of the network, there is literally no point in having multiple layers: you can’t learn a more complex function. The point of adding layers with non-linearity is that it greatly increases the complexity of the functions that can be learned, so networks become more powerful with more layers.
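For the specific case in the question, the algebra is short. With an identity activation on the hidden layer, the input to the sigmoid is W2 (W1 x + b1) + b2 = (W2 W1) x + (W2 b1 + b2), which is a single linear function of x with weights W' = W2 W1 and bias b' = W2 b1 + b2. That is exactly the form of logistic regression. Here is a minimal sketch that checks this numerically in plain Python. The function names (`two_layer_linear_hidden`, `collapse`, `logistic_regression`) and the layer sizes are made up for illustration and are not from the course notebooks.

```python
import math
import random

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def two_layer_linear_hidden(x, W1, b1, W2, b2):
    """Hidden layer with identity (linear) activation, then one sigmoid output unit."""
    a1 = add(matvec(W1, x), b1)        # a1 = W1 x + b1, no non-linearity applied
    z2 = add(matvec(W2, a1), b2)[0]    # z2 = W2 a1 + b2
    return sigmoid(z2)

def collapse(W1, b1, W2, b2):
    """Fold both layers into one: W' = W2 W1 and b' = W2 b1 + b2."""
    n_hidden, n_in = len(W1), len(W1[0])
    W = [[sum(row2[k] * W1[k][j] for k in range(n_hidden)) for j in range(n_in)]
         for row2 in W2]
    b = add(matvec(W2, b1), b2)
    return W, b

def logistic_regression(x, W, b):
    """Plain logistic regression: sigmoid(W x + b) with a single output."""
    return sigmoid(add(matvec(W, x), b)[0])

# Arbitrary sizes and random parameters, chosen only for this illustration.
rng = random.Random(0)
n_x, n_h = 3, 4
W1 = [[rng.uniform(-1, 1) for _ in range(n_x)] for _ in range(n_h)]
b1 = [rng.uniform(-1, 1) for _ in range(n_h)]
W2 = [[rng.uniform(-1, 1) for _ in range(n_h)]]
b2 = [rng.uniform(-1, 1)]

W, b = collapse(W1, b1, W2, b2)
x = [rng.uniform(-1, 1) for _ in range(n_x)]
gap = abs(two_layer_linear_hidden(x, W1, b1, W2, b2) - logistic_regression(x, W, b))
print("outputs agree up to rounding:", gap < 1e-12)
```

The same folding argument applies to any number of stacked linear layers, since each extra layer just multiplies in one more matrix. Replacing the identity activation in the hidden layer with something like tanh or ReLU is what breaks this collapse.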

