Why do we need a non-linear activation function?

Anbu · May 18, 2021, 2:19pm

Hi Sir,

Can someone please point out the key reasons, from the lecture video "Why do we need non-linear activation functions?", for why a non-linear activation function is needed?

We watched the video but could not work out, or get a good feel for, the specific points that make a non-linear activation function necessary.

crisrise · May 18, 2021, 5:33pm

Hi @Anbu, in my opinion the core part is at minute 2:23, when Andrew says that without a non-linear activation function there is no need for a deep network, because a composition of linear functions can be reduced to a single linear function. So the power of a deep net comes from combining linear and non-linear operators.
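
To make the "reduced to a single linear function" step concrete, here is a short derivation for two layers with no activation (identity activation). The notation ($W^{[l]}$, $b^{[l]}$, $a^{[l]}$ for the weights, biases, and outputs of layer $l$) is an assumption chosen to resemble the course's style, and $W'$, $b'$ are just names introduced here for the merged parameters:

```latex
\begin{aligned}
a^{[1]} &= W^{[1]} x + b^{[1]} \\
a^{[2]} &= W^{[2]} a^{[1]} + b^{[2]} \\
        &= W^{[2]} \left( W^{[1]} x + b^{[1]} \right) + b^{[2]} \\
        &= \underbrace{\left( W^{[2]} W^{[1]} \right)}_{W'} x
           + \underbrace{\left( W^{[2]} b^{[1]} + b^{[2]} \right)}_{b'}
\end{aligned}
```

So the two layers compute exactly $W' x + b'$, which is what a single linear layer computes. The same argument repeats for any number of layers.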

josemrivera · August 4, 2021, 10:32am

Yeah, Andrew says that with non-linear functions in the hidden layers we get more interesting functions as outputs. But I think we are missing the reason why non-linear or “more interesting” functions are good. Is it because in high-dimensional spaces these nonlinearities favour separability? Could someone expand or provide references?

Thank you

paulinpaloalto · August 4, 2021, 5:59pm

There are two levels (at least) to the answer here. The most important point is what @crisrise said in his earlier response on this thread:

The composition of linear functions is still linear. What that means is that if you don’t include non-linearity at every layer of a neural network, then there is literally no point in having multiple layers: they all collapse into one layer. Without the non-linear activation functions in every hidden layer, every neural network with a sigmoid output layer would be functionally equivalent to Logistic Regression, which can only do linear separation and is not nearly as powerful at classification as deep neural networks.
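
The collapse described above can be checked numerically. This is a minimal sketch in plain Python, not code from the course: the layer sizes, the random weights, and the helper names (`matvec`, `matmul`, `add`, `random_matrix`) are all made up for illustration.

```python
import random

def matvec(W, x):
    """Multiply matrix W (list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A by matrix B (both lists of rows)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def add(u, v):
    """Element-wise sum of two vectors."""
    return [a + b for a, b in zip(u, v)]

def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical network: 3 inputs -> 4 hidden units -> 2 outputs.
W1, b1 = random_matrix(4, 3, rng), [rng.uniform(-1, 1) for _ in range(4)]
W2, b2 = random_matrix(2, 4, rng), [rng.uniform(-1, 1) for _ in range(2)]
x = [0.5, -1.0, 2.0]

# Two layers with NO activation function (identity activation).
two_layer = add(matvec(W2, add(matvec(W1, x), b1)), b2)

# One layer whose parameters are the merged ones: W = W2 W1, b = W2 b1 + b2.
W = matmul(W2, W1)
b = add(matvec(W2, b1), b2)
one_layer = add(matvec(W, x), b)

# The two results agree up to floating-point rounding.
max_gap = max(abs(p - q) for p, q in zip(two_layer, one_layer))
print("largest difference:", max_gap)
```

The single merged layer reproduces the two-layer output for any input `x`, so the extra layer added no expressive power.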

Once you have the required non-linearity, you can add as many layers with as many neurons as required to learn a function that is complex enough to match the complexity of your actual data and so give accurate predictions. Why would you not want the ability to have a “more interesting” function as opposed to a “less interesting” function?
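
As a small illustration of a “more interesting” function, XOR is a classic case that no purely linear model can separate, yet one hidden layer with ReLU represents it exactly. The sketch below is an assumption-laden toy, not something from the lecture: the weights are set by hand rather than learned, and `tiny_net`, `relu`, and `identity` are names invented here.

```python
def relu(z):
    return max(0.0, z)

def identity(z):
    return z

def tiny_net(x1, x2, activation):
    """Two hidden units and one linear output, with hand-picked weights."""
    h1 = activation(x1 + x2)          # hidden unit 1
    h2 = activation(x1 + x2 - 1.0)    # hidden unit 2
    return h1 - 2.0 * h2              # output layer

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]

with_relu = [tiny_net(a, b, relu) for a, b in inputs]
without_activation = [tiny_net(a, b, identity) for a, b in inputs]

print(with_relu)           # [0.0, 1.0, 1.0, 0.0]  -> matches XOR
print(without_activation)  # [2.0, 1.0, 1.0, 0.0]  -> just 2 - (x1 + x2)
```

With the ReLU removed, the same weights collapse to the linear function `2 - (x1 + x2)`, which cannot output 0 for both `(0, 0)` and `(1, 1)` while outputting 1 for the other two inputs.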

josemrivera · August 5, 2021, 12:43pm

Thanks a lot @paulinpaloalto for your very intuitive answer, it’s all clear now. It makes sense: introducing nonlinearities at every layer makes these models highly non-linear in the feature space (which is at this point in the course the input data space). These highly non-linear functions can learn very complex data, even to the point of overfitting.
