# Why do you need Non-Linear Activation Functions?

Hello,
I’m having trouble fully understanding the contents of this video in Week 3,
so a brief explanation of its contents would be very helpful.
Thanks.

Hello @MoHassan
Here is the explanation from Andrew Ng Sir’s course video (sir explained it very well; I think reading this might help you).

If you use a linear activation function, or equivalently no activation function at all, then no matter how many layers your neural network has, all it is doing is computing a linear function of the input. So you might as well not have any hidden layers.

As briefly mentioned in the video, if you have a linear activation function in the hidden layers and a sigmoid function in the output layer, then the model is no more expressive than standard logistic regression without any hidden layer. A linear hidden layer is essentially useless, because the composition of two linear functions is itself a linear function.
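You can verify this collapse numerically. Below is a minimal sketch (the layer sizes and random weights are made up for illustration) showing that two stacked linear layers compute exactly the same thing as a single linear layer with merged weights:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical layer sizes, chosen only for illustration.
W1 = rng.standard_normal((4, 3))   # hidden-layer weights
b1 = rng.standard_normal((4, 1))
W2 = rng.standard_normal((1, 4))   # output-layer weights
b2 = rng.standard_normal((1, 1))

x = rng.standard_normal((3, 1))    # one input example

# Forward pass with a linear (identity) activation g(z) = z:
a1 = W1 @ x + b1
a2 = W2 @ a1 + b2

# The same computation collapses into a single linear layer:
W = W2 @ W1
b = W2 @ b1 + b2
a2_collapsed = W @ x + b

print(np.allclose(a2, a2_collapsed))  # True: the two layers add nothing
```

No matter how many such layers you stack, the merged weights always give one linear map, which is why depth buys nothing without non-linearities.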

So unless you throw a non-linear function in there, you’re not computing more interesting functions even as you go deeper in the network. There is just one place where you might use a linear activation function, g(z) = z, and that’s if you are doing machine learning on a regression problem, where y is a real number. For example, if you’re trying to predict housing prices, y is not 0 or 1 but a real number, anywhere from $0 up to however expensive houses get in your data set, potentially millions of dollars. If y takes on these real values, then it can be okay to have a linear activation function in the output layer, so that your output y hat is also a real number ranging anywhere from minus infinity to plus infinity.

But even then, the hidden units should not use the linear activation function. They could use ReLU or tanh or Leaky ReLU or maybe something else. So the one place you might use a linear activation function is usually the output layer. Other than that, using a linear activation function in a hidden layer is extremely rare.
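To make that concrete, here is a minimal sketch of such a regression network (the sizes and random weights are hypothetical): a ReLU hidden layer followed by a linear output layer, so y hat can be any real number:

```python
import numpy as np

rng = np.random.default_rng(1)

def relu(z):
    return np.maximum(0, z)

# Hypothetical one-hidden-layer regression network (sizes made up).
W1 = rng.standard_normal((8, 3))
b1 = np.zeros((8, 1))
W2 = rng.standard_normal((1, 8))
b2 = np.zeros((1, 1))

x = rng.standard_normal((3, 1))

a1 = relu(W1 @ x + b1)    # non-linear hidden layer
y_hat = W2 @ a1 + b2      # linear output: unbounded real number

print(float(y_hat))       # can be any real value, positive or negative
```

The hidden ReLU keeps the network expressive, while the linear output leaves the prediction unconstrained, which is what a regression target needs.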

I hope this explanation clears your doubts.
All the best


Excuse me for interrupting, but isn’t the ReLU function a linear function?

It’s piecewise linear, which means it is actually non-linear as a whole: the kink at zero breaks linearity, and that is what lets networks built from ReLUs compute non-linear functions.
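You can check this directly. A linear function f must satisfy f(a + b) = f(a) + f(b) for all inputs, and ReLU fails that test; a minimal sketch:

```python
import numpy as np

def relu(z):
    return np.maximum(0, z)

# A linear function f satisfies f(a + b) = f(a) + f(b).
# ReLU violates this whenever the inputs have opposite signs:
a, b = 1.0, -1.0
print(relu(a + b))         # relu(0.0) = 0.0
print(relu(a) + relu(b))   # 1.0 + 0.0 = 1.0, not equal
```

So even though ReLU is linear on each half of its domain, the two pieces together make it non-linear, which is why it works as a hidden-layer activation.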