Why do you need Non-Linear Activation Functions?

Hello,
I can’t fully understand the contents of this Week 3 video, so a brief explanation of it would be very helpful.
Thanks.

Hello @MoHassan
Here is Andrew Ng’s explanation from the course video (he explains it very well, and I think reading it might help you).

If you use a linear activation function, or if you don’t have an activation function at all, then no matter how many layers your neural network has, all it’s doing is computing a linear function of the input. So you might as well not have any hidden layers.
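To make the “no matter how many layers” point concrete, here is a minimal NumPy sketch. The layer sizes and random weights are arbitrary assumptions for illustration. It checks that two layers using the identity activation g(z) = z give exactly the same output as one linear layer with W = W2 @ W1 and b = W2 @ b1 + b2.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: 3 input features, 4 hidden units, 2 outputs.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2, 1))

def two_linear_layers(x):
    """Forward pass where both layers use the identity activation g(z) = z."""
    a1 = W1 @ x + b1        # hidden layer with no non-linearity
    return W2 @ a1 + b2     # output layer

# Collapse the two layers into a single equivalent linear layer.
W = W2 @ W1
b = W2 @ b1 + b2

x = rng.normal(size=(3, 5))  # 5 example inputs stored as columns
print(np.allclose(two_linear_layers(x), W @ x + b))  # True
```

Adding more linear layers only repeats the same algebra, so the whole stack always reduces to one W and one b.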

In one of the cases briefly mentioned, it turns out that if you have a linear activation function in the hidden layers and a sigmoid function in the output layer, then the model is no more expressive than standard logistic regression without any hidden layer. A linear hidden layer is more or less useless because the composition of two linear functions is itself a linear function.
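The logistic-regression case can be checked the same way. In this minimal sketch (sizes and random weights are arbitrary assumptions), a network with a linear hidden layer and a sigmoid output produces exactly the same predictions as plain logistic regression, sigmoid(w @ x + b), using the collapsed weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(1)

# Hypothetical network: 3 inputs -> 4 linear hidden units -> 1 sigmoid output.
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4, 1))
w2, b2 = rng.normal(size=(1, 4)), rng.normal(size=(1, 1))

def linear_hidden_sigmoid_output(x):
    a1 = W1 @ x + b1                # linear (identity) hidden activation
    return sigmoid(w2 @ a1 + b2)    # sigmoid output unit

# Logistic regression with the collapsed weights, no hidden layer at all.
w = w2 @ W1
b = w2 @ b1 + b2

def logistic_regression(x):
    return sigmoid(w @ x + b)

x = rng.normal(size=(3, 6))  # 6 example inputs stored as columns
print(np.allclose(linear_hidden_sigmoid_output(x), logistic_regression(x)))  # True
```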

So unless you add a non-linear function in there, you’re not computing more interesting functions even as you go deeper in the network. There is just one place where you might use a linear activation function, g(z) = z: when you are doing machine learning on a regression problem, that is, when y is a real number. For example, if you’re trying to predict housing prices, y is not 0 or 1 but a real number, anywhere from $0 up to however expensive houses get in your data set, potentially millions of dollars. If y takes on these real values, then it might be okay to have a linear activation function in the output layer so that your output y hat is also a real number, going anywhere from minus infinity to plus infinity.
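Why the output layer needs g(z) = z for a real-valued target: the identity passes z through unchanged, while sigmoid squashes everything into the range 0 to 1, so a sigmoid output could never predict a price like 250,000. A small sketch, where the sample z values are arbitrary assumptions:

```python
import numpy as np

def identity(z):
    """Linear activation g(z) = z, suitable for a real-valued regression output."""
    return z

def sigmoid(z):
    """Maps any real z into the range 0 to 1 (saturating at 1.0 in floating point)."""
    return 1.0 / (1.0 + np.exp(-z))

z = np.array([-5.0, 0.0, 250_000.0])  # e.g. raw output scores, one resembling a price

print(identity(z))  # unchanged, so a price-sized value survives
print(sigmoid(z))   # every entry lies between 0 and 1
```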

But then the hidden units should not use linear activation functions. They could use ReLU, tanh, Leaky ReLU, or maybe something else. So the one place you might use a linear activation function is usually the output layer; other than that, using a linear activation function in a hidden layer is extremely rare.
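Putting the two rules together, a regression network uses a non-linear activation in the hidden layer and the identity in the output layer. The sketch below is a hypothetical forward pass only (sizes, random weights, and zero biases are assumptions; no training is shown), with ReLU chosen for the hidden units.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

rng = np.random.default_rng(2)

# Hypothetical regression net: 3 input features -> 8 ReLU hidden units -> 1 linear output.
W1, b1 = rng.normal(size=(8, 3)), np.zeros((8, 1))
W2, b2 = rng.normal(size=(1, 8)), np.zeros((1, 1))

def predict(x):
    a1 = relu(W1 @ x + b1)   # non-linear hidden layer (tanh or Leaky ReLU would also work)
    y_hat = W2 @ a1 + b2     # linear output g(z) = z, so y_hat can be any real number
    return y_hat

x = rng.normal(size=(3, 4))  # 4 example houses, features stored as columns
print(predict(x).shape)      # (1, 4): one real-valued prediction per example
```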

I hope this explanation clears your doubts.
All the best


Excuse me for interrupting, but isn’t the ReLU function a linear function?

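It’s piecewise linear, not linear. It is linear on each side of zero, but because of the kink at zero it does not satisfy f(a + b) = f(a) + f(b) in general, so as a whole it is a non-linear function.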
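A quick way to see that ReLU is not linear is the additivity test every linear function must pass, f(a + b) = f(a) + f(b). This minimal NumPy sketch (the values of a, b and the grid x are arbitrary choices) shows ReLU failing it, and shows two ReLU units building |x|, a “V” shape that no single linear function w * x + c can produce.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

# A linear function f must satisfy f(a + b) == f(a) + f(b). ReLU fails this:
a, b = 1.0, -1.0
print(relu(a + b))         # 0.0
print(relu(a) + relu(b))   # 1.0

# Two ReLU units combine into |x| = relu(x) + relu(-x),
# which no single linear function w * x + c can reproduce.
x = np.linspace(-2.0, 2.0, 5)
print(np.allclose(relu(x) + relu(-x), np.abs(x)))  # True
```

Because of this kink, stacking ReLU layers does not collapse into one linear layer the way stacking identity layers does.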