If we have a non-linear f(x), do we still need an activation function?

In the video “Why do we need activation functions?”, a set of equations is shown to explain why activations are needed.

However, those equations assume f(x) is linear.
In Course 1 (Week 3, Decision Boundaries), we saw that we could use a non-linear f(x) to generate non-linear decision boundaries.
How does that concept apply here? Do neural networks only use a linear f(x)?
Also, what happens if f(x) is non-linear? Are activation functions still required in that case?


Neural networks use a non-linear activation function in the hidden layers.


In this explanation, Prof. Andrew is showcasing the need for activation functions. If we do not use activation functions, then even with all the layers of a neural network, the model would still be no better than a linear regression model.

So, if we want to be able to model more complex, non-linear decision boundaries, we need activation functions.
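To see numerically why the layers collapse without an activation, here is a small sketch (my own illustrative layer sizes and random weights, not from the lecture): composing two layers with no activation in between gives exactly the same output as one equivalent linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)

# Two "layers" with NO activation in between (illustrative sizes: 3 -> 4 -> 2)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)

x = rng.normal(size=3)

# Forward pass through both layers
a1 = W1 @ x + b1
a2 = W2 @ a1 + b2

# Algebraically, W2 @ (W1 @ x + b1) + b2 = (W2 @ W1) @ x + (W2 @ b1 + b2),
# i.e. a single linear layer with combined weights and bias:
W = W2 @ W1
b = W2 @ b1 + b2
single = W @ x + b

print(np.allclose(a2, single))  # True: two linear layers = one linear layer
```

No matter how many such layers you stack, the same algebra keeps collapsing them into one.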


Course 1, Week 3 says we can engineer non-linear features for logistic regression. Certainly we can also engineer non-linear features for a neural network.

The key point about not using a non-linear activation is that it makes multiple layers only as good as a single layer, and this problem remains regardless of whether your features are linear or non-linear.

If you want to build a neural network with more than one layer, you need non-linear activations in between. If you do not use non-linear activations, it is meaningless (as Andrew explained) to build more than one layer.

If you know how to engineer perfect features on your own, such that those engineered features have a linear relationship with the label, you can do that and use only one layer in your NN.
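As a toy illustration of that last point (my own made-up data, not from the course): labels drawn inside a circle are not linearly separable in the raw features, but the engineered feature x1² + x2² relates linearly to the boundary, so a single linear decision rule suffices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: label is 1 inside the unit circle (a non-linear boundary in x-space)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0]**2 + X[:, 1]**2 < 1).astype(float)

# Engineered feature: r2 = x1^2 + x2^2. The boundary r2 = 1 is LINEAR in r2.
r2 = (X**2).sum(axis=1)

# A simple one-layer linear rule on the engineered feature separates perfectly
pred = (r2 < 1).astype(float)
print((pred == y).mean())  # 1.0
```

With raw (x1, x2) inputs, you would instead need hidden layers with non-linear activations to learn this circular boundary.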


Thanks a lot, Raymond!
Appreciate all your clear explanations.
