However, we assume f(x) to be linear in these equations.
In Course 1 (Week 3, Decision Boundaries), we saw that we could use a non-linear f(x) to generate non-linear decision boundaries.
How does this concept apply here? Do neural networks use only a linear f(x)?
Also, what happens if f(x) is non-linear? Are activation functions still required in that case?
In this explanation, Prof. Andrew is showcasing the need for activation functions. Without them, even a neural network with many layers would be no better than a linear regression model.
So, if we want to model more complex, non-linear decision boundaries, we need activation functions.
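Here is a minimal NumPy sketch of that collapse (the layer sizes and variable names are illustrative, not from the course): composing two linear layers with no activation in between is algebraically identical to a single linear layer.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(3,))                                # one example, 3 features

W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=(4,))  # "layer 1" weights
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=(2,))  # "layer 2" weights

# Forward pass with no (i.e. identity) activation:
two_layers = W2 @ (W1 @ x + b1) + b2

# The same map as a single linear layer with W = W2 W1, b = W2 b1 + b2:
one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)

print(np.allclose(two_layers, one_layer))                # True
```

No matter how many such layers you stack, the same substitution reduces them all to one.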
Course 1, Week 3 shows that we can engineer non-linear features for logistic regression. We can certainly engineer non-linear features for a neural network as well.
The key point about not using non-linear activations is that it makes multiple layers only as good as a single layer, and this problem remains regardless of whether your features are linear or non-linear.
So if you want to build a neural network with more than one layer, you need non-linear activations in between; otherwise, as Andrew explains, it is pointless to build more than one layer. The sketch below shows how a non-linearity breaks the collapse.
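A quick sanity check (ReLU is just one illustrative choice of non-linear activation, and the tiny weights are hand-picked for clarity): with a non-linearity between the layers, the network is no longer equivalent to any single linear layer.

```python
import numpy as np

# Hand-picked tiny example so the difference is obvious.
x  = np.array([1.0, -1.0, 0.0])
W1 = np.array([[1.0, 0.0, 0.0],
               [0.0, 1.0, 0.0]])
b1 = np.zeros(2)
W2 = np.array([[1.0, 1.0]])
b2 = np.zeros(1)

relu = lambda z: np.maximum(z, 0)                 # non-linear activation

with_activation = W2 @ relu(W1 @ x + b1) + b2     # genuine 2-layer network
collapsed       = (W2 @ W1) @ x + (W2 @ b1 + b2)  # the would-be 1-layer version

print(with_activation, collapsed)                 # [1.] vs [0.]: no longer equal
```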
If you know how to engineer perfect features on your own, such that the engineered features have a linear relationship with the label, you can do so and use only one layer in your NN.
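For instance, here is a hypothetical sketch (the quadratic target and the feature phi = x**2 are my own illustration, not from the course): when the engineered feature is already linearly related to the label, ordinary least squares, which is what a single linear layer amounts to, recovers the relationship exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=100)
y = x**2                          # label is non-linear in the raw input x ...

phi = x**2                        # ... but linear in the engineered feature
A = np.column_stack([phi, np.ones_like(phi)])

# Fit y ≈ w * phi + b by least squares (a single linear "layer"):
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w, b)                       # ≈ 1.0 and 0.0: one layer is enough
```

In practice, of course, we rarely know such perfect features in advance, which is exactly why we let multi-layer networks with non-linear activations learn them instead.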