Neural network with polynomial features

In C2W3_Lab_01_Model_Evaluation_and_Selection there is the text “From earlier lectures in this course, you may have known that neural networks can learn non-linear relationships so you can opt to skip adding polynomial features.”

I think the only example in this course of neural networks learning non-linear relationships is using the ReLU function to model piecewise-linear relationships like this:


Can neural networks also model “curved” relationships? If so, how can they do that without using polynomial features?

Yes. Any non-linear activation function in the hidden layer automatically generates new non-linear combinations of the existing features.

sigmoid() and tanh() are commonly used, in addition to ReLU.
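As a small sketch of that point (weights here are hand-picked for illustration, not learned): passing a linear combination of the input through tanh produces a feature whose slope changes with the input, which is exactly what "non-linear" means.

```python
import numpy as np

# Hand-picked weight and bias for one hypothetical hidden unit.
x = np.linspace(-2.0, 2.0, 5)           # a single input feature
w, b = 1.5, 0.0

linear_feature = w * x + b              # still linear in x
nonlinear_feature = np.tanh(w * x + b)  # curved in x

# Finite-difference slopes: constant for the linear feature,
# varying for the tanh feature -- evidence of curvature.
lin_slopes = np.diff(linear_feature) / np.diff(x)
tanh_slopes = np.diff(nonlinear_feature) / np.diff(x)
print(np.allclose(lin_slopes, lin_slopes[0]))    # constant slope
print(np.allclose(tanh_slopes, tanh_slopes[0]))  # slope changes
```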

This may be outside the scope of this class, but how can sigmoid() and tanh() model curves as well as, or better than, polynomial features?

A single hidden layer unit won’t do much good by itself.

But if you have multiple non-linear units in the hidden layer, the non-linear features they create are combined in the output layer; that weighted sum gives the final output.
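Here is a minimal sketch of that combination. The hidden weights are fixed random values (in a real network they would be learned by gradient descent); only the output layer's weighted sum is solved, via least squares, to stand in for training. The weighted sum of tanh features tracks a curved target far better than any purely linear fit of the same data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 200).reshape(-1, 1)
y = x**2                             # a "curved" target, no polynomial features supplied

# Hidden layer: 30 tanh units with fixed random weights (illustration only).
W = rng.normal(size=(1, 30)) * 3.0
b = rng.normal(size=(30,))
H = np.tanh(x @ W + b)               # non-linear features created by the hidden layer

# Output layer: a weighted sum of the hidden features plus a bias,
# solved with least squares here instead of gradient descent.
A = np.hstack([H, np.ones((len(x), 1))])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
err_tanh = np.max(np.abs(y - A @ coef))

# Baseline: the best purely linear fit of the same data.
A_lin = np.hstack([x, np.ones((len(x), 1))])
coef_lin, *_ = np.linalg.lstsq(A_lin, y, rcond=None)
err_lin = np.max(np.abs(y - A_lin @ coef_lin))

print(err_tanh, err_lin)  # the tanh combination has a much smaller worst-case error
```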

For a classification problem, can a neural network with only ReLU activation functions in the hidden layers model a “curvy” decision boundary? It seems non-intuitive that it could, because the ReLU function itself has no curves in it.

It can model one, but not perfectly. It depends on the number of ReLU units, and on how accurate the model needs to be.

Any non-linear function in the hidden layer can be useful. ReLU’s primary benefit is that its gradients are very easy to compute. Its drawback is that its gradient is zero for all negative inputs, so you typically need many more ReLU units than you would with a smooth non-linearity like sigmoid or tanh.
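To make the "curvy boundary from straight pieces" idea concrete, here is a hand-constructed sketch (the knot locations and weights are chosen by hand, not trained): a linear term plus three ReLU units approximates the curve y = x² on [-1, 1] as a piecewise-linear function, with each ReLU bending the line at one knot. More units would mean more bends and a closer fit.

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.linspace(-1.0, 1.0, 401)
target = x**2

# Hand-derived weights: the line starts at (-1, 1) with slope -1.5, and each
# ReLU unit adds a slope change of +1 at its knot (-0.5, 0, and 0.5), so the
# result interpolates x**2 exactly at every knot.
y_hat = 1.0 - 1.5 * (x + 1.0) + relu(x + 0.5) + relu(x) + relu(x - 0.5)

# Worst-case gap between the curve and the piecewise-linear approximation,
# attained at the segment midpoints: f''/2 * (h/2)**2 = 0.0625 for h = 0.5.
max_err = np.max(np.abs(target - y_hat))
print(max_err)  # prints 0.0625
```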