How can a neural network be used for polynomial regression?
Let’s consider a NN with 4 layers and one unit within each layer. The computation looks something like this:

x -> g(w1*x + b1) -> g(w2*g(w1*x + b1) + b2) -> …

If my data is of a quadratic form, how can this NN learn to fit a quadratic curve since it never computes x^2?
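The chain above can be sketched as a short forward pass. This is a minimal sketch that assumes hypothetical weights and biases chosen only for illustration, and it swaps `g` between identity, ReLU, and tanh. The second difference f(x - h) - 2f(x) + f(x + h) is exactly zero for any straight line, so a nonzero value means the chain's output bends, even though no layer ever computes x^2.

```python
import math


def identity(z: float) -> float:
    return z


def relu(z: float) -> float:
    return max(0.0, z)


def chain_forward(x: float, weights, biases, g) -> float:
    """One unit per layer: each layer computes a <- g(w * a + b)."""
    a = x
    for w, b in zip(weights, biases):
        a = g(w * a + b)
    return a


# Hypothetical parameters for a 4-layer, single-unit chain (illustration only).
WEIGHTS = [1.5, -2.0, 0.8, 1.2]
BIASES = [0.3, 0.5, -0.1, 0.2]


def second_difference(g, x: float = 0.0, h: float = 0.5) -> float:
    """f(x - h) - 2 f(x) + f(x + h) for f = the chain; zero for any straight line."""
    def f(t: float) -> float:
        return chain_forward(t, WEIGHTS, BIASES, g)
    return f(x - h) - 2.0 * f(x) + f(x + h)


for name, g in [("identity", identity), ("relu", relu), ("tanh", math.tanh)]:
    print(name, second_difference(g))
```

With the identity activation the chain collapses to a single straight line (the second difference is zero up to rounding); with ReLU it has a kink, and with tanh it bends smoothly.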

@amitsubhashchejara I am going to let one of the Mentors who is better with maths give you a fuller explanation of this. But in my mind it is better just to consider that your data assumes a polynomial form.

So, if your data is created with a quadratic equation, then that is the form the data takes. However, what the NN actually produces will not be that exact same equation; instead, it finds a set of weights which, in combination with those of the other nodes, ‘fits it’.

And I know you are offering a theoretical example (one node per layer, which is basically linear, so you are asking how to get non-linear behavior out of something linear). Well, not to avoid a full answer, but in practice you would never have only one node per layer.

Thank you for the clarification, but let’s say we have a NN with all the activations set to ReLU, which is linear. Then the output will be a linear function of x no matter the number of nodes and layers, so the line that fits the data will change its slope but never bend into a quadratic curve. Please help with this!

You’re right that ReLU is piecewise linear, but note that it is not linear: ReLU(z) = max(0, z) has a kink at z = 0. So the network’s output is a piecewise linear function of the input (a chain of straight segments, not a single line), and it can never be exactly quadratic. This makes it difficult to fit a quadratic curve directly, especially if ReLU is the only activation function used.

Even with ReLU, the network can approximate a quadratic function by stitching together linear segments that resemble a curve. This requires enough breakpoints (points where the slope changes), which means a sufficiently deep network or many units per layer; in a single hidden layer with one input, each ReLU unit contributes at most one breakpoint. More layers and units give a better approximation, but the result is still fundamentally a piecewise linear function.
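To make the "linear segments" idea concrete, here is a hand-built sketch (weights set by hand, not trained): a single hidden layer of ReLU units whose output is the piecewise linear interpolant of x^2 at evenly spaced knots on [-1, 1]. Hidden unit i computes relu(x - k_i), so it adds one breakpoint at knot k_i, and its output weight is the change in slope at that knot. The function names and the interval are assumptions for illustration.

```python
def relu(z: float) -> float:
    return max(0.0, z)


def relu_net_for_square(n_segments: int, lo: float = -1.0, hi: float = 1.0):
    """Hypothetical hand-built ReLU net: piecewise linear interpolant of x**2 on [lo, hi]."""
    h = (hi - lo) / n_segments
    knots = [lo + i * h for i in range(n_segments + 1)]
    # Slope of the chord of x**2 between k_i and k_{i+1} is k_i + k_{i+1}.
    slopes = [knots[i] + knots[i + 1] for i in range(n_segments)]
    # Hidden unit i: input weight 1, bias -k_i, i.e. relu(x - k_i), one breakpoint at k_i.
    hidden_biases = [-k for k in knots[:n_segments]]
    # Output weights: the first slope, then the change in slope at each later knot.
    out_weights = [slopes[0]] + [slopes[i] - slopes[i - 1] for i in range(1, n_segments)]
    out_bias = knots[0] ** 2  # value of x**2 at the left end of the interval

    def net(x: float) -> float:
        return out_bias + sum(v * relu(x + b) for v, b in zip(out_weights, hidden_biases))

    return net


def max_error(net, lo: float = -1.0, hi: float = 1.0, samples: int = 2001) -> float:
    """Largest |net(x) - x**2| over an evenly spaced grid of sample points."""
    xs = [lo + (hi - lo) * i / (samples - 1) for i in range(samples)]
    return max(abs(net(x) - x * x) for x in xs)


for n in (2, 4, 8, 16):
    # For linear interpolation of x**2 the worst error is h**2 / 4 with h = 2 / n, i.e. 1 / n**2.
    print(n, "units, max error:", max_error(relu_net_for_square(n)))
```

Each doubling of the number of units cuts the maximum error by about a factor of 4, but the output stays a chain of straight segments and never matches x^2 exactly between knots.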

To model a quadratic curve more effectively, consider smooth activation functions such as sigmoid or tanh, which are curved everywhere and so can bend to fit curves. Another approach is to mix ReLU with other activations in the network, combining the strengths of piecewise linear and smooth nonlinear modeling; this can help the network handle quadratic and other nonlinear relationships.
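One way to see how a smooth activation "bends" is a hand-built sketch (hypothetical weights, not learned ones): two tanh hidden units with input weights +a and -a and a shared bias c, plus a linear output unit, approximate x^2 on a bounded interval. It relies on the second-order Taylor expansion tanh(c + t) + tanh(c - t) - 2 tanh(c) ≈ tanh''(c) t^2 for small t, so the network never computes x^2 directly. The values a = 0.01 and c = 0.5 are arbitrary assumptions; c must be nonzero because tanh''(0) = 0.

```python
import math


def tanh_square_net(a: float = 0.01, c: float = 0.5):
    """Two tanh hidden units plus a linear output that approximate x**2.

    Hidden units: tanh(a*x + c) and tanh(-a*x + c) (hypothetical hand-set weights).
    Taylor: tanh(c + t) + tanh(c - t) - 2*tanh(c) ≈ tanh''(c) * t**2 for small t,
    so scaling by 1 / (a**2 * tanh''(c)) leaves approximately x**2.
    """
    t = math.tanh(c)
    d2 = -2.0 * t * (1.0 - t * t)  # tanh''(c); zero at c = 0, so c must be nonzero
    v = 1.0 / (a * a * d2)         # output weight shared by both hidden units
    out_bias = -2.0 * t * v        # cancels the constant 2*tanh(c) term

    def net(x: float) -> float:
        h1 = math.tanh(a * x + c)
        h2 = math.tanh(-a * x + c)
        return v * h1 + v * h2 + out_bias

    return net


net = tanh_square_net()
for x in (-2.0, -1.0, 0.0, 0.5, 1.0, 2.0):
    print(x, x * x, net(x))
```

The leftover error grows roughly like a^2 x^4 (with these values it stays below 1e-3 on [-2, 2]), so the approximation only holds on a bounded interval. A trained network will find its own weights, usually spread over more units, but the idea is the same: the curvature of the activation supplies the curvature of the fit.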

There is a forum thread where I detailed how this process works (e.g. modeling the equation of a parabola). If you’re interested, I can try to find a link.

But I recommend you work up this example yourself. It’s very educational.
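If you do work it up yourself, here is one possible starting point: a minimal pure-Python sketch (not the method from the thread mentioned above) that trains a one-hidden-layer tanh network on y = x^2 with full-batch gradient descent and compares its error with the best possible straight line. The hidden size, learning rate, number of epochs, initialization, and seed are arbitrary assumptions, not tuned values.

```python
import math
import random


def best_line_mse(xs, ys):
    """MSE of the least-squares line m*x + c: the best any straight line can do."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    m = sxy / sxx
    c = my - m * mx
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys)) / n


def train_tanh_net(xs, ys, hidden=8, lr=0.05, epochs=3000, seed=0):
    """Full-batch gradient descent on a 1-input, `hidden`-unit tanh, 1-output network.

    Returns the MSE measured at the start of every epoch (before that epoch's update).
    """
    rng = random.Random(seed)
    w = [rng.gauss(0.0, 1.0) for _ in range(hidden)]  # input -> hidden weights
    b = [rng.gauss(0.0, 1.0) for _ in range(hidden)]  # hidden biases
    v = [rng.gauss(0.0, 0.1) for _ in range(hidden)]  # hidden -> output weights
    c = 0.0                                           # output bias
    n = len(xs)
    history = []
    for _ in range(epochs):
        gw, gb, gv, gc = [0.0] * hidden, [0.0] * hidden, [0.0] * hidden, 0.0
        mse = 0.0
        for x, y in zip(xs, ys):
            h = [math.tanh(w[j] * x + b[j]) for j in range(hidden)]
            err = sum(v[j] * h[j] for j in range(hidden)) + c - y
            mse += err * err / n
            # Chain rule for the loss 0.5 * mean(err**2); tanh'(z) = 1 - tanh(z)**2.
            gc += err / n
            for j in range(hidden):
                gv[j] += err * h[j] / n
                dz = err * v[j] * (1.0 - h[j] * h[j]) / n
                gw[j] += dz * x
                gb[j] += dz
        history.append(mse)
        for j in range(hidden):
            w[j] -= lr * gw[j]
            b[j] -= lr * gb[j]
            v[j] -= lr * gv[j]
        c -= lr * gc
    return history


xs = [-1.0 + 0.1 * i for i in range(21)]
ys = [x * x for x in xs]
history = train_tanh_net(xs, ys)
print("initial MSE:  ", history[0])
print("final MSE:    ", history[-1])
print("best line MSE:", best_line_mse(xs, ys))
```

The straight-line baseline ties back to the original question: no line can get below `best_line_mse`, so if the final MSE ends up well under it, the network is bending to follow the parabola even though no unit ever computes x^2.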