I was wondering: what if y is not a linear combination of the input features? Say x has four features x1, x2, x3, x4, and, just for the sake of argument, I already know that y = 1/x1 + x2**2 + c3*x3 + c4*x4. Can a neural network still work well here, i.e. end up with both low bias and low variance once training is done?
I mean, can a neural network (say, one with two hidden layers of 4 and 5 units) find some formula that approximates the original theoretical one? In other words, even though y is not a linear combination of x1, x2, x3, x4, can the network still give us a usable way to compute y_hat? I'd appreciate any insightful opinions on this. Many thanks!
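One way to probe this empirically is a small toy experiment. The sketch below uses plain NumPy and makes several assumptions that are not in the question: c3 = 0.5 and c4 = -1.5 are arbitrary placeholder values, x1 is sampled from [1, 2] so that 1/x1 stays bounded, x2, x3 and x4 are sampled from [-1, 1], the hidden layers use tanh, and training is full-batch Adam for 3000 steps. It fits a 4 → 4 → 5 → 1 network and compares its held-out MSE with ordinary least squares on the raw features.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical coefficients: the question leaves c3 and c4 unspecified.
C3, C4 = 0.5, -1.5


def target(X):
    """y = 1/x1 + x2**2 + c3*x3 + c4*x4, the formula from the question."""
    return 1.0 / X[:, 0] + X[:, 1] ** 2 + C3 * X[:, 2] + C4 * X[:, 3]


def make_data(n):
    # Assumption: x1 ~ U[1, 2] keeps 1/x1 bounded; x2, x3, x4 ~ U[-1, 1].
    X = np.column_stack([rng.uniform(1.0, 2.0, n), rng.uniform(-1.0, 1.0, (n, 3))])
    return X, target(X)


def init_params(sizes):
    """One [W, b] pair per layer, weights scaled by 1/sqrt(fan_in)."""
    return [[rng.normal(0.0, 1.0 / np.sqrt(fi), (fi, fo)), np.zeros(fo)]
            for fi, fo in zip(sizes[:-1], sizes[1:])]


def forward(params, X):
    """Return all activations: tanh on hidden layers, identity on the output."""
    acts = [X]
    for i, (W, b) in enumerate(params):
        z = acts[-1] @ W + b
        acts.append(np.tanh(z) if i < len(params) - 1 else z)
    return acts


def backward(params, acts, y):
    """Gradients of the mean squared error w.r.t. every W and b."""
    delta = 2.0 * (acts[-1][:, 0] - y)[:, None] / len(y)  # dL/d(output)
    grads = [None] * len(params)
    for i in reversed(range(len(params))):
        W, _ = params[i]
        grads[i] = [acts[i].T @ delta, delta.sum(axis=0)]
        if i > 0:
            # Propagate through tanh: d tanh(z)/dz = 1 - tanh(z)**2.
            delta = (delta @ W.T) * (1.0 - acts[i] ** 2)
    return grads


def train(X, y, sizes=(4, 4, 5, 1), steps=3000, lr=0.01, b1=0.9, b2=0.999, eps=1e-8):
    """Full-batch Adam on the MSE loss."""
    params = init_params(sizes)
    m = [[np.zeros_like(p) for p in layer] for layer in params]
    v = [[np.zeros_like(p) for p in layer] for layer in params]
    for t in range(1, steps + 1):
        grads = backward(params, forward(params, X), y)
        for layer, g, m_l, v_l in zip(params, grads, m, v):
            for k in range(2):  # k = 0 -> W, k = 1 -> b
                m_l[k] = b1 * m_l[k] + (1 - b1) * g[k]
                v_l[k] = b2 * v_l[k] + (1 - b2) * g[k] ** 2
                m_hat = m_l[k] / (1 - b1 ** t)
                v_hat = v_l[k] / (1 - b2 ** t)
                layer[k] -= lr * m_hat / (np.sqrt(v_hat) + eps)
    return params


X_train, y_train = make_data(2000)
X_test, y_test = make_data(500)

# Standardize inputs and targets using training statistics only.
mu_x, sd_x = X_train.mean(axis=0), X_train.std(axis=0)
mu_y, sd_y = y_train.mean(), y_train.std()


def scale(X):
    return (X - mu_x) / sd_x


params = train(scale(X_train), (y_train - mu_y) / sd_y)
pred = forward(params, scale(X_test))[-1][:, 0] * sd_y + mu_y
nn_mse = np.mean((pred - y_test) ** 2)

# Baseline: ordinary least squares on the raw features (a purely linear model).
A_train = np.column_stack([X_train, np.ones(len(X_train))])
A_test = np.column_stack([X_test, np.ones(len(X_test))])
coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
lin_mse = np.mean((A_test @ coef - y_test) ** 2)

print(f"variance of y on the test set: {y_test.var():.4f}")
print(f"linear regression test MSE:     {lin_mse:.4f}")
print(f"4-5 hidden-unit MLP test MSE:   {nn_mse:.4f}")
```

The comparison is the interesting part. On these input ranges the linear baseline cannot represent the x2**2 term at all (the best linear fit to it is a constant), so its error floor is roughly the variance of x2**2. The network's error depends on its size, training length and the range it was trained on; outside x1 in [1, 2], for example, nothing guarantees it reproduces 1/x1.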