In the house pricing feature engineering example, the professor talks about new features such as the area of the plot, X1X2, where two features are multiplied to give a new feature, say X3 = X1X2.

But in a neural network, if we give only X1 and X2 as input and apply ReLU in each layer, is there any way the neurons can produce output based on X3 (or any other feature different from what we have given as input) without us explicitly providing it?
Or is the output always going to be a function of X1 and X2 and their modifications as the complexity of subsequent layers increases?

It’s difficult for a NN to give you exactly X1 * X2 in general.

However, it’s possible for a NN (with ReLU, for example) to approximate X1 * X2 within a certain range of X1 and a certain range of X2.

The difference between the two is that your X1 * X2 is exact, while the NN’s is just an approximation. You may go to the C2 W2 Optional Lab: ReLU to see what I mean by approximation using ReLU.
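To make "approximate within a certain range" concrete, here is a minimal sketch in plain Python. It is not taken from the course lab: the weights are hand-built rather than trained, and the names `square_relu`, `product_relu`, `R` and `N_SEGMENTS` are just illustrative. It assumes X1 and X2 both lie in [-1, 1] and uses the identity X1 * X2 = ((X1 + X2)^2 - (X1 - X2)^2) / 4, replacing each square with a piecewise-linear curve made of a linear term plus ReLU units.

```python
def relu(z):
    return max(0.0, z)

R = 2.0          # t = x1 + x2 or x1 - x2 stays in [-R, R] when x1, x2 are in [-1, 1]
N_SEGMENTS = 16  # more segments (more ReLU units) give a closer approximation
H = 2 * R / N_SEGMENTS
KNOTS = [-R + i * H for i in range(N_SEGMENTS + 1)]

# Slope of the straight line joining (k_i, k_i**2) and (k_{i+1}, k_{i+1}**2).
SLOPES = [(KNOTS[i + 1] ** 2 - KNOTS[i] ** 2) / H for i in range(N_SEGMENTS)]

def square_relu(t):
    """Piecewise-linear stand-in for t**2: a linear term plus ReLU units."""
    out = KNOTS[0] ** 2 + SLOPES[0] * (t - KNOTS[0])
    for i in range(1, N_SEGMENTS):
        # Each ReLU unit bends the line at one knot.
        out += (SLOPES[i] - SLOPES[i - 1]) * relu(t - KNOTS[i])
    return out

def product_relu(x1, x2):
    """Approximate x1 * x2 using only the ReLU-based squares above."""
    return (square_relu(x1 + x2) - square_relu(x1 - x2)) / 4.0

def max_error_on_grid(lo, hi, steps=41):
    """Largest absolute error over a steps x steps grid on [lo, hi]^2."""
    pts = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(abs(product_relu(a, b) - a * b) for a in pts for b in pts)

print("max error inside [-1, 1]^2:", max_error_on_grid(-1.0, 1.0))
print("outside the range, (10, 10):", product_relu(10.0, 10.0), "vs exact", 100.0)
```

Inside the assumed range the error stays small, but outside it the curve just continues as a straight line, so the result drifts far from the true product. That is the sense in which the approximation only holds within a certain range.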

I want to show you another demo that you can also try yourself here.

(You may need to examine my screenshot or check out the website yourself to see what I am talking about below.)

If we supply the feature x1x2 to the model like below, the model is able to classify the samples perfectly even with just one layer and one neuron.

However, if we don’t give x1x2 to the model, we need at least 2 layers and 5 neurons in total to approximate x1x2 from x1 and x2 separately, and the approximation doesn’t look perfect either. That is because the network has no intention of really approximating x1x2; its intention, as driven by minimizing our cost function, is just to find boundaries that correctly classify the samples.
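Here is a small sketch of the same contrast in plain Python. The four data points are my own assumption (a XOR-style toy set where the label is 1 when x1 and x2 share the same sign), not the dataset from the demo, and the weights are hand-picked or grid-searched rather than trained. It shows one neuron on x1x2 classifying everything correctly, while one neuron on raw x1 and x2 cannot, at least over the weights tried.

```python
import itertools

# Assumed toy data (not from the demo): label is 1 when x1 * x2 > 0.
points = [(1.0, 1.0), (-1.0, -1.0), (1.0, -1.0), (-1.0, 1.0)]
labels = [1, 1, 0, 0]

def accuracy(predict):
    """Fraction of the toy points that predict(x1, x2) labels correctly."""
    hits = sum(predict(x1, x2) == y for (x1, x2), y in zip(points, labels))
    return hits / len(points)

# One neuron on the engineered feature x3 = x1 * x2 (weight 1, bias 0).
# Thresholding the pre-activation at 0 matches a sigmoid output above 0.5.
def neuron_with_product(x1, x2):
    return 1 if 1.0 * (x1 * x2) + 0.0 > 0 else 0

# One neuron on the raw inputs: try every weight/bias combination on a grid.
grid = [v / 2 for v in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
best_raw = max(
    accuracy(lambda x1, x2, w1=w1, w2=w2, b=b: 1 if w1 * x1 + w2 * x2 + b > 0 else 0)
    for w1, w2, b in itertools.product(grid, repeat=3)
)

print("single neuron on x1*x2:", accuracy(neuron_with_product))
print("best single neuron on x1, x2 (grid search):", best_raw)
```

The grid search is only illustrative, but the result is what you would expect: these four points are not linearly separable in (x1, x2), so a single neuron on the raw inputs cannot get them all right, while the product feature makes the problem trivial.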