In the video, Mr. Andrew mentioned that area can be used as a third feature for the linear model. But instead of adding it as a third feature, can’t we just **eliminate the first two features** (frontage and depth), and use **area as the only feature** for building the linear model?

Would doing this make the algorithm more computationally efficient?

Yes, you could. But it's usually a very bad idea to throw away data when you're creating a model: you don't know in advance which features may turn out to be important.

Yeah!

In addition:

It might be worth visualising your data, for example with histograms of the features and scatterplots of feature pairs.
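A minimal sketch of that kind of visualisation, using a synthetic dataset (the column names `frontage`, `depth`, `area`, `price` and the numbers are assumptions for illustration, not the course's actual data):

```python
import numpy as np
import pandas as pd
from pandas.plotting import scatter_matrix
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line for interactive use
import matplotlib.pyplot as plt

# Synthetic housing data (assumed for illustration).
rng = np.random.default_rng(0)
frontage = rng.uniform(10, 50, 200)
depth = rng.uniform(20, 80, 200)
df = pd.DataFrame({
    "frontage": frontage,
    "depth": depth,
    "area": frontage * depth,
})
df["price"] = 50 * df["area"] + rng.normal(0, 5000, 200)

df.hist(bins=20)                 # one histogram per column
scatter_matrix(df, alpha=0.5)    # pairwise scatterplots
plt.savefig("features.png")
```

The scatter matrix makes dependencies between features (such as `area` vs. `frontage`) visible at a glance.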

You can directly see whether some features are interdependent. E.g. if the first three features carry redundant information, a PCA might be useful to reduce the dimension of your feature space. Alternatively (or additionally), you can manually select the most relevant features based on feature importance.
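A hedged sketch of both ideas from the paragraph above, on synthetic data (the feature columns are assumptions): PCA's explained-variance ratios reveal redundancy in the feature space, and a tree ensemble's feature importances can guide manual selection.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor

# Synthetic data (assumed): area is derived from frontage and depth,
# and the target depends only on area.
rng = np.random.default_rng(1)
frontage = rng.uniform(10, 50, 300)
depth = rng.uniform(20, 80, 300)
X = np.column_stack([frontage, depth, frontage * depth])
y = 50 * X[:, 2] + rng.normal(0, 1000, 300)

# PCA: if a few components explain most of the variance,
# the feature space is effectively lower-dimensional.
pca = PCA().fit(X)
print("explained variance ratio:", pca.explained_variance_ratio_)

# Feature importances: here 'area' (index 2) should dominate.
forest = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
print("importances:", forest.feature_importances_)
```

Note that PCA is scale-sensitive, so on real data you would typically standardize the features first.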

Hello,

Very good question. I would argue that it is better to let gradient descent decide which features are important than to discard them ourselves. Remember, gradient descent finds the feature coefficients that minimize the cost (and thus give the best possible prediction). If the two original features are not needed, and the best prediction comes from the area alone, then gradient descent will drive the first two coefficients close to zero. In general, we should not rely on intuition to pick coefficients; we should let the optimization algorithm find them for us. We can, however, use intuition to add new features, and then let the algorithm decide how important they are.
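To illustrate that point, here is a minimal sketch (not the course's actual code) of batch gradient descent on standardized features, with a synthetic target that depends only on area. The learned weights for frontage and depth shrink toward zero while the weight on area dominates, just as described above.

```python
import numpy as np

# Synthetic data (assumed for illustration): price depends only on area.
rng = np.random.default_rng(2)
n = 500
frontage = rng.uniform(10, 50, n)
depth = rng.uniform(20, 80, n)
area = frontage * depth
X = np.column_stack([frontage, depth, area])
y = 50 * area + rng.normal(0, 100, n)

# Standardize so a single learning rate works for all features.
Xs = (X - X.mean(axis=0)) / X.std(axis=0)
ys = (y - y.mean()) / y.std()

w = np.zeros(3)
b = 0.0
alpha = 0.1
for _ in range(5000):
    err = Xs @ w + b - ys          # prediction error
    w -= alpha * (Xs.T @ err) / n  # gradient step for the weights
    b -= alpha * err.mean()        # gradient step for the bias

print("weights:", w)  # area's weight dominates; the other two are near zero
```

Standardizing first matters here: frontage, depth, and area live on very different scales, and without it a single learning rate would either diverge on one feature or crawl on another.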

I hope that helps!

Alex

p.s. this is essentially what @TMosh was saying; I just wanted to expand on it a bit