Doubt in Feature Engineering - House Example

cs_Chinmay · October 11, 2022, 1:47pm

In the video, Mr. Andrew mentioned that area can be used as a third feature for the linear model. But instead of adding it as a third feature, can't we just drop the first two features (frontage and depth) and use area as the only feature for the linear model?
Would doing this make the algorithm more computationally efficient?

TMosh · October 11, 2022, 2:49pm

Yes, you could. But it's usually a very bad idea to throw away any data when you're creating a model. You don't really know in advance which features may turn out to be very important.

Christian_Simonis · October 11, 2022, 3:21pm

Yeah!

In addition:
It might be worth visualising your data, e.g. with histograms of the features and scatterplots between them:

pandas.plotting.scatter_matrix — pandas 1.4.2 documentation

You can then see directly whether some features depend on each other. E.g., if there is redundant information in the first three features, then a PCA might be useful to reduce the dimension of your feature space. Alternatively (or additionally), you can manually select the most important features based on feature importance.
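As a rough sketch of that kind of redundancy check, here is a NumPy-only version on made-up data. The feature ranges, the sample size, and the choice to run PCA through an SVD of the standardized features are all my assumptions for illustration, not something taken from the course. The correlation matrix is the numeric counterpart of what a scatter matrix shows visually.

```python
import numpy as np

# Synthetic house data (hypothetical ranges, chosen only for illustration).
rng = np.random.default_rng(0)
n_houses = 200
frontage = rng.uniform(10, 40, n_houses)
depth = rng.uniform(20, 60, n_houses)
area = frontage * depth  # the engineered feature

X = np.column_stack([frontage, depth, area])
feature_names = ["frontage", "depth", "area"]

# Pairwise correlations between the three features.
corr = np.corrcoef(X, rowvar=False)

# PCA via SVD on standardized features: the share of variance per component.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)
_, singular_values, _ = np.linalg.svd(X_std, full_matrices=False)
explained_ratio = singular_values**2 / np.sum(singular_values**2)

print(feature_names)
print(np.round(corr, 2))
print(np.round(explained_ratio, 3))
```

If the last entry of `explained_ratio` is tiny, the three features carry roughly two dimensions' worth of linear information, which is the situation where PCA or feature selection could help.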

aachandler · October 18, 2022, 7:42pm

Hello,

Very good question. I would argue that it is better to let gradient descent decide which features are important rather than throwing them out ourselves. Remember, gradient descent finds the optimal coefficients of features which minimize the cost (thus making the best possible prediction). If the two original features are not needed for the prediction, and the best prediction is gotten just from the area, then gradient descent will run until the first two coefficients are close to zero. In general we should not rely on our intuition for picking coefficients, but let the optimization algorithm find the coefficients for us. We can however use intuition to add new features, but let the algorithm decide how important they are.
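To make that concrete, here is a minimal sketch on synthetic data where I assume the price depends only on area. The data, the price formula, the learning rate, and the iteration count are all hypothetical choices of mine. It runs plain batch gradient descent on standardized features and then compares the learned coefficients.

```python
import numpy as np

# Synthetic data: price is assumed to depend on area only (hypothetical setup).
rng = np.random.default_rng(1)
m = 200
frontage = rng.uniform(10, 40, m)
depth = rng.uniform(20, 60, m)
area = frontage * depth
price = 0.1 * area + 50 + rng.normal(0, 2, m)

# Standardize the features so one learning rate works for all of them.
X = np.column_stack([frontage, depth, area])
X_norm = (X - X.mean(axis=0)) / X.std(axis=0)

# Batch gradient descent on the squared-error cost J = 1/(2m) * sum(err^2).
w = np.zeros(3)
b = 0.0
alpha = 0.5
for _ in range(5000):
    err = X_norm @ w + b - price
    w -= alpha * (X_norm.T @ err) / m
    b -= alpha * err.mean()

print("w_frontage, w_depth, w_area:", np.round(w, 3))
print("b:", round(b, 3))
```

In this setup the coefficients for frontage and depth should end up small compared to the one for area, without us having removed anything by hand. They will not be exactly zero, because of the noise in the data.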

I hope that helps!
Alex

p.s. this is essentially what @TMosh was saying, I just wanted to expand on that a bit
