Feature engineering - Week 2: Regression with multiple input variables

I have a question about the feature engineering topic. In the lecture video, the professor walks through an example where he performs feature engineering to create a new feature (area), which increases the number of columns; in other words, it increases the number of dimensions that go into the regression model. After seeing his approach, I thought of simplifying the model instead: drop the length and breadth of the plot and keep only the new feature, area, so the model becomes a simple linear regression rather than a multiple linear regression. Does my idea make sense?

Adding new features can be very helpful.

Removing features is not helpful, because you do not know whether they are important.

Adding features may result in high dimensionality, right? And that can become an issue and may lead to overfitting, right?

Hello @Praveen_Chandrasekar,

I think your idea is absolutely worth testing. Overfitting, and the technique for discovering whether our model is overfitting the training data, are covered in course 2, but we can talk about the small part of it that is relevant to our discussion.

The idea is that we want the model to do well on a set of non-training data called “cv data”. To test whether the features we want to remove are important for doing well on the “cv data”, we train two models: (A) with those features removed and (B) with those features kept. Then we evaluate both models on our cv dataset and see which one scores better.

If model (A) scores better, this validates your assumption that removing them is better than keeping them. Making assumptions is important, and validating your assumptions is just as important. :slight_smile: You will find out more in course 2.
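
In case it helps, here is a minimal sketch of that comparison. Everything in it is my own assumption for illustration and not from the lecture: the data is synthetic (price is generated from area plus noise), the 150/50 train/cv split is arbitrary, the helper names `fit_linear` and `mse` are made up, and the "score" is mean squared error on the cv data, so lower is better here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): price depends on area = length * breadth, plus noise.
n = 200
length = rng.uniform(10, 50, n)
breadth = rng.uniform(10, 50, n)
area = length * breadth
price = 0.1 * area + rng.normal(0, 5, n)

def fit_linear(X, y):
    """Fit ordinary least squares with an intercept; returns the weight vector."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

def mse(w, X, y):
    """Mean squared error of the linear model w on (X, y)."""
    Xb = np.column_stack([np.ones(len(X)), X])
    return float(np.mean((Xb @ w - y) ** 2))

# Hold out part of the data as the cv set; the models never see it in training.
idx = rng.permutation(n)
train, cv = idx[:150], idx[150:]

# Model (A): area only (length and breadth removed).
X_a = area.reshape(-1, 1)
# Model (B): length, breadth, and the engineered area feature.
X_b = np.column_stack([length, breadth, area])

w_a = fit_linear(X_a[train], price[train])
w_b = fit_linear(X_b[train], price[train])

cv_mse_a = mse(w_a, X_a[cv], price[cv])
cv_mse_b = mse(w_b, X_b[cv], price[cv])

print(f"(A) area only:              cv MSE = {cv_mse_a:.2f}")
print(f"(B) length, breadth, area:  cv MSE = {cv_mse_b:.2f}")
```

Whichever model has the lower cv error is the one this experiment favors. On real housing data the outcome could go either way, which is exactly why the comparison is worth running instead of guessing.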


Thanks for the detailed explanation! :slight_smile:

You are welcome @Praveen_Chandrasekar!