Feature engineering - Week 2: Regression with multiple input variables

Praveen_Chandrasekar · July 26, 2022, 6:34pm

I have a question about the feature engineering topic. In the lecture video, the professor gave an example in which he uses feature engineering to create a new feature (area). This increases the number of columns; in other words, it increases the number of dimensions that go into the regression model. After seeing his approach, I thought of simplifying the model instead: drop the length and breadth of the lot and use only the new feature, area, turning the multiple linear regression into a simple linear regression. Does my idea make sense?
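To make the two options concrete, here is a minimal sketch of the feature matrices being compared. The numbers and the names `length`, `breadth`, `X_multi`, and `X_single` are illustrative assumptions, not taken from the lecture.

```python
import numpy as np

# Hypothetical lot dimensions (illustrative values only).
length = np.array([20.0, 30.0, 25.0, 40.0])
breadth = np.array([10.0, 15.0, 20.0, 10.0])

# Engineered feature: area = length * breadth.
area = length * breadth

# Option 1 (as described from the lecture): keep the originals
# and add the engineered feature, giving 3 input columns.
X_multi = np.column_stack([length, breadth, area])

# Option 2 (the idea in this question): keep only the engineered
# feature, giving a single input column.
X_single = area.reshape(-1, 1)

print(X_multi.shape, X_single.shape)
```

Both matrices describe the same houses; they differ only in how many columns the regression model receives.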

TMosh · July 26, 2022, 7:19pm

Adding new features can be very helpful.

Removing features is not helpful, because you do not know whether they are important.

Praveen_Chandrasekar · July 26, 2022, 7:21pm

Adding features may result in high dimensionality, right? And this can become an issue and may lead to overfitting, right?

rmwkwok · July 26, 2022, 11:19pm

Hello @Praveen_Chandrasekar,

I think your idea is absolutely worth testing. Overfitting, and the techniques for discovering whether our model is overfitting the training data, are covered in Course 2, but we can talk about the small part of it that is relevant to our discussion.

The idea is that we want the model to do well on a set of non-training data called “cv data” (cross-validation data). To test whether the features we want to remove are important for doing well on the cv data, we train two models: (A) with those features removed and (B) with those features kept. Then we evaluate both models on our cv dataset and see which scores better.

If model (A) scores better, this validates your assumption that removing them is better than keeping them. Making assumptions is important, and validating your assumptions is just as important. You will find out more in Course 2.
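A rough sketch of that comparison, under stated assumptions: the data is synthetic, the price relationship is made up for illustration, and the helpers `add_bias`, `fit`, and `mse` are hypothetical names built on NumPy's least-squares solver rather than anything from the course labs. The score used here is mean squared error, so a lower value is better.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (an assumption for illustration): price depends on area plus noise.
m = 200
length = rng.uniform(10, 50, m)
breadth = rng.uniform(10, 50, m)
area = length * breadth
price = 0.1 * area + rng.normal(0, 5, m)

# Simple split: first 140 examples for training, the rest as cv data.
train = np.arange(m) < 140
cv = ~train

def add_bias(X):
    """Prepend a column of ones so the model has an intercept term."""
    return np.column_stack([np.ones(len(X)), X])

def fit(X, y):
    """Fit linear regression by ordinary least squares."""
    w, *_ = np.linalg.lstsq(add_bias(X), y, rcond=None)
    return w

def mse(X, y, w):
    """Mean squared error of the predictions on (X, y)."""
    return float(np.mean((add_bias(X) @ w - y) ** 2))

# Model (A): length and breadth removed, area only.
X_a = area.reshape(-1, 1)
# Model (B): length and breadth kept alongside area.
X_b = np.column_stack([length, breadth, area])

w_a = fit(X_a[train], price[train])
w_b = fit(X_b[train], price[train])

train_mse_a = mse(X_a[train], price[train], w_a)
train_mse_b = mse(X_b[train], price[train], w_b)
cv_mse_a = mse(X_a[cv], price[cv], w_a)
cv_mse_b = mse(X_b[cv], price[cv], w_b)

print(f"train MSE  A: {train_mse_a:.3f}  B: {train_mse_b:.3f}")
print(f"cv MSE     A: {cv_mse_a:.3f}  B: {cv_mse_b:.3f}")
```

Model (B) can never do worse than model (A) on the training set, because it contains all of (A)'s features. That is why the comparison has to be made on the cv data, where extra features are not guaranteed to help.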

Raymond

Praveen_Chandrasekar · July 26, 2022, 11:32pm

Thanks for the detailed explanation!

rmwkwok · July 27, 2022, 12:16am

You are welcome @Praveen_Chandrasekar!
