Clarification on Feature Engineering and Multicollinearity Concern

Hello everyone,

I’m currently taking the Machine Learning course and came across the feature engineering example in the lesson about frontage, depth, and house price prediction. I understand the concept of creating new features to improve model performance, and I found it insightful how frontage and depth were combined into a new feature, “area.”

However, I have a concern regarding multicollinearity. Since the area is derived from multiplying frontage and depth, it seems that adding this feature might lead to collinearity between area, frontage, and depth. As a result, I’m wondering how this interaction between the features might affect the model’s stability, particularly in linear models, where multicollinearity can inflate the variance of coefficient estimates.

Could anyone provide clarification on how this concern might be addressed in practice or explain why adding the “area” feature doesn’t significantly impact the model? I’d really appreciate any insights.

Thanks in advance for your help!

Interaction terms are usually correlated with the underlying features, but they aren’t an exact linear function of those features (unless one feature is a constant, which already breaks the model if you have an intercept, because the constant column duplicates it). So the variance inflation is not going to be infinite, but it’s good practice to check the VIF if you are concerned about model stability. Another option is to use regularization, either to shrink the coefficients (L2) or to perform variable selection (L1).
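
If it helps, here is a rough sketch of that check in plain NumPy. Everything in it is made up for illustration: the frontage/depth ranges, the synthetic price formula, the ridge penalty of 10, and the helper names `vif` and `ridge` are my own assumptions, not something from the course. It computes each VIF as 1 / (1 - R^2) from regressing one feature on the others, confirms the design matrix still has full column rank, and compares ordinary least squares with a closed-form L2 (ridge) fit.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic lots (arbitrary ranges, for illustration only)
frontage = rng.uniform(20, 100, n)
depth = rng.uniform(50, 200, n)
area = frontage * depth  # the engineered interaction feature

X = np.column_stack([frontage, depth, area])
names = ["frontage", "depth", "area"]


def vif(X, j):
    """VIF of column j: 1 / (1 - R^2), regressing it on the other columns plus an intercept."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - resid.var() / y.var()
    return 1.0 / (1.0 - r2)


vifs = {name: vif(X, j) for j, name in enumerate(names)}

# area is correlated with frontage and depth but is not an exact linear
# combination of them, so the design matrix keeps full column rank.
design = np.column_stack([np.ones(n), X])
rank = np.linalg.matrix_rank(design)


def ridge(Z, y, lam):
    """Closed-form L2-penalized least squares on standardized features (lam=0 gives OLS)."""
    return np.linalg.solve(Z.T @ Z + lam * np.eye(Z.shape[1]), Z.T @ y)


# Hypothetical price that depends on area, plus noise
price = 50.0 * area + rng.normal(0, 1000, n)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
yc = price - price.mean()

ols_coef = ridge(Z, yc, 0.0)
ridge_coef = ridge(Z, yc, 10.0)

for name in names:
    print(f"VIF({name}) = {vifs[name]:.1f}")
print("design matrix rank:", rank)
print("OLS coefficient norm:  ", np.linalg.norm(ols_coef))
print("ridge coefficient norm:", np.linalg.norm(ridge_coef))
```

On data like this the VIFs should come out elevated but finite, and the ridge coefficients have a smaller norm than the OLS ones. If the VIFs on your real data look too high, dropping frontage and depth and keeping only area, or using the regularized fit, are both reasonable things to try.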