C1_W2_Feature Engineering

:point_up_2: In this example, why are we using all three features
instead of only the area (x3) feature?
Isn't that redundant? :thinking:

It isn’t redundant. These are three unique features - the two original features, plus an additional engineered feature.

x3 is the combination of x1 and x2 as explained by Prof. Andrew.

So he is basically explaining that the reason for adding the extra feature x3 is to see whether area, which is the product of x1 (frontage) and x2 (depth), is more predictive of the housing price than the individual features frontage (x1) and depth (x2).

In general, alongside the individual features x1 and x2, they added x3 because the area of the lot may be more predictive of the housing price than either feature alone, and that is why all three features were used. The original features here are frontage and depth; one more feature, area, was added to get a more predictive model of the housing price.

They are not redundant because w_1x_1 \ne w_3x_3, w_2x_2 \ne w_3x_3, and w_1x_1 + w_2x_2 \ne w_3x_3 no matter how we tune the weights. I think it is a good example showing that even though we have engineered a new feature that we strongly believe will be helpful, that does not mean we have to give up the original features it was built from. Instead, we let gradient descent decide: if x_1 really becomes useless in the presence of x_3, then w_1 will end up very close to zero compared with w_2 and w_3 (given that all features are normalized), or, more practically, the model without x_1 and x_2 will perform better than the model with them on a so-called “dev” dataset.
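A minimal sketch of that idea (toy synthetic data, not the course's dataset): keep x1 (frontage) and x2 (depth), add x3 = x1 * x2 (area), normalize, and let gradient descent decide which weights matter.

```python
import random

random.seed(0)

# Synthetic lots; the "true" price depends mostly on area, a bit on frontage.
lots = [(random.uniform(10, 30), random.uniform(20, 60)) for _ in range(50)]
prices = [0.5 * f * d + 2 * f + random.gauss(0, 5) for f, d in lots]

# Three features per lot (frontage, depth, area), then z-score normalize
# each column so the weight sizes are comparable.
raw = [[f, d, f * d] for f, d in lots]
cols = list(zip(*raw))
means = [sum(c) / len(c) for c in cols]
stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
        for c, m in zip(cols, means)]
X = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in raw]

# Plain batch gradient descent on f(x) = w.x + b with squared-error cost.
w, b, alpha, m = [0.0, 0.0, 0.0], 0.0, 0.1, len(X)
for _ in range(2000):
    errs = [sum(wi * xi for wi, xi in zip(w, x)) + b - y
            for x, y in zip(X, prices)]
    w = [wj - alpha * sum(e * x[j] for e, x in zip(errs, X)) / m
         for j, wj in enumerate(w)]
    b -= alpha * sum(errs) / m

# The weight on the engineered area feature ends up dominating,
# while the weight on depth alone stays near zero.
print([round(wi, 1) for wi in w])
```

With this made-up data, gradient descent does exactly what the post describes: it keeps a large weight on the area feature, a modest one on frontage (which also enters the true price directly), and drives the weight on depth alone toward zero.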



Thank you for replying.

Originally I thought:
If area is being included then length & breadth information is contained in it.
So including them would be redundant.

Just to add on to your explanation:
Two houses with the same area may have different prices, since (maybe) a house with a large width but small depth, or vice versa, won't be as lucrative as one with the same area but a more balanced ratio.

Follow-up query:
Will it be a better model if, instead of length, breadth, and area,
we keep only area and the ratio of length to width?
That would give one less parameter to worry about and hence a simpler model.

Hello @Debatreyo_Roy

That is a better way to explain it! Thank you for sharing!

An interesting one! We would probably prefer 1:1 over 1:9; however, the feature is only useful if ratios like 1:9 are not uncommon. Whether it is common or not goes back to data exploration - if all of the houses have a similar ratio, then the ratio feature might not be very useful. Perhaps we can only say it is an interesting feature; whether it will actually be useful can only be told by our exploratory step and/or a performance comparison of models with different feature combinations.

We might think that, mathematically, we can recover width and length from area and ratio, so we can drop width and length. Yes, a human can do that, but linear regression can't.

Just as area can't replace length and width (because w_1x_1 \ne w_3x_3, w_2x_2 \ne w_3x_3, and w_1x_1 + w_2x_2 \ne w_3x_3), the same rationale applies here. In a linear regression model, we can only add area and ratio together, not multiply them, so there is no way to recover width or length.
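A small sketch of this point, using made-up lot sizes: a linear model in (area, ratio) can be made to fit a few lots exactly, yet it still cannot recover length in general, because length = sqrt(area * ratio) is not a linear function of area and ratio.

```python
from fractions import Fraction as F

# Hypothetical lots as (length, width); features are area = l*w, ratio = l/w.
lots = [(20, 10), (10, 20), (30, 10), (12, 30)]
rows = [[F(l * w), F(l, w), F(1), F(l)] for l, w in lots]  # area, ratio, 1 | length

# Solve w1*area + w2*ratio + b = length exactly on the first three lots
# (Gauss-Jordan elimination with exact rational arithmetic).
A = [r[:] for r in rows[:3]]
for i in range(3):
    A[i] = [v / A[i][i] for v in A[i]]          # scale pivot row
    for j in range(3):
        if j != i:
            f = A[j][i]
            A[j] = [vj - f * vi for vj, vi in zip(A[j], A[i])]
w1, w2, b = A[0][3], A[1][3], A[2][3]

fit3 = [w1 * r[0] + w2 * r[1] + b for r in rows[:3]]  # matches the first 3 lengths
pred4 = w1 * rows[3][0] + w2 * rows[3][1] + b          # misses the fourth lot
print(fit3, pred4, rows[3][3])
```

The linear model reproduces the first three lengths exactly, but as soon as a lot with a third distinct width appears, the prediction is off: a human recovers length with a square root, which a linear model cannot express.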

Above is the mathematical point of view. Since you have shared a good example, let me also try one. If people in country A believe a house with one side longer than 10 meters will bring luck to a family, and thus should be more expensive, then length and width should be considered separately because of this cultural factor. Area should be considered for living quality, and ratio for how well the space can be used. Here we have one reason for considering each of width, length, area, and ratio, so we put them all in!
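That hypothetical scenario can be sketched as feature engineering, turning each stated reason into its own feature, including a binary "lucky side" flag (all numbers below are made up):

```python
# Hypothetical lots as (length, width) in meters.
lots = [(12.0, 8.0), (9.5, 9.0), (15.0, 4.0)]

def engineer(length, width):
    """One engineered feature per reason given in the post."""
    return {
        "length": length,                               # cultural factor
        "width": width,                                 # cultural factor
        "area": length * width,                         # living quality
        "ratio": length / width,                        # usable space
        "lucky": 1 if max(length, width) > 10 else 0,   # side longer than 10 m
    }

rows = [engineer(l, w) for l, w in lots]
print(rows)
```

The binary flag is itself a feature a linear model can use directly: the model can assign it a weight that acts as a fixed price premium for "lucky" houses.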



Wow, that's a nice example of how one should be aware of non-technical aspects (such as local culture) while building a model.
Also, I completed the 1st course in the ML specialization, so now I understand that
by applying regularization the model can still be kept simple even when using many features (including polynomial ones).

That sounds like an interesting idea! I hope you will have a chance to verify it by actually building some models and testing the limits of the statement. For example, how many? How simple?
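A toy sketch of such a verification (all data and hyperparameters below are made up): fit polynomial features x, x^2, x^3 to a target that is really just linear, once without and once with an L2 (ridge-style) penalty, and compare the resulting weight sizes.

```python
# Target uses only x; the x^2 and x^3 features are superfluous.
xs = [i / 10 for i in range(1, 21)]
ys = [3 * x + 1 for x in xs]

# Polynomial features, z-score normalized per column.
raw = [[x, x ** 2, x ** 3] for x in xs]
cols = list(zip(*raw))
means = [sum(c) / len(c) for c in cols]
stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
        for c, m in zip(cols, means)]
X = [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in raw]

def fit(lam, iters=3000, alpha=0.1):
    """Gradient descent on squared error plus (lam/2m) * ||w||^2."""
    w, b, m = [0.0, 0.0, 0.0], 0.0, len(X)
    for _ in range(iters):
        errs = [sum(wi * xi for wi, xi in zip(w, x)) + b - y
                for x, y in zip(X, ys)]
        w = [wj - alpha * (sum(e * x[j] for e, x in zip(errs, X)) / m
                           + lam * wj / m)
             for j, wj in enumerate(w)]
        b -= alpha * sum(errs) / m
    return w

w_plain = fit(0.0)   # no regularization
w_reg = fit(10.0)    # L2-regularized: weights are pulled toward zero
print(sum(abs(v) for v in w_plain), sum(abs(v) for v in w_reg))
```

The regularized weights come out smaller overall, which is the "kept simple" effect described above: the penalty discourages the model from leaning heavily on any feature, including the superfluous polynomial ones.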