Do we scale the features assuming all the features have the same amount of influence on the result?

In this screen, Andrew Ng says that since the range of x1 (size in square feet) is bigger than the range of x2 (number of bedrooms), w1 would be smaller than w2. This makes x1 and x2 contribute the same amount to the final price.

But what if, in real life, one variable contributed more to a house’s price than the other?

For example, in a society that values comfort more than privacy, the number of bedrooms would add much less value to the selling price than the area of the house. In this case, w1 should be much bigger than shown in the tutorial; if the demand for more area were high enough, it could even equal w2.

So why do we do feature scaling? Why is this an almost-standard pre-processing procedure?


Hello @Govarthenan_Rajadura,

We let the features’ associated weights decide what contributes more. Since it is the training algorithm and the training dataset that determine the optimal weights, we are actually letting the algorithm and the data decide what contributes more.

Because we want to train the model more efficiently: when the features are on similar scales, gradient descent converges faster. The slide included in this post should explain that. You might read the post for more, or go back to that video for Andrew’s explanation.

In short, the focus here is training efficiency. Your concern about what contributes more is addressed by the weights.
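To see that the weights adapt to whatever scale the features are on, here is a minimal NumPy sketch with made-up toy data (the sizes, bedroom counts, and prices are all hypothetical). Fitting the same linear model on raw and on z-score-scaled features gives different weights but identical predictions, so scaling does not change what the model says contributes more:

```python
import numpy as np

# Hypothetical toy data: x1 = size in sq ft, x2 = number of bedrooms
X = np.array([[2100.0, 3], [1600.0, 2], [2400.0, 4], [1400.0, 2]])
y = np.array([400.0, 330.0, 369.0, 232.0])  # price in $1000s (made up)

def fit(X, y):
    # Least-squares fit with an intercept column prepended
    Xb = np.c_[np.ones(len(X)), X]
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

w_raw = fit(X, y)

# z-score scaling: each feature gets mean 0 and standard deviation 1
mu, sigma = X.mean(axis=0), X.std(axis=0)
w_scaled = fit((X - mu) / sigma, y)

# The optimal weights differ with the feature scale, but the model's
# predictions are the same either way.
print(w_raw[1:])     # weights on the raw features
print(w_scaled[1:])  # weights on the scaled features
```

The practical benefit of scaling is not the fit itself (least squares finds it either way) but that gradient descent reaches it in far fewer iterations when the features have similar ranges.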



That’s a good question 🙂

In your question you are comparing x1 and x2, that is, comparing the size of the house in square feet to the number of bedrooms. But if you look at how Professor Andrew Ng calculates the housing price, he includes w1 and w2 together in the calculation. So the point to understand is that the predicted price is not based on the size of the house alone or the number of bedrooms alone; it comes from the best combination of the features, where one can weigh more heavily than the other or vice versa.

The price of a house is calculated from the size in square feet and the number of bedrooms together.
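Concretely, the two-feature linear model from the lecture combines both features in one prediction. A tiny sketch, with purely hypothetical weights chosen for illustration:

```python
def predict_price(x1_sqft, x2_bedrooms, w1, w2, b):
    # The lecture's two-feature linear model: f(x) = w1*x1 + w2*x2 + b
    return w1 * x1_sqft + w2 * x2_bedrooms + b

# Hypothetical weights (in $1000s): 0.1 per sq ft, 20 per bedroom, base 80
price = predict_price(2000, 3, w1=0.1, w2=20.0, b=80.0)
print(price)  # 0.1*2000 + 20*3 + 80 = 340
```

Both terms enter the same sum, so neither feature determines the price on its own.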

Feature scaling puts the different features on a comparable footing so they can be combined, which gives you a better result overall.

As you said, people now prefer comfort over privacy, so for one person comfort could mean a combination of a bigger x1 and a smaller x2, while for another it could mean a smaller x1 (fewer square feet) with an extra bedroom (a larger x2). Either combination feeds into the housing price prediction.

Hope you got the point.


This housing price example is famous in multiple linear regression analysis. Here Andrew Ng includes only two features, whereas in similar studies we go further and add the number of bathrooms, presence or absence of a garage, air conditioning, underground storage, etc. All these factors are added, and then the housing price analysis is done. This example is very well explained in Statistics with SAS, which is a Coursera course.


I think you mean linear regression, since we’re predicting house prices, not classifications.


Thank you for correcting me; I meant multiple linear regression.

One can also use linear regression to model non-linear relationships with the response variable by adding polynomial terms, such as squared or cubed terms, or by adding interactions to the model. Note that even when we add polynomial terms we still have a linear model, despite the exponents on the predictor variables, because these polynomial models are linear in the parameters.
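The point above can be shown in a few lines of NumPy with made-up data that follows a roughly quadratic curve. Because the model is linear in the weights, ordinary least squares fits it directly even though the design matrix contains an x² column:

```python
import numpy as np

# Hypothetical 1-D data with a curved relationship (made-up numbers,
# roughly y = x^2 + 1)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 4.9, 10.2, 17.1, 26.0])

# Design matrix with columns [1, x, x^2]: the model
# y = w0 + w1*x + w2*x^2 is still linear *in the weights* w,
# so plain least squares solves it with no special machinery.
X = np.c_[np.ones_like(x), x, x**2]
w, *_ = np.linalg.lstsq(X, y, rcond=None)
print(w)  # [w0, w1, w2]
```

The "linearity" that matters is in the parameters, not in the predictors, which is why this still counts as linear regression.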