Hello @u5470152,
I think @Juan_Olano has provided a very good example to illustrate the idea behind scaling. Here, Pricing spans a range of 900K with an order of magnitude of $10^5$, whereas Rooms spans a range of only 4 with an order of magnitude of $10^0$.
When we compute the gradient
$$\frac{\partial J}{\partial w} = \sum_i (\text{error}_i \times x_i),$$
we see that while $\text{error}_i$ is common to all features, $x_i$ is the real factor that makes the scale of the gradient differ from one feature to another. For Pricing, the scale of $x_i$ is on average $10^5$, and for Rooms it is $10^0$, so we can expect a $10^{5-0} = 10^5$ difference in the order of magnitude of the gradients as well. Given that we use the same learning rate for all features, we should expect the weight for Pricing to change much more dramatically than the weight for Rooms.
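To make this concrete, here is a minimal NumPy sketch (the data and variable names are made up for illustration) that evaluates the sum above for both features, with the same per-example error shared across features:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100  # number of examples

# Hypothetical data on the scales discussed above:
pricing = rng.uniform(100_000, 1_000_000, size=m)  # order 10^5
rooms = rng.integers(1, 5, size=m).astype(float)   # order 10^0
X = np.column_stack([pricing, rooms])

# Pretend every example currently has the same prediction error
error = np.ones(m)

# dJ/dw = sum_i error_i * x_i, computed per feature (one entry each)
grad = X.T @ error

print(grad)  # e.g. [~5e7, ~2.5e2]: Pricing's gradient is ~10^5 larger
```

The Pricing component of the gradient comes out roughly $10^5$ times larger, purely because of the feature's scale, which is exactly why one learning rate cannot suit both features without scaling.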
In your “old note”, you talked about “Flat normalization” and “Column-wise normalization”. I think all of us are referring to the latter in this discussion.
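For what it's worth, here is how I read those two terms, assuming “flat” means one mean/std computed over the whole matrix and “column-wise” means separate statistics per feature (my interpretation, not necessarily the exact wording in your note):

```python
import numpy as np

X = np.array([[500_000.0, 3.0],
              [900_000.0, 1.0],
              [100_000.0, 4.0]])

# "Column-wise normalization": each feature (column) gets its own
# mean/std, so Pricing and Rooms end up on comparable scales.
X_col = (X - X.mean(axis=0)) / X.std(axis=0)

# "Flat normalization": one mean/std over the whole matrix; the huge
# Pricing values dominate the statistics, so Rooms stays tiny.
X_flat = (X - X.mean()) / X.std()

print(X_col)
print(X_flat)
```

Only the column-wise version puts the two features' gradients on a similar scale, which is what matters for the discussion above.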
What do you think?
Raymond