Graph in optional lab : feature scaling and learning rate

Chulong_Cheng · March 2, 2023, 8:29pm

Hi, hope you are doing well.

I am working through the learning rate and feature scaling lab, but I am having a hard time figuring out what the graphs mean. I know they are a sort of side-by-side comparison of unnormalized size and bedrooms against the normalized ones. But why do these graphs look different from the elliptical or circular ones shown above?

Thank you so much.

rmwkwok · March 3, 2023, 1:18am

Hello @Chulong_Cheng

The key is to see that normalization does not just scale the features; through the features, it also scales the weights.

Let’s look at a very simple example:

x_1	x_2	y
2	1	8
1	2	7

We can easily verify that the optimal w_1 and w_2 are 3 and 2 respectively (taking the model to be w_1 x_1 + w_2 x_2, with no bias term). Now, what happens if we scale x_1 by dividing it by 2 (resembling feature normalization)?

x_1^{scaled}	x_2	y
1	1	8
0.5	2	7

Then, obviously, the optimal w_1 and w_2 change to 6 and 2 respectively. See how “shrinking” x_1 has “enlarged” w_1, or doubled it!
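For anyone who wants to check the two sets of optimal weights, here is a minimal sketch. It assumes the model is w_1 x_1 + w_2 x_2 with no bias term, so the two data points determine the weights exactly. The helper name `solve_2x2` is made up for this illustration.

```python
def solve_2x2(rows):
    """Solve w1*x1 + w2*x2 = y for two data points using Cramer's rule.

    Assumes no bias term, so two points give exactly two equations.
    """
    (a, b, y1), (c, d, y2) = rows
    det = a * d - b * c
    if det == 0:
        raise ValueError("the two data points do not determine unique weights")
    w1 = (y1 * d - b * y2) / det
    w2 = (a * y2 - c * y1) / det
    return w1, w2


# Rows are (x_1, x_2, y), copied from the two tables above.
original = [(2.0, 1.0, 8.0), (1.0, 2.0, 7.0)]
x1_halved = [(1.0, 1.0, 8.0), (0.5, 2.0, 7.0)]

w_before = solve_2x2(original)
w_after = solve_2x2(x1_halved)

print("before scaling:", w_before)
print("after scaling: ", w_after)
```

Halving x_1 doubles w_1 and leaves w_2 alone, because only the product w_1 x_1 enters the prediction.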

This “enlargement” is not restricted to the optimal point; it applies everywhere. For example, to have a cost of 5 (i.e. J=5) with w_2 fixed at 2, w_1 has to be either 1 or 5 before the scaling. After the scaling, it has to be either 2 or 10. See that the difference between them gets “enlarged” from 5-1=4 to 10-2=8, which is again a doubling.

Please feel free to check my math, and if you like, plot the contours for this simple example before and after scaling. There are just 2 data points, so the maths is not too complicated.
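The J=5 numbers can be checked with a few lines. This sketch assumes the usual squared-error cost from the course, J = (1/(2m)) Σ (w_1 x_1 + w_2 x_2 - y)^2, with no bias term; the function name `cost` is just for this illustration.

```python
def cost(rows, w1, w2):
    """Squared-error cost J = 1/(2m) * sum of squared residuals (no bias term assumed)."""
    m = len(rows)
    return sum((w1 * x1 + w2 * x2 - y) ** 2 for x1, x2, y in rows) / (2 * m)


# Rows are (x_1, x_2, y), before and after dividing x_1 by 2.
original = [(2.0, 1.0, 8.0), (1.0, 2.0, 7.0)]
x1_halved = [(1.0, 1.0, 8.0), (0.5, 2.0, 7.0)]

# With w_2 held at 2, evaluate the cost at the w_1 values quoted in the post.
costs_before = [cost(original, w1, 2.0) for w1 in (1.0, 5.0)]
costs_after = [cost(x1_halved, w1, 2.0) for w1 in (2.0, 10.0)]

print("before scaling:", costs_before)
print("after scaling: ", costs_after)
```

All four evaluations land on the same cost level, so the J=5 contour crosses the line w_2=2 at points that are twice as far apart after the scaling.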

In summary, scaling features also scales the cost surface. Scaling only one feature scales the cost surface along only one dimension. Therefore, scaling can “reshape” (stretch or compress) how the contours look. In particular, scaling all features by their standard deviations makes all features end up on the same scale, and thus gives roughly circular contours. You might also verify this by scaling both features in the simple example above by their standard deviations.
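One rough way to try this on the example is to measure how wide the J=5 contour is along each weight axis, with the other weight held at its optimum. The sketch below assumes the same cost, J = (1/(2m)) Σ (w_1 x_1 + w_2 x_2 - y)^2 with no bias term, and uses the population standard deviation (`statistics.pstdev`), which is my choice rather than something stated in the post. Because the two points are fitted exactly at the optimum, the cost along one axis is (Σ x_j^2 / (2m)) (w_j - w_j*)^2, which is what `slice_width` (a made-up helper) relies on.

```python
import math
from statistics import pstdev


def slice_width(feature, level):
    """Width of the J = level contour along one weight axis.

    Holds the other weight at its optimum and assumes the data are fitted
    exactly there, so J along this axis is curvature * (w - w_opt)**2.
    """
    m = len(feature)
    curvature = sum(x * x for x in feature) / (2 * m)
    return 2 * math.sqrt(level / curvature)


# Start from the version of the example where x_1 was divided by 2.
x1 = [1.0, 0.5]
x2 = [1.0, 2.0]
widths_before = (slice_width(x1, 5.0), slice_width(x2, 5.0))

# Divide each feature by its own (population) standard deviation.
x1_std = [x / pstdev(x1) for x in x1]
x2_std = [x / pstdev(x2) for x in x2]
widths_after = (slice_width(x1_std, 5.0), slice_width(x2_std, 5.0))

print("widths along (w_1, w_2) before:", widths_before)
print("widths along (w_1, w_2) after: ", widths_after)
```

This only compares the widths along the two axes. Equal widths mean the contour is no longer stretched along one weight, but it is not by itself proof that the contour is an exact circle.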

Cheers,
Raymond

Chulong_Cheng · March 3, 2023, 6:40am

Thank you so much, I will think about it.
