Hi, hope you are doing well.
I am trying to understand the learning rate and feature scaling lab, but I am having a hard time figuring out what the graphs mean. I know they are a side-by-side comparison of the unnormalized size and bedrooms against the normalized ones. But why do the graphs look different from the ones shown above, elliptical versus circular?
Thank you so much.
Hello @Chulong_Cheng
The key is to see that normalization does not just scale the features; through the features, it also scales the weights.
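Let’s look at a very simple example with two data points and the model y = w_1 x_1 + w_2 x_2 (no bias term):

| x_1 | x_2 | y |
| --- | --- | --- |
| 2 | 1 | 8 |
| 1 | 2 | 7 |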
We can easily verify that the optimal w_1 and w_2 are 3 and 2 respectively. Now, what happens if we scale x_1 by dividing it by 2 (resembling feature normalization)?
| x_1^{scaled} | x_2 | y |
| --- | --- | --- |
| 1 | 1 | 8 |
| 0.5 | 2 | 7 |
Then the optimal w_1 and w_2 change to 6 and 2 respectively. See how “shrinking” x_1 has “enlarged” w_1: it has doubled!
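To check the two optima numerically, here is a minimal sketch that solves the two-equation system w_1 x_1 + w_2 x_2 = y for both versions of the data. It assumes a model with no bias term, and it assumes the unscaled x_1 values are 2 and 1 (the scaled column multiplied back by 2). `solve_2x2` is a hypothetical helper name, not something from the lab.

```python
def solve_2x2(rows, targets):
    """Solve a 2x2 linear system with Cramer's rule."""
    (a, b), (c, d) = rows
    y1, y2 = targets
    det = a * d - b * c
    return ((y1 * d - b * y2) / det, (a * y2 - c * y1) / det)

# Assumption: unscaled x_1 is the scaled column multiplied by 2.
x_original = [(2.0, 1.0), (1.0, 2.0)]
# Divide x_1 by 2, leave x_2 untouched.
x_scaled = [(x1 / 2, x2) for x1, x2 in x_original]
y = (8.0, 7.0)

w_original = solve_2x2(x_original, y)
w_scaled = solve_2x2(x_scaled, y)
print(w_original)  # (3.0, 2.0)
print(w_scaled)    # (6.0, 2.0)
```

Halving x_1 doubles w_1 and leaves w_2 alone, because only the product w_1 x_1 enters the prediction.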
This “enlargement” is not restricted to the optimal point; it happens everywhere. For example, take the squared-error cost J = \frac{1}{2m}\sum(\hat{y}-y)^2 with m=2. To have a cost of 5 (i.e. J=5) with w_2 fixed at 2, w_1 has to be either 1 or 5 before the scaling. After the scaling, it has to be either 2 or 10. See that the distance between the two values gets “enlarged” from 5-1=4 to 10-2=8, which is another doubling.
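The J=5 numbers can be checked the same way. The sketch below assumes the squared-error cost with the 1/(2m) factor and no bias term, and again assumes the unscaled x_1 values are 2 and 1. `cost` is a hypothetical helper name.

```python
def cost(w1, w2, data):
    """Squared-error cost J = (1 / (2m)) * sum((w1*x1 + w2*x2 - y)**2), no bias term."""
    m = len(data)
    return sum((w1 * x1 + w2 * x2 - y) ** 2 for x1, x2, y in data) / (2 * m)

# Each row is (x_1, x_2, y). Assumption: unscaled x_1 is the scaled column times 2.
original = [(2.0, 1.0, 8.0), (1.0, 2.0, 7.0)]
scaled = [(x1 / 2, x2, y) for x1, x2, y in original]

# w_2 is held at 2; only w_1 varies.
costs_original = [cost(w1, 2.0, original) for w1 in (1.0, 5.0)]
costs_scaled = [cost(w1, 2.0, scaled) for w1 in (2.0, 10.0)]
print(costs_original)  # [5.0, 5.0]
print(costs_scaled)    # [5.0, 5.0]
```

Along the line w_2=2, the J=5 contour is 4 units wide in w_1 before scaling and 8 units wide after, so the contour is stretched by a factor of 2 in the w_1 direction.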
Please feel free to check my math, and if you like, plot the contours for this simple example before and after scaling. There are just 2 data points, so the math is not too complicated.
In summary, scaling features also scales the cost surface. Scaling only one feature scales the cost surface along only one dimension. Therefore, scaling can “reshape” (stretch or compress) how the contours look. In particular, scaling all features by their standard deviations puts all features on the same scale, and thus makes the contours closer to circular (exactly circular only when the features are also uncorrelated). You might also experiment with this by scaling both features in the simple example above by their standard deviations.
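One way to put a number on “how stretched” the contours are, without plotting: for the squared-error cost with no bias term, the contours are ellipses whose axis lengths are proportional to 1/sqrt(eigenvalue) of H = X^T X / m, so the long-to-short axis ratio is sqrt(λ_max / λ_min), and a ratio of 1 would be a circle. The sketch below is only an illustration under those assumptions. `axis_ratio` and `scale_by_std` are hypothetical helper names, the population standard deviation is an arbitrary choice, and the unscaled x_1 values (2 and 1) are again recovered from the scaled table.

```python
import math
import statistics

def axis_ratio(rows):
    """Long-to-short axis ratio of the cost contours for two features.

    Uses the closed-form eigenvalues of the symmetric 2x2 matrix H = X^T X / m.
    """
    m = len(rows)
    a = sum(x1 * x1 for x1, _ in rows) / m
    b = sum(x1 * x2 for x1, x2 in rows) / m
    d = sum(x2 * x2 for _, x2 in rows) / m
    mean = (a + d) / 2
    half_gap = math.hypot((a - d) / 2, b)
    return math.sqrt((mean + half_gap) / (mean - half_gap))

def scale_by_std(rows):
    """Divide each feature by its population standard deviation."""
    s1 = statistics.pstdev(x1 for x1, _ in rows)
    s2 = statistics.pstdev(x2 for _, x2 in rows)
    return [(x1 / s1, x2 / s2) for x1, x2 in rows]

original = [(2.0, 1.0), (1.0, 2.0)]             # assumed unscaled data
x1_halved = [(x1 / 2, x2) for x1, x2 in original]
both_by_std = scale_by_std(x1_halved)

ratio_original = axis_ratio(original)      # about 3.0
ratio_x1_halved = axis_ratio(x1_halved)    # about 3.91, more stretched
ratio_both_by_std = axis_ratio(both_by_std)  # about 3.0 again
print(ratio_original, ratio_x1_halved, ratio_both_by_std)
```

In this tiny example, putting both features on the same scale undoes the extra stretch caused by halving x_1, but the ratio stays at about 3 rather than 1 because x_1 and x_2 are correlated across the two data points.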
Cheers,
Raymond
Thank you so much, I will think about it.