My question pertains to week 2, C1_W2_Lab04_FeatEng_PolyReg_Soln, “Scaling Features” example
(/optional-lab-feature-engineering-and-polynomial-regression/lab)
The premade function solution comes out as w[1] = 1.13e+02 and b = 123.5, which seems absurd at first glance but makes more sense the more I think about it. I've run the entire notebook several times, and this is indeed the output.
I want to be sure here:
1. Intuitively put, does this mean we could think of the price as being roughly 113 times the number of standard deviations that x^2 lies from its mean, plus the bias of 123.5? (A small sketch of how I picture this is below.)
2. Are there particular scenarios in which z-score scaling stands out over the other scaling methods, and why?
3. When we scale like this, the weights seem to become meaningless in the context of the original inputs, so in practice would we re-scale them back after training?
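For reference, here is how I picture question 1 (a minimal numpy sketch, not the lab's exact code; the values 113 and 123.5 are just the ones quoted above):

```python
import numpy as np

# Illustrative recreation of the scaled-feature picture
x = np.arange(0, 20, 1)
X = x**2                          # engineered feature: x^2
mu, sigma = X.mean(), X.std()     # statistics used for z-score scaling

X_norm = (X - mu) / sigma         # "how many std devs is x^2 away from its mean?"

# With the parameters quoted above, the model's prediction is
w1, b = 113.0, 123.5
y_hat = w1 * X_norm + b           # ~113 per std dev of x^2, plus the 123.5 offset
```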
For the simple linear model covered in C1 W2, the choice of scaling method does not matter much. What matters is that you scale at all, because scaling helps gradient descent work better.
It depends on the purpose. To make predictions, there is no need to scale back: the usual workflow is to scale new inputs with the same statistics used during training and then apply the trained weights as-is.
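As a rough sketch of that workflow (the helper name and the numbers are made up for illustration, not taken from the lab):

```python
import numpy as np

def zscore_normalize(X, mu=None, sigma=None):
    """Z-score scale X; reuse training mu/sigma when they are provided."""
    if mu is None:
        mu, sigma = X.mean(axis=0), X.std(axis=0)
    return (X - mu) / sigma, mu, sigma

# Training: scale features, then run gradient descent (omitted here)
X_train = np.array([[1.0e3], [1.5e3], [2.0e3]])   # made-up raw feature values
X_norm, mu, sigma = zscore_normalize(X_train)
w, b = np.array([110.0]), 125.0                   # pretend these came from training

# Prediction: scale the new input with the *training* mu/sigma, keep weights as-is
x_new = np.array([[1.8e3]])
x_new_norm, _, _ = zscore_normalize(x_new, mu, sigma)
y_hat = x_new_norm @ w + b
```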
Scaled features are unit-less, and so are the associated weights (if the label is unit-less, too). To give the weights their units back, we scale them back. What's worth discussing is: what do we expect from these more "meaningful" weights?
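For z-score scaling, "scaling back" can be done by expanding y = w @ ((x - mu) / sigma) + b, which gives w_orig = w / sigma and b_orig = b - sum(w * mu / sigma). A small sketch with made-up numbers:

```python
import numpy as np

# Parameters learned on z-score scaled features (made-up values for illustration)
w_norm = np.array([110.0, 35.0])
b_norm = 125.0
mu = np.array([1.5e3, 3.0])      # per-feature training means
sigma = np.array([4.0e2, 0.8])   # per-feature training standard deviations

# y = w_norm @ ((x - mu) / sigma) + b_norm  expands to  y = w_orig @ x + b_orig with:
w_orig = w_norm / sigma                        # weight per original unit of each feature
b_orig = b_norm - np.sum(w_norm * mu / sigma)

# Sanity check: both parameterizations give the same prediction for a raw input
x = np.array([1.8e3, 2.5])
assert np.isclose(w_norm @ ((x - mu) / sigma) + b_norm, w_orig @ x + b_orig)
```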