Hello @ramz,
I think the point of your message is this: when we used “StandardScaler”, we only called “fit_transform” on the training set, and only “transform” on the other sets, never refitting to them. However, when we used “PolynomialFeatures”, we called “fit_transform” on both the training set and the other sets. This can make the way we used “StandardScaler” and “PolynomialFeatures” feel inconsistent.
If my summary is right, then:
- First, let’s consolidate this: we can only fit the “StandardScaler” to the training set, because it learns parameters from that data (the per-feature mean and standard deviation), and we want to reuse those exact parameters when transforming the other sets.
- However, it’s fine to fit a new “PolynomialFeatures” to each set, because it does NOT learn any parameter from the data. Its transformation behavior is determined solely by the arguments passed to its constructor, i.e. (degree, include_bias=False). Therefore, as long as those arguments are the same, whether you fit a new “PolynomialFeatures” to each set, or transform each set with the one previously fitted on the training set, the final outcome won’t change. You can verify this easily; see the sketch right after this list.
- Having said that, since the final outcome won’t change, we have a choice: do we use “PolynomialFeatures” the way the current lab does, or the way the lab uses “StandardScaler”? My choice is the latter.
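For example, here is a quick check with made-up numbers (not the lab’s data) showing that the two usages of “PolynomialFeatures” produce identical features:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Toy "training" and "test" sets; any arrays with the same number of
# columns will do. These numbers are made up just for this check.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(5, 2))
X_test = rng.normal(size=(3, 2))

# Option 1: reuse the object fitted on the training set.
poly = PolynomialFeatures(degree=2, include_bias=False)
poly.fit(X_train)
X_test_a = poly.transform(X_test)

# Option 2: fit a brand-new object directly on the test set.
X_test_b = PolynomialFeatures(degree=2, include_bias=False).fit_transform(X_test)

# Same constructor arguments => identical features either way.
print(np.allclose(X_test_a, X_test_b))  # True
```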
Therefore, once we understand what “StandardScaler” and “PolynomialFeatures” rely on to do their transformations, we can see that the way the lab uses them won’t cause any trouble. Moreover, if our practice is to fit once and only transform later, then sticking to that practice won’t change the lab’s outcome either.
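For completeness, here is a minimal sketch of that “fit once, transform later” practice, using placeholder arrays in place of the lab’s actual data:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Placeholder data standing in for the lab's training and test sets.
rng = np.random.default_rng(1)
X_train = rng.normal(size=(6, 2))
X_test = rng.normal(size=(3, 2))

poly = PolynomialFeatures(degree=2, include_bias=False)
scaler = StandardScaler()

# Fit both transformers on the training set only.
X_train_scaled = scaler.fit_transform(poly.fit_transform(X_train))

# Only transform the test set, reusing the fitted objects:
# PolynomialFeatures by convention, StandardScaler by necessity.
X_test_scaled = scaler.transform(poly.transform(X_test))
```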
Cheers,
Raymond