C2W3 Lab01 poly fit_transform CV no then yes

In the step-by-step there’s a series of cells (I’ve combined them and removed comments, prints, and prediction code) where the poly is fit_transformed on the training set and only transformed on the cv set:

poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_mapped = poly.fit_transform(x_train)

scaler_poly = StandardScaler()
X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)
model = LinearRegression()
model.fit(X_train_mapped_scaled, y_train)

X_cv_mapped = poly.transform(x_cv)
X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)

Then, in the model selection loop, the cv set is fit_transformed:

for degree in range(1, 11):

    poly = PolynomialFeatures(degree, include_bias=False)
    X_train_mapped = poly.fit_transform(x_train)

    scaler_poly = StandardScaler()
    X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)
    scalers.append(scaler_poly)

    model = LinearRegression()
    model.fit(X_train_mapped_scaled, y_train)
    models.append(model)

    poly = PolynomialFeatures(degree, include_bias=False)
    X_cv_mapped = poly.fit_transform(x_cv)
    X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)

Whereas, if we follow the step-by-step, it should be:

remove this line, since we already fit_transformed the poly on x_train:

poly = PolynomialFeatures(degree, include_bias=False)

and replace the cv fit_transform line with:

X_cv_mapped = poly.transform(x_cv)

So is it a mistake?
If not, why is it done one way in the step-by-step but the other way in the for loop?
If we compare with scaling, the scaler is only fit_transformed on x_train and just transformed on the cv set (as described in the comments) in both sections.

edit: I just noticed the test set is poly fit_transformed as well, so perhaps the step-by-step was incorrect in just transforming the cv set?

edit2:
in the neural network part, the cv and test sets are not fit_transformed, just transformed. I can’t find any logic to the different choices made in the lab.

poly = PolynomialFeatures(degree, include_bias=False)
X_train_mapped = poly.fit_transform(x_train)
X_cv_mapped = poly.transform(x_cv)
X_test_mapped = poly.transform(x_test)
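For what it’s worth, a consistent version of the model-selection loop (fit everything on the training set, only transform elsewhere) might look like the sketch below. The x_train/y_train/x_cv data here is made up, just a stand-in for the lab’s actual variables:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

# Made-up stand-ins for the lab's data
x_train = np.arange(20, dtype=float).reshape(-1, 1)
y_train = x_train.ravel() ** 2
x_cv = np.arange(20, 30, dtype=float).reshape(-1, 1)

models, scalers, polys = [], [], []

for degree in range(1, 11):
    # fit_transform the poly and the scaler on the training set only
    poly = PolynomialFeatures(degree, include_bias=False)
    X_train_mapped = poly.fit_transform(x_train)
    polys.append(poly)

    scaler_poly = StandardScaler()
    X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)
    scalers.append(scaler_poly)

    model = LinearRegression()
    model.fit(X_train_mapped_scaled, y_train)
    models.append(model)

    # reuse the already-fitted poly and scaler on the cv set: transform only
    X_cv_mapped = poly.transform(x_cv)
    X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)
```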

Sorry, I am confused by the formatting in your post. I’m not able to identify exactly what the question is.

You don’t need to copy parts of the notebook here, that just makes it difficult to read, since we then have to manually compare it with the code in the notebook itself.

It would be better if you ask your question and let us figure out how it fits into the notebook.

I’ve tried to highlight the problematic code in bold. I had hoped my effort would make things easier :slight_smile:

The workbook uses:

  • a poly transform on the cv set in the initial step-by-step part
  • a poly fit_transform on the cv set in the for-loop part
  • a poly fit_transform on the test set
  • a poly transform on the cv and test sets in the neural network part

There seems to be inconsistency in how the cv and test sets are treated with polynomial expansion. Sometimes it’s a fit_transform and sometimes it’s just a transform.

Scaling, on the other hand, is consistent: the cv and test sets are always transformed and never fit_transformed, and this is explained in the workbook. There is no such distinction made for polynomial expansion.

Hope this is clearer.

Hello @ramz,

So I think the idea of your message is that, when we used “StandardScaler”, we only “fit_transform” the training set, and only “transform” the other sets and never “refit” to them. However, when we used “PolynomialFeatures”, we “fit_transform” both the training set and the other sets. Therefore, it can make us feel that the way we used “StandardScaler” and “PolynomialFeatures” was inconsistent.

If my summary is right, then:

  1. first, let’s consolidate this: we can only fit the “StandardScaler” to the training set, because it learns parameters (the feature means and variances) and we want to reuse those parameters when transforming the other sets.

  2. however, it’s fine to fit a new “PolynomialFeatures” to a set each time, because it does NOT learn any parameters. Its transformation behavior is solely determined by the arguments you construct it with, i.e. (degree, include_bias=False). Therefore, as long as the arguments are the same, whether you fit a new “PolynomialFeatures” to a set each time, or you transform a set with the one previously fitted on the training set, the final outcome won’t change. You can verify this easily.

  3. Having said that the final outcome won’t change means that we have a choice: will I use “PolynomialFeatures” the current way the lab does, OR the way the lab uses “StandardScaler”? My choice is the latter.

Therefore, if we understand what “StandardScaler” and “PolynomialFeatures” rely on to do their transformations, we will realize that the way the lab uses them won’t cause any trouble. Moreover, if our practice is to fit once and only transform later, then sticking to that practice won’t change the lab’s outcome either.
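To make the “you can verify this easily” point concrete, here is a minimal sketch (with made-up data) showing that the two options give identical results for PolynomialFeatures:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

# Made-up data for illustration
x_train = np.array([[1.0], [2.0], [3.0]])
x_cv = np.array([[4.0], [5.0]])

# Option A: fit on the training set, then only transform the cv set
poly = PolynomialFeatures(degree=2, include_bias=False)
poly.fit_transform(x_train)
cv_a = poly.transform(x_cv)

# Option B: fit a brand-new PolynomialFeatures directly on the cv set
cv_b = PolynomialFeatures(degree=2, include_bias=False).fit_transform(x_cv)

print(np.allclose(cv_a, cv_b))  # True: nothing was "learned" from x_train
```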

Cheers,
Raymond

2 Likes

Hi Raymond, thanks for your reply. Your summary is pretty much spot on, except that I would amend your sentence ‘However, when we used “PolynomialFeatures”, we “fit_transform” both the training set and the other sets’ to something more like: when the lab used PolynomialFeatures it always used fit_transform on the training set, and then it was (seemingly) random whether it used fit_transform or transform on the other sets; there wasn’t any consistency :slight_smile:

You’re right, we can verify this by adjusting the code for the different cases.
I’m relieved :relieved: to hear (if I understand correctly) that you have a preference for fit_transform on the training set, then transform on the others (like the Scaler is used).

If I understand correctly, you are saying:
because ‘transform’ uses the arguments from its previous ‘fit_transform’,
the behaviour of a new ‘fit_transform’ is the same as ‘transform’, provided the new ‘fit_transform’ uses the same arguments as the previous one?

Hello @ramz,

Yes, and this is limited to “PolynomialFeatures”.

Cheers,
Raymond

yes, my bad for leaving that part out. Thank you!

I found the discussion here very helpful because I had the same question. But for anyone coming to this thread in the future with the same question in mind, I think @rmwkwok ’s remark of

" However, when we used “PolynomialFeatures”, we “fit_transform” both the training set and the other sets. Therefore, it can make us feel that the way we used “StandardScaler” and “PolynomialFeatures” was inconsistent. "

actually should be

" In the lab, when it uses “PolynomialFeatures”, the lab “fit_transform”s the training set and “transform”s the other sets, just like how it does for “StandardScaler”. Since PolynomialFeatures depends solely on the degree input, there are no learned parameters to pass on to the cv and test sets. So it is really confusing for us to see that the way the lab uses “PolynomialFeatures” and “StandardScaler” is the same "

  • I had to read the post twice to figure out what was what because of this part. Hopefully this clears things up for future questioners. Regardless, thanks so much to @rmwkwok for his great answer!

Hi @Paige_Yang,

Thanks for bringing this up. Looking back now, maybe it would be simpler to put it this way (incorporating what you have written):

Given:
StandardScaler learns some parameters (means and variances) from the training set.
PolynomialFeatures does not learn anything from the training set.

Then:
For StandardScaler, we need to fit_transform the training set to learn the parameters, and then transform the test set. The test set is transformed based on the parameters learnt from the training set.

For PolynomialFeatures, no parameters will ever be learnt. The following two choices make no difference in the transformation results:

  1. fit_transform both the training and the test set, or
  2. fit_transform the training set and transform the test set.
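To contrast, a minimal sketch (made-up data again) showing why the same choice does matter for StandardScaler: refitting on the test set learns different means and variances, so the results differ:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up data for illustration
x_train = np.array([[1.0], [2.0], [3.0]])
x_test = np.array([[10.0], [20.0]])

# Transform the test set with the parameters learnt from the training set
scaler = StandardScaler().fit(x_train)
test_a = scaler.transform(x_test)

# Refit a new scaler on the test set: it learns different means/variances
test_b = StandardScaler().fit_transform(x_test)

print(np.allclose(test_a, test_b))  # False: the results differ
```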

Cheers,
Raymond

1 Like