In the step-by-step there’s a series of cells (I’ve combined them and removed comments, prints, and prediction code) where the poly is fit_transformed on the training set and just transformed on the cv set:
poly = PolynomialFeatures(degree=2, include_bias=False)
X_train_mapped = poly.fit_transform(x_train)
scaler_poly = StandardScaler()
X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)
model = LinearRegression()
model.fit(X_train_mapped_scaled, y_train)
X_cv_mapped = poly.transform(x_cv)
X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)
Then in the model selection loop the cv set is fit_transformed:
for degree in range(1, 11):
    poly = PolynomialFeatures(degree, include_bias=False)
    X_train_mapped = poly.fit_transform(x_train)
    scaler_poly = StandardScaler()
    X_train_mapped_scaled = scaler_poly.fit_transform(X_train_mapped)
    scalers.append(scaler_poly)
    model = LinearRegression()
    model.fit(X_train_mapped_scaled, y_train)
    models.append(model)
    poly = PolynomialFeatures(degree, include_bias=False)
    X_cv_mapped = poly.fit_transform(x_cv)
    X_cv_mapped_scaled = scaler_poly.transform(X_cv_mapped)
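(A side check I ran, not part of the lab: PolynomialFeatures stores no data-dependent statistics beyond the number of input features, so on the same input, fit_transform and transform give identical output. The toy arrays below are made up for illustration.)

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x_train = np.arange(1, 7).reshape(-1, 1).astype(float)
x_cv = np.array([[2.5], [4.5]])

poly = PolynomialFeatures(degree=2, include_bias=False)
poly.fit(x_train)                    # only records the number of input features
via_transform = poly.transform(x_cv)

poly_cv = PolynomialFeatures(degree=2, include_bias=False)
via_fit_transform = poly_cv.fit_transform(x_cv)  # refitting on cv changes nothing

assert np.array_equal(via_transform, via_fit_transform)
```

So for the poly step the two variants happen to produce the same numbers, which may be why the inconsistency went unnoticed, but it still reads as sloppy.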
vs if we follow the step-by-step, it should be: remove the line
poly = PolynomialFeatures(degree, include_bias=False)
(since poly was already fit_transformed on x_train) and change the cv line to
X_cv_mapped = poly.transform(x_cv)
so is it a mistake?
if not, then why do it one way in the step-by-step but the other way in the for loop?
if we compare with scaling, the scaler is only fit_transformed on x_train and merely transformed on the cv set (as described in the comments) in both sections.
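(To make that comparison concrete, here is a small sketch of my own with made-up numbers: unlike PolynomialFeatures, StandardScaler learns the mean and std of whatever it is fit on, so refitting it on the cv set would give a different, leaky scaling.)

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
x_cv = np.array([[10.0], [20.0]])

scaler = StandardScaler().fit(x_train)        # learns mean/std of the training set
correct = scaler.transform(x_cv)              # cv scaled with train statistics

leaky = StandardScaler().fit_transform(x_cv)  # cv scaled with its own statistics

assert not np.allclose(correct, leaky)        # the two scalings differ
```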
edit: I just noticed the test set is poly fit_transformed as well, so perhaps the step-by-step was incorrect in just transforming cv?
edit2:
in the neural network part, cv and test are not fit_transformed, just transformed. I can’t find any logic behind the different choices made in the lab:
poly = PolynomialFeatures(degree, include_bias=False)
X_train_mapped = poly.fit_transform(x_train)
X_cv_mapped = poly.transform(x_cv)
X_test_mapped = poly.transform(x_test)
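(One way to sidestep the whole fit_transform-vs-transform question entirely, my suggestion rather than anything from the lab: wrap the steps in a sklearn Pipeline, which fit_transforms every step only on the data passed to .fit() and only transforms whatever is passed to .predict(). The arrays below are made up for illustration.)

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression

x_train = np.arange(1, 11).reshape(-1, 1).astype(float)
y_train = x_train.ravel() ** 2          # toy target: y = x^2
x_cv = np.array([[3.5], [7.5]])

pipe = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),
    StandardScaler(),
    LinearRegression(),
)
pipe.fit(x_train, y_train)       # fit_transform happens only on x_train
y_cv_pred = pipe.predict(x_cv)   # x_cv (or x_test) is only ever transformed
```

With this there is no way to accidentally refit a transformer on cv or test data.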