In the Model Evaluation and Selection lab, the notebook author uses two different methods when scaling features with scikit-learn’s StandardScaler(): .transform(x_train) and .fit_transform(x_train). What is the difference between them?
Additionally, when the notebook author uses the PolynomialFeatures() object, they also call .fit_transform(x_train). Does that method do the same thing in PolynomialFeatures() and StandardScaler(), or are they different?
Below, you can find clarifications for fit_transform and transform methods; I am not sure about the other functions because I haven’t gone through this course.
.fit_transform(x_train):
- This method is a combination of two steps: .fit() and .transform().
- .fit(x_train): This step computes the necessary statistics or parameters from the data (e.g., mean and standard deviation for scaling, or the unique categories for encoding). It essentially “learns” from the data.
- .transform(x_train): After fitting, this step applies the transformation to the data using the learned parameters.
- .fit_transform(x_train): By combining these two steps, it performs both fitting and transforming in one go, which is often more convenient and efficient when you want to transform the training data.
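To make the equivalence concrete, here is a minimal sketch with StandardScaler. The toy array and the variable names (scaler_a, scaler_b, two_step, one_step) are my own assumptions for illustration and are not taken from the lab notebook.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy training data (assumed for illustration): one feature, four samples
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])

# Two explicit steps: learn the mean and standard deviation, then apply them
scaler_a = StandardScaler()
scaler_a.fit(x_train)
two_step = scaler_a.transform(x_train)

# One combined step on a fresh scaler
scaler_b = StandardScaler()
one_step = scaler_b.fit_transform(x_train)

# Both routes should give the same scaled training data
same_result = np.allclose(two_step, one_step)
print("learned mean:", scaler_a.mean_)
print("learned scale:", scaler_a.scale_)
print("same result:", same_result)
```

After fitting, the learned statistics are stored on the scaler in the mean_ and scale_ attributes, which is what a later .transform() call reuses.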
.transform(x_train):
- This method is used to apply a transformation to the data using the parameters that have already been learned with .fit().
- It does not compute or learn anything new; it simply uses the existing parameters to transform the data.
- You would typically use .transform() on new data (e.g., validation or test sets) after you have already fitted the transformer on the training data.
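As a rough illustration of that last point, the sketch below fits the scaler on the training set only and then reuses it on a validation set. The arrays and the names x_cv, x_cv_scaled and expected are hypothetical, chosen just for this example.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (assumed for illustration)
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
x_cv = np.array([[10.0], [20.0]])

scaler = StandardScaler()

# Training data: learn mean/std and scale in one call
x_train_scaled = scaler.fit_transform(x_train)

# Validation data: only apply the statistics learned from the training set
x_cv_scaled = scaler.transform(x_cv)

# Manual check: validation data scaled with the *training* mean and std
expected = (x_cv - x_train.mean(axis=0)) / x_train.std(axis=0)
print(np.allclose(x_cv_scaled, expected))
```

Calling .fit_transform() on the validation set instead would recompute the mean and standard deviation from that set, so the training and validation data would no longer be on the same scale.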