In the third week of the second course in the MLS, in the first optional lab on model evaluation, the mean and standard deviation of the training set are computed with StandardScaler().fit_transform, whereas the same operation on the validation set uses StandardScaler().transform. I don't understand why different functions are used here. Can someone please help me out?
This difference, using .fit_transform for the training set and .transform for the validation set, is a crucial aspect of data preprocessing in machine learning.
Explanation of .fit_transform vs. .transform
- StandardScaler().fit_transform(training_data):
  - The .fit_transform() function calculates the mean and standard deviation of the training data and then scales it based on these values.
  - This ensures that the model learns the scaling parameters only from the training set.
- StandardScaler().transform(validation_data):
  - When applying scaling to the validation data (or any new data), you should use .transform() only.
  - Using .transform() applies the same scaling parameters (mean and standard deviation) from the training set to the validation set, ensuring the model's performance is evaluated on data scaled consistently with training (see the sketch after this list).
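For reference, here is a minimal sketch of the pattern the lab follows (the array names x_train and x_cv are placeholders, not necessarily the lab's exact variables). Note that the same fitted scaler instance is reused for the validation data; calling .transform() on a brand-new StandardScaler would raise a NotFittedError because it has no learned statistics yet.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for the training and validation feature arrays
x_train = np.array([[1.0], [2.0], [3.0], [4.0]])
x_cv = np.array([[1.5], [3.5]])

scaler = StandardScaler()

# fit_transform: learn the mean/std from the training data, then scale it
x_train_scaled = scaler.fit_transform(x_train)

# transform: reuse the training mean/std to scale the validation data
x_cv_scaled = scaler.transform(x_cv)

print(scaler.mean_, scaler.scale_)  # statistics learned from x_train only
```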
Why This Matters
If we used .fit_transform() on the validation set, it would calculate new scaling parameters based on the validation data, introducing data leakage. This could cause inconsistencies in model evaluation, since the validation set's mean and standard deviation would differ from the training set's, impacting model accuracy and generalizability.
@wai_yar_aung111 thank you very much for the clarification. I completely missed this crucial step to use the fitted parameters from the training set on the validation set too.
Short answer:
We don’t train (i.e. fit) on the validation set.