Scaling training and cross validation set

Cristhian_David_Pere · September 24, 2023, 12:05am

In the first lab of Week 3, why is the scaling fitted only on the training set, with its mean and standard deviation then reused to scale the cross-validation set? Why not scale the entire dataset (training + cross-validation + test) and use a mean and standard deviation computed over all partitions?
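The two approaches being compared can be sketched in plain NumPy. This is a minimal illustration under stated assumptions: the data is synthetic and the 60/20/20 split is hypothetical, not the lab's actual dataset. The "training-only" branch follows the same pattern as scikit-learn's `StandardScaler` (`fit_transform` on the training set, `transform` on the cross-validation set), which also uses the population standard deviation (`ddof=0`).

```python
import numpy as np

# Hypothetical synthetic data standing in for the lab's dataset.
rng = np.random.default_rng(0)
x = rng.normal(loc=50.0, scale=10.0, size=(100, 1))

# Hypothetical 60/20/20 split into training, cross-validation and test sets.
x_train, x_cv, x_test = x[:60], x[60:80], x[80:]

# Approach used in the lab: statistics come from the training set only...
mu_train = x_train.mean(axis=0)
sigma_train = x_train.std(axis=0)
x_train_scaled = (x_train - mu_train) / sigma_train
# ...and the cross-validation set is scaled with those same training statistics.
x_cv_scaled = (x_cv - mu_train) / sigma_train

# Alternative from the question: statistics computed over all partitions.
mu_all = x.mean(axis=0)
sigma_all = x.std(axis=0)

print("train-only mean/std:", mu_train, sigma_train)
print("all-data   mean/std:", mu_all, sigma_all)
# The scaled CV set is generally not exactly zero-mean/unit-variance,
# because its own statistics were never used to compute the scaling.
print("scaled CV mean/std:", x_cv_scaled.mean(axis=0), x_cv_scaled.std(axis=0))
```

Note that only the training partition comes out exactly standardized; the cross-validation set is merely expressed in the training set's units.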

rmwkwok · September 24, 2023, 1:00am

Hello @Cristhian_David_Pere,

Though it is not often put this way, you might consider those scaling factors part of your trained model: like the weights, they should be learned from nothing but the training set, which is the data the model is trained on. When you pick the best model with the cross-validation set, you also pick its corresponding scaling factors.
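One way to make "the scaling factors are part of the model" concrete is to store them on the same object as the weights. The sketch below assumes a setup similar to a polynomial-degree selection loop (polynomial features, then z-score scaling, then a linear fit); the class `ScaledPolyModel`, the `mse` helper and the synthetic quadratic data are all hypothetical, and the linear fit uses NumPy's least squares rather than any particular library model.

```python
import numpy as np


class ScaledPolyModel:
    """Hypothetical helper: polynomial features -> z-score scaling -> linear fit.

    The scaling factors (mu, sigma) are learned in fit() from the training
    data only and stored on the object, just like the weights.
    """

    def __init__(self, degree):
        self.degree = degree

    def _poly(self, x):
        # Columns x, x^2, ..., x^degree for a 1-D input array.
        return np.column_stack([x ** p for p in range(1, self.degree + 1)])

    def fit(self, x, y):
        features = self._poly(x)
        self.mu = features.mean(axis=0)      # learned from training data only
        self.sigma = features.std(axis=0)
        scaled = (features - self.mu) / self.sigma
        design = np.column_stack([np.ones(len(x)), scaled])  # bias column
        self.w, *_ = np.linalg.lstsq(design, y, rcond=None)
        return self

    def predict(self, x):
        # Reuse the stored training statistics; never refit them here.
        scaled = (self._poly(x) - self.mu) / self.sigma
        return np.column_stack([np.ones(len(x)), scaled]) @ self.w


def mse(y_true, y_pred):
    return np.mean((y_true - y_pred) ** 2)


# Hypothetical synthetic data: a noisy quadratic.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, size=120)
y = 3.0 + 2.0 * x - 0.5 * x ** 2 + rng.normal(scale=1.0, size=x.shape)
x_train, y_train = x[:80], y[:80]
x_cv, y_cv = x[80:], y[80:]

# Fit one candidate per degree; each candidate owns its own mu/sigma.
candidates = [ScaledPolyModel(d).fit(x_train, y_train) for d in range(1, 6)]
cv_errors = [mse(y_cv, m.predict(x_cv)) for m in candidates]

# Selecting the best candidate selects its weights AND its scaling factors.
best = candidates[int(np.argmin(cv_errors))]
print("best degree:", best.degree)
print("its scaling factors:", best.mu, best.sigma)
```

Because each degree produces a different feature set, each candidate needs its own `mu` and `sigma`; keeping them on the model object means they cannot get mixed up when the winner is chosen.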

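Besides, keeping the validation and test data out of the scaling computation mimics production, where new data is not available to your model training process. If their statistics leaked into the scaling factors, the validation and test errors would no longer be an honest estimate of performance on unseen data.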
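The production argument can be illustrated with a single incoming example. This is a minimal sketch assuming the training mean and standard deviation were saved with the model; the data, the feature values and the `scale_for_inference` helper are hypothetical. It shows that a lone example cannot supply usable statistics of its own, so the stored training statistics are the only consistent option.

```python
import numpy as np

# Hypothetical training statistics saved alongside the trained model.
rng = np.random.default_rng(2)
x_train = rng.normal(loc=[50.0, 3.0], scale=[10.0, 0.5], size=(200, 2))
mu = x_train.mean(axis=0)
sigma = x_train.std(axis=0)


def scale_for_inference(x_new, mu, sigma):
    """Scale incoming data with the stored training statistics."""
    return (np.atleast_2d(x_new) - mu) / sigma


# In production, examples may arrive one at a time.
x_new = np.array([62.0, 2.5])

# Statistics of a single example are useless: its std is 0 in every column,
# so "scaling it by itself" would divide by zero.
print("std of the lone example:", np.atleast_2d(x_new).std(axis=0))  # [0. 0.]

# Using the training statistics gives a well-defined, consistent input.
print("scaled example:", scale_for_inference(x_new, mu, sigma))
```

Validation and test sets play the role of this future data, which is why they are scaled the same way.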

Cheers,
Raymond

Related topics

Topic	Category	Replies	Views	Activity
Cross-validation set appears to undergo independent (from training set) scaling in optional lab	Advanced Learning Algorithms (week-module-3)	1	23	August 28, 2024
Feature Scaling: Why don't we feature scale the training, cross validation and test data seperately?	Advanced Learning Algorithms (week-module-3)	4	45	September 17, 2024
C2W3_Lab_01_Model_Evaluation_and_Selection - feature scaling	Advanced Learning Algorithms (week-module-3)	1	26	July 11, 2024
C2_W3_Lab 1_Model evaluation & Selection_Query	Advanced Learning Algorithms (week-module-3)	2	330	September 14, 2023
StandardScaler.fit_transform vs StandardScaler.transform	Advanced Learning Algorithms (week-module-3)	3	59	October 29, 2024