C2W3_Lab_01_Model_Evaluation_and_Selection - feature scaling

oldwardrober · July 11, 2024, 2:42pm

In the optional lab “Model Evaluation and Selection”, we performed feature scaling after splitting the dataset into train, cv, and test subsets. The lab explains that, because of this, the cv and test subsets must be scaled using the mean and standard deviation of the training data. Wouldn’t it be easier to scale all the data before splitting it? Or is there a reason why we scale after splitting?

Alireza_Saei · July 11, 2024, 5:12pm

Hi @oldwardrober,

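Scaling the data after splitting it into training, CV, and test sets ensures that the mean and standard deviation are derived solely from the training data, which prevents data leakage. If you scaled everything before splitting, statistics from the CV and test sets would influence the training inputs, so the evaluation could look more optimistic than it should. Scaling after the split simulates a real-world scenario where future data is unseen at training time.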
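To make the difference concrete, here is a minimal sketch using only the Python standard library. The one-feature dataset and the helper names `fit_scaler` and `transform` are made up for illustration and are not taken from the lab. The mean and standard deviation are computed from the training split only and then reused for the CV and test splits:

```python
import statistics


def fit_scaler(train_values):
    """Compute the scaling statistics from the given values only."""
    mean = statistics.fmean(train_values)
    std = statistics.pstdev(train_values)  # population standard deviation
    return mean, std


def transform(values, mean, std):
    """Apply z-score scaling with previously fitted statistics."""
    return [(v - mean) / std for v in values]


# Hypothetical one-feature dataset that has already been split.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
x_cv = [6.0, 7.0]
x_test = [8.0, 9.0]

# Fit on the training split only, then reuse the same statistics everywhere.
mean, std = fit_scaler(x_train)
x_train_scaled = transform(x_train, mean, std)
x_cv_scaled = transform(x_cv, mean, std)
x_test_scaled = transform(x_test, mean, std)

# For comparison: fitting on all the data lets CV/test values shift the statistics.
leaky_mean, leaky_std = fit_scaler(x_train + x_cv + x_test)

print(f"train-only mean={mean:.2f}, std={std:.2f}")
print(f"all-data   mean={leaky_mean:.2f}, std={leaky_std:.2f}")
```

In this toy example, the statistics fitted on all the data differ from the train-only ones, which is exactly the information from the CV and test sets that should stay hidden. With scikit-learn, the same pattern is usually written as fitting `StandardScaler` on the training set and calling `transform` on the other two.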

Hope it helps!
