Can feature scaling be applied to the test set?

I had one doubt related to feature scaling.
I have a binary classification problem, for which I implemented mean normalization on the training set. The test set is untouched, i.e., it is raw.

After training the ML model, I want to generate predictions for the test set and see how the model performs. When I feed the untouched test dataset directly to the model, I get a low accuracy score (about 0.75). On the other hand, if I first apply feature scaling using the same scaler object defined in the training step on the test dataset, I achieve a better accuracy score (about 0.88).

My question is whether feature scaling needs to be applied on the test set or not? In other words, should I judge a model’s performance without scaling the test data?

Hi @cs_Chinmay
Yes, you want to apply the same feature scaling to the test data, so the model sees inputs from the same distribution it was trained on.

Please feel free to ask any questions,

After you normalize the training set, you have to apply the same normalization to the test set.

@tazet, I’m going to delete your reply, because I think it refers to a different course (not MLS). It also referred to dropout, but the topic of this thread is feature scaling.

Thanks for the response. So, does this mean that if the model goes into production, whatever new data comes, it will get normalized by the scaler object defined during the training step, right?

Yes. Because the weight values you learned presumed that the data had a specific normalization.

You can’t use those weights to make predictions unless you apply the same normalization to the new data.
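As a concrete sketch of this fit-on-train, transform-on-test pattern (assuming scikit-learn's StandardScaler, since the thread mentions a "scaler object"; the toy arrays here are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical data: 2 features on very different scales
X_train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
X_test = np.array([[2.5, 150.0]])

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # learn mu/sigma from the train set only
X_test_scaled = scaler.transform(X_test)        # reuse the same mu/sigma, never fit on test

# The training columns are now zero-mean, unit-variance;
# the test row is mapped using the *training* statistics.
```

Calling `fit` (or `fit_transform`) on the test set would leak test-set information into the preprocessing, which is exactly what the replies below warn against.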


As others have said here, I just want to add one minor point: you need to apply the normalization (feature scaling) to the test set using the exact parameters computed from the training set, and never involve the test set in any of those calculations.

For example, if your goal is to standardize the data, i.e., make it have a mean of 0 and a standard deviation of 1, what you do is \displaystyle{x = \frac{x-\mu}{\sigma}}

import numpy as np

## your train set
x_train = ...

## your test set
x_test = ...

## compute the scaling parameters from the train set only
## (axis=0 gives per-feature statistics for 2-D data)
mu = np.mean(x_train, axis=0)
sigma = np.std(x_train, axis=0)

## transform the x_train
x_train = (x_train - mu) / sigma

## transform the x_test with the same mu and sigma
x_test = (x_test - mu) / sigma

See here that we use the mu and sigma of the train set, not the test set. This is because, as we said before, the test set is data we know nothing about, and it should never be involved in tuning or improving our ML model.
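To make that point concrete, here is a tiny runnable check (toy numbers, not from the thread): after scaling with the training statistics, the train set comes out standardized, while the test set generally does not, and that is expected.

```python
import numpy as np

x_train = np.array([1.0, 2.0, 3.0, 4.0])
x_test = np.array([2.0, 5.0])

mu = np.mean(x_train)      # statistics come from the train set only
sigma = np.std(x_train)

x_train_s = (x_train - mu) / sigma
x_test_s = (x_test - mu) / sigma

print(x_train_s.mean())  # ~0: the train set is standardized
print(x_test_s.mean())   # generally nonzero, and that is fine
```

If we had instead computed a separate mu and sigma from the test set, the two sets would be mapped inconsistently and the learned weights would no longer apply.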

Hi @cs_Chinmay .
Yes, you must normalize the test data to the same range as the training set. It is necessary for correct predictions and better performance.
As mentioned above, you must use the same scaler for scaling the features of the test set that you used for the training set. This also helps you avoid data leakage during the testing phase.
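Circling back to the production question earlier in the thread: one common pattern is to persist the training-time scaling parameters (or the fitted scaler object) alongside the model, and reload them when scoring new data. A minimal sketch using the standard library's pickle (the variable names and numbers here are illustrative, not from the thread):

```python
import pickle
import numpy as np

# Toy "fitted scaler": just the statistics computed on the train set
train_stats = {"mu": 2.5, "sigma": 1.25}

# At training time: serialize the statistics next to the model weights
blob = pickle.dumps(train_stats)

# In production: load them and scale incoming data the same way
stats = pickle.loads(blob)
new_data = np.array([3.0, 5.0])
scaled = (new_data - stats["mu"]) / stats["sigma"]
print(scaled)  # [0.4 2. ]
```

The key property is that production data is transformed with the exact numbers learned during training, never with statistics recomputed on the new data.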