I have a question about how scaled features are served in applications/production environments.
Let's say I have a feature “F”. As part of my model training process, assume I have scaled feature F using z-score normalization. I recall from the class that feature scaling can speed up training because it reduces the number of iterations gradient descent needs to find a local/global minimum. Let's say the “best” model was found and saved.
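For concreteness, here is a minimal sketch of the z-score scaling I mean (assuming NumPy; the values in f_train are just made-up training values for F):

```python
import numpy as np

# Hypothetical training values for feature F
f_train = np.array([12.0, 15.5, 9.8, 20.1, 13.3])

# z-score parameters are computed from the training data only
mu = f_train.mean()
sigma = f_train.std()

# Scaled version of F that the model is trained on
f_train_scaled = (f_train - mu) / sigma
```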
What would the next steps typically look like?
(1) Serve feature F in some feature store, where F is maintained as the scaled version
(2) Retrain the model on the unscaled version of feature F, and serve F in the feature store in its original form
For (1), are the mean and standard deviation computed on the fly every time? What is the best practice here? Or do we skip (1) and do (2) instead? Thanks for reading!
I guess my main concern is that (in my current organization) there is a strict service level agreement whereby my model has to return a response within some duration X.
When a system calls my model, all the features are generated via a feature store. I am unsure whether computing normalized features on every single call will violate the SLA.
Feature scaling does not explicitly improve model performance, if I recall correctly. If this is the case, I guess it would be a good tradeoff to train a model on unnormalized features (even if it is redundant) just so my feature store computes faster?
Curious to know if you'd come to the same conclusion as me.
In general, the main reason to re-train a model is that you are adding new training examples.
So: save the model (the weights and biases) so you can use it to make new predictions. You'll also need to save the normalization parameters (the mean and sigma, or whatever normalization you used in training), so that you can apply the same normalization to any new inputs you want to make predictions on.
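A minimal sketch of that workflow, assuming scikit-learn and joblib (the file name and model choice are just placeholders):

```python
import joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

# --- training time ---
X_train = np.array([[12.0], [15.5], [9.8], [20.1], [13.3]])  # hypothetical raw values of F
y_train = np.array([1.0, 2.0, 0.5, 3.1, 1.4])

scaler = StandardScaler().fit(X_train)                        # learns the mean and sigma of F
model = LinearRegression().fit(scaler.transform(X_train), y_train)

# Persist both artifacts together: the weights AND the normalization parameters
joblib.dump({"scaler": scaler, "model": model}, "model_bundle.joblib")

# --- prediction time ---
bundle = joblib.load("model_bundle.joblib")
x_new = np.array([[14.2]])                                    # raw (un-scaled) feature from the feature store
x_new_scaled = bundle["scaler"].transform(x_new)              # apply the SAME training-time normalization
prediction = bundle["model"].predict(x_new_scaled)
```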
If you want to re-train, you might batch up the new examples for a while, and then re-train the model periodically.
Exactly what strategy you use depends on how big the data set is, how much computing power training takes, how often you get new training data, and how up-to-date you want the models to be for making new predictions.
Recommendation: Do not train your model on un-normalized data. The benefits of normalization are many, and the penalties for skipping it can be extreme: unstable training that is extremely difficult to debug.
Sometimes, without feature scaling, your model never converges to any minimum during training, so this is not just about speed anymore. With this possibility in mind, we should agree that (2) is not always possible.
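A toy illustration of that failure mode (the numbers and learning rate below are made up): with the same learning rate, gradient descent diverges on the raw feature and converges once F is z-score scaled.

```python
import numpy as np

def gradient_descent(x, y, lr, iters):
    """Plain batch gradient descent on MSE for the model y_hat = w*x + b."""
    w, b = 0.0, 0.0
    for _ in range(iters):
        err = w * x + b - y
        w -= lr * 2 * np.mean(err * x)
        b -= lr * 2 * np.mean(err)
    return np.mean((w * x + b - y) ** 2)  # final training loss

# Hypothetical raw feature F on a large scale, with a simple linear target
x_raw = np.array([1000.0, 2000.0, 3000.0, 4000.0])
y = 0.005 * x_raw + 1.0

# Same learning rate: on the raw feature the loss explodes ...
print("unscaled loss:", gradient_descent(x_raw, y, lr=0.1, iters=20))

# ... while on the z-score-scaled feature it converges to (near) zero loss
x_scaled = (x_raw - x_raw.mean()) / x_raw.std()
print("scaled loss:  ", gradient_descent(x_scaled, y, lr=0.1, iters=200))
```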
Personally, I would keep F in its original state, because F can serve more than one model and/or more than one version of a model. If these models are supplied with different subsets of F, then obviously they require F to be scaled differently (different means and standard deviations). In that case, what is the easiest way of maintaining them without having one copy of F per model?
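For illustration, one possible arrangement (the dictionaries and names below are hypothetical): the feature store keeps a single raw copy of F, and each model artifact carries its own scaling parameters.

```python
# Hypothetical: the feature store holds only the raw value of F
FEATURE_STORE = {"F": 14.2}

# Each model (or model version) carries its own scaling parameters,
# learned from whatever subset of data it was trained on
MODEL_PARAMS = {
    "model_a_v1": {"mu": 13.0, "sigma": 3.1},
    "model_b_v2": {"mu": 12.4, "sigma": 2.7},
}

def get_scaled_f(model_name: str) -> float:
    """Scale the single raw copy of F with the requested model's parameters."""
    raw = FEATURE_STORE["F"]
    p = MODEL_PARAMS[model_name]
    return (raw - p["mu"]) / p["sigma"]

print(get_scaled_f("model_a_v1"))
print(get_scaled_f("model_b_v2"))
```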
Then I am afraid you just need to find out.
Z-score scaling takes one subtraction and one division. If your model is already large, those two operations are probably negligible.
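If you want to put a number on it, here is a rough sketch of how to time just the scaling step (the parameter values and vector size are made up):

```python
import timeit
import numpy as np

mu, sigma = 13.0, 3.1          # stored training-time parameters (hypothetical values)
raw = np.random.rand(50)       # hypothetical raw feature vector for one request

# Average time per call for the scaling step alone
per_call = timeit.timeit(lambda: (raw - mu) / sigma, number=100_000) / 100_000
print(f"z-score scaling: ~{per_call * 1e6:.2f} microseconds per request")
```

On most hardware this comes out in the low microseconds, which should be easy to compare against your SLA budget.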