Model deployment in the real world question ... data preprocessing

Lars_UK · March 1, 2024, 2:34pm

Hi all,
New here.
I have done some TensorFlow training over the past few years but have never had a chance to work on a real project until recently.
Now most of the ML training seems to stop at building the model and checking the loss value… but not actually deploying it and using real data, hence my question.
I am working on a model of water pollution. The input data contains numerical data as well as locations, categories, etc. The data has been scaled and the categories encoded. So far so good. Now if I want to test my model with sample data, do I need to prepare it the same way (i.e. scale it and encode the categories the same way)? I assume so, but this is a topic usually not covered in training. How is this done in production applications?
Many thanks

TMosh · March 1, 2024, 6:29pm

Yes. This is because the model was trained on pre-processed data, so its predictions are only going to work on data that was pre-processed in a compatible way.

For example, if you normalized the training data, you need to apply the same normalization to the test data. That doesn’t mean you normalize the test data independently: you reuse the normalization parameters (for example, the mean and standard deviation) that you computed from the training set.
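A minimal sketch of that idea in plain Python, assuming z-score (mean and standard deviation) normalization. The function names `fit_scaler` and `apply_scaler` and the sample values are made up for illustration; they are not from any particular library.

```python
from statistics import mean, pstdev

def fit_scaler(train_values):
    """Compute the mean and standard deviation from the training data only."""
    mu = mean(train_values)
    sigma = pstdev(train_values)
    if sigma == 0:
        sigma = 1.0  # avoid dividing by zero for a constant feature
    return mu, sigma

def apply_scaler(values, mu, sigma):
    """Scale any data (training, test, or production) with the stored parameters."""
    return [(v - mu) / sigma for v in values]

# Hypothetical feature values, for illustration only.
train = [2.0, 4.0, 6.0, 8.0]
test = [5.0, 10.0]

mu, sigma = fit_scaler(train)                   # fitted once, on the training set
train_scaled = apply_scaler(train, mu, sigma)
test_scaled = apply_scaler(test, mu, sigma)     # reuses the training parameters
```

Note that `fit_scaler` is never called on `test`. If it were, the test data would be scaled with a different mean and standard deviation, and the model would see inputs on a different scale from the one it was trained on.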

Lars_UK · March 4, 2024, 9:19am

Thanks @TMosh ,

Ok, so it means that when you deploy your model in prod, you need to have a pre-processing layer which is exactly the same as the one you used in developing your model, correct?
So let’s say I need to build an app that will try to predict the likelihood of a water pollution event whenever a water spillage is reported/detected. Each time I create a new record in my app I need to normalise its data for sending to the model…

TMosh · March 4, 2024, 3:58pm

Correct.
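One possible way to do this, sketched under assumptions: save the pre-processing parameters at training time, ship them alongside the model, and run every new record through the same function before prediction. The feature names (`ph`, `volume_l`, `source`), the parameter values, and the `preprocess` helper below are hypothetical, and one-hot encoding is assumed for the categories.

```python
import json

# Hypothetical parameters computed from the training set at training time.
params = {
    "numeric": {
        "ph": {"mean": 7.0, "std": 0.5},
        "volume_l": {"mean": 200.0, "std": 100.0},
    },
    "categorical": {"source": ["farm", "industrial", "sewage"]},
}

# In practice this string would be written to a file deployed with the model.
saved = json.dumps(params)

def preprocess(record, params):
    """Turn one raw record into the feature vector the model was trained on."""
    features = []
    # Scale numeric fields with the stored training mean and std.
    for name, stats in params["numeric"].items():
        features.append((record[name] - stats["mean"]) / stats["std"])
    # One-hot encode categories in the stored vocabulary order.
    # A category not seen in training becomes an all-zero vector.
    for name, vocab in params["categorical"].items():
        features.extend(1.0 if record[name] == cat else 0.0 for cat in vocab)
    return features

# In the app: load the saved parameters once, then preprocess each new record.
loaded = json.loads(saved)
new_record = {"ph": 6.5, "volume_l": 400.0, "source": "industrial"}
features = preprocess(new_record, loaded)
```

The resulting `features` list is what would be passed to the model. Since the original question mentions TensorFlow, it is worth knowing that Keras also offers pre-processing layers that can be saved as part of the model itself, which removes the need for a separate parameter file.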
