You are right: the pre-processing you apply to your train/test data also has to be applied to your production data.
If you, for instance, impute missing values with mean() on training/test, you need the same imputation in production.
If you normalize with min-max on train/test, you need to do the same in production at inference time.
If you one-hot encode categorical features, you need to use the same technique in production.
And so on… (see the sketch after this list for one way to persist these fitted transforms).
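As a minimal sketch of the idea with scikit-learn (the column names, X_train, and X_prod are placeholders, not from the original question): fit the transforms once on training data, save them, and reload the same fitted object in production.

import joblib
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, OneHotEncoder

numeric_cols = ["age", "income"]    # hypothetical columns
categorical_cols = ["country"]      # hypothetical column

preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="mean")),  # mean imputation
        ("scale", MinMaxScaler()),                   # min-max normalization
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

preprocess.fit(X_train)                    # learn statistics from training data only
joblib.dump(preprocess, "preprocess.joblib")

# In production: load the fitted transformer and apply it unchanged
preprocess = joblib.load("preprocess.joblib")
X_prod_transformed = preprocess.transform(X_prod)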
Now, how do you do this pre-processing in production? It depends on the framework you use. TensorFlow and PyTorch both offer ways to pre-process the data when you package your model for production. To give you a quick example, the following method builds a model for inference:
import tensorflow as tf

def get_inference_model(model):
    # maxLen is the sequence length used at training time (defined elsewhere)
    inputs = tf.keras.Input((maxLen, 1629), dtype=tf.float32, name="inputs")
    # Rebuild the graph from the trained model's layers, skipping its input layer
    for i in range(1, len(model.layers)):
        if i == 1:
            x = model.layers[i](inputs)
        else:
            x = model.layers[i](x)
    # Apply the same mean() normalization used on the training/test data
    x = tf.reduce_mean(x, axis=0)
    output = tf.keras.layers.Activation(activation="linear", name="outputs")(x)
    inference_model = tf.keras.Model(inputs=inputs, outputs=output)
    inference_model.compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(),
        metrics=["accuracy"],
    )
    return inference_model
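For completeness, a hypothetical usage (trained_model and the export path are placeholders, and saving a directory path assumes the TF 2.x SavedModel behavior):

# Wrap the trained model and export it for serving
inference_model = get_inference_model(trained_model)
inference_model.save("inference_model")  # SavedModel directory, loadable at serving time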
In this case, I normalized with mean() on the training/test data. When creating the model for inference, I added a line that applies the same pre-processing to the production data:
x = tf.reduce_mean(x, axis=0)
Real pipelines can get much more complex than this, and each framework gives you options to handle it.
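For example, in TensorFlow/Keras you can bake the pre-processing into the model itself with a preprocessing layer, so serving applies exactly the same transform as training. A minimal sketch (train_features, num_features, and num_classes are placeholders, not from my code above):

import tensorflow as tf

# Learn mean/variance from the training data once
norm = tf.keras.layers.Normalization(axis=-1)
norm.adapt(train_features)

model = tf.keras.Sequential([
    tf.keras.Input(shape=(num_features,)),
    norm,  # identical normalization at training and inference time
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(num_classes, activation="softmax"),
])

Because the normalization layer is part of the saved model, there is no separate pre-processing step to keep in sync in production.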
Hope this helps.
Juan