Normalizing input in training vs production

If we take advantage of input normalization in the training/testing phase, we also need to transform the input in production the same way, since the model was trained on the modified data.

Since normalization depends on the training corpus (to calculate the mean, etc.), we need to use the parameters derived from the training data to adjust what the model receives in production as well.

What are the best practices for doing this?
Is there anything in TensorFlow or Keras that helps with this?

You are right: the pre-processing that you apply to your train/test dataset must also be applied to your production data.

If you, for instance, impute missing values with the mean in training/test, you need the same imputation (using the training mean) in production.

If you normalize with min-max in train/test, you need to do the same in production for inference.

If you use one-hot encoding for categorical features, you need to use the same technique in production.

And so on… (see the Keras sketch right below).
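To answer your TensorFlow/Keras question directly: yes, there is built-in support. Below is a minimal sketch (the data and shapes are synthetic placeholders, not from your setup) using tf.keras.layers.Normalization, which learns the mean and variance from the training data via adapt() and then carries them inside the model:

    import numpy as np
    import tensorflow as tf

    # Stand-in for real training features; shapes are illustrative.
    x_train = np.random.rand(1000, 10).astype("float32")

    # The Normalization layer computes mean/variance from the data it is
    # adapted on, and applies exactly those statistics from then on.
    norm = tf.keras.layers.Normalization(axis=-1)
    norm.adapt(x_train)

    inputs = tf.keras.Input(shape=(10,))
    x = norm(inputs)  # the same training statistics are applied at inference time
    x = tf.keras.layers.Dense(32, activation="relu")(x)
    outputs = tf.keras.layers.Dense(1)(x)
    model = tf.keras.Model(inputs, outputs)

Because the statistics live inside the model graph, saving the model also saves them, so production inputs get normalized with the training-set parameters automatically.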

Now, how do you do this pre-processing in production? It depends on the framework you use. TensorFlow and PyTorch offer options to pre-process the data when you package your model for production. To give you a quick example, the following method builds a model for inference:

def get_inference_model(model):
    # maxLen comes from the training set-up.
    inputs = tf.keras.Input((maxLen, 1629), dtype=tf.float32, name="inputs")
    # Re-apply the trained layers to the new input tensor
    # (skipping the original input layer at index 0).
    for i in range(1, len(model.layers)):
        if i == 1:
            x = model.layers[i](inputs)
        else:
            x = model.layers[i](x)
    # The same mean-based step used on the training/test data.
    x = tf.reduce_mean(x, axis=0)
    output = tf.keras.layers.Activation(activation="linear", name="outputs")(x)
    inference_model = tf.keras.Model(inputs=inputs, outputs=output)
    inference_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                            metrics=["accuracy"])
    return inference_model

In this case, I did some normalization with mean() on the training/test data. Here, when creating the model for inference, I added a line to apply the same pre-processing to the production data:

x = tf.reduce_mean(x, axis=0)

The pre-processing can get much more complex than this, and each framework gives you options to handle it.
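As a quick, hypothetical usage sketch (the variable names are illustrative): once the pre-processing is wired into the inference model, exporting it packs everything into one artifact:

    # `trained_model` stands for your fitted model.
    inference_model = get_inference_model(trained_model)
    tf.saved_model.save(inference_model, "export/inference_model")

    # In production, the loaded model applies the packed pre-processing itself:
    loaded = tf.saved_model.load("export/inference_model")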

Hope this helps.

Juan


I wanted to add another example of data preprocessing at inference time:

def get_inference_model(model):
    inputs = tf.keras.Input((543, 3), dtype=tf.float32, name="inputs")

    # Pre-processing packed into the inference graph: resize to a fixed
    # sequence length, replace NaNs with zeros, and add a batch dimension.
    vector = tf.image.resize(inputs, (CFG.sequence_length, 543))
    vector = tf.where(tf.math.is_nan(vector), tf.zeros_like(vector), vector)
    vector = tf.expand_dims(vector, axis=0)

    vector = model(vector)
    output = tf.keras.layers.Activation(activation="linear", name="outputs")(vector)
    inference_model = tf.keras.Model(inputs=inputs, outputs=output)
    inference_model.compile(loss=tf.keras.losses.SparseCategoricalCrossentropy(),
                            metrics=["accuracy"])
    return inference_model

This example presents a very useful feature of TensorFlow:

tf.where(…)

It allows you to apply element-wise conditionals during preprocessing. Check it out HERE.
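As a tiny self-contained sketch (made-up values) of the same idea used in the code above:

    import tensorflow as tf

    x = tf.constant([1.0, float("nan"), -2.0, 3.0])

    # Element-wise conditional: where x is NaN take 0, otherwise keep x.
    cleaned = tf.where(tf.math.is_nan(x), tf.zeros_like(x), x)
    print(cleaned)  # [ 1.  0. -2.  3.]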

Another powerful operation in TensorFlow is tf.cond. You can read about it HERE. With it you can branch your preprocessing one way or another based on a condition, with each branch implemented as a function. And the nice thing about it is that it all gets packed into your inference model.
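Here is a rough sketch of that idea (the length threshold and padding strategy are made up for illustration): tf.cond picks one preprocessing branch or the other based on a scalar condition, and both branches live inside the exported graph:

    import tensorflow as tf

    MAX_LEN = 32  # illustrative target sequence length

    def pad_or_truncate(x):
        # x: a (frames, features) float tensor.
        n = tf.shape(x)[0]
        return tf.cond(
            n >= MAX_LEN,
            lambda: x[:MAX_LEN],                            # too long: truncate
            lambda: tf.pad(x, [[0, MAX_LEN - n], [0, 0]]),  # too short: zero-pad
        )

    print(pad_or_truncate(tf.ones((10, 4))).shape)  # (32, 4)
    print(pad_or_truncate(tf.ones((50, 4))).shape)  # (32, 4)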

Thank you for your examples. I assumed something like this was possible, but it is good to have examples at hand.