Full pass in production

jax79sg · June 15, 2021, 12:05pm

Hi. Under what circumstances should we perform a full pass in production, and at what scale? For example, do we need to compute min/max over the past 10 hours of data, or can we just use the stats from the training set?

luigisaetta · June 16, 2021, 7:07am

Hi @jax79sg
Normally you do a full pass only on the training data, and then you reuse the “transformation” parameters captured during the training phase.
In production you normally process data one batch at a time, and a batch may be quite small, so statistics such as min/max computed on a single batch may not be representative.
The good thing about TFX is that it captures all of this information and can apply it at serving time.
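
To make the idea concrete, here is a minimal sketch in plain Python (not the TFX API). It computes min/max once with a full pass over a toy training set, then reuses those constants on a small serving batch, and compares that with recomputing min/max on the batch itself. The function names `fit_min_max` and `scale_to_unit_range` and all the numbers are made up for illustration.

```python
def fit_min_max(values):
    """Full pass over a dataset: capture the min/max constants once."""
    return {"min": min(values), "max": max(values)}


def scale_to_unit_range(value, params):
    """Scale one value with previously captured constants."""
    span = params["max"] - params["min"]
    if span == 0:
        return 0.0  # degenerate case: every value seen was identical
    return (value - params["min"]) / span


# Training time: one full pass over the (toy) training data.
training_values = [10.0, 20.0, 30.0, 50.0]
training_params = fit_min_max(training_values)

# Serving time: a small batch arrives.
serving_batch = [20.0, 30.0]

# Reusing the training constants keeps values on the scale the model saw.
scaled_with_training_stats = [
    scale_to_unit_range(v, training_params) for v in serving_batch
]

# Recomputing min/max on the tiny batch stretches it to the full range.
batch_params = fit_min_max(serving_batch)
scaled_with_batch_stats = [
    scale_to_unit_range(v, batch_params) for v in serving_batch
]

print(scaled_with_training_stats)  # [0.25, 0.5]
print(scaled_with_batch_stats)     # [0.0, 1.0]
```

The same input value 20.0 becomes 0.25 with the training constants but 0.0 with the batch constants, so the model would receive features on a different scale from the one it was trained on.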

jax79sg · June 19, 2021, 8:19am

OK, thanks. So essentially it’s wisest to use the ‘constants’ derived from the training set and apply them to production data.

luigisaetta · June 20, 2021, 7:41am

@jax79sg
Yes.
Basically, the training set is very often much bigger than a serving batch, so the numbers used for normalization, to give an example, are much more likely to be appropriate.
But at serving time we should monitor the data to check that there is no data drift. If drift appears, you need to investigate, confirm it, and then take appropriate action, such as redefining the transform steps and retraining the model.
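
As a rough illustration of that kind of monitoring, here is a small plain-Python sketch that compares a serving batch against summary statistics saved from training. It is only one simple heuristic, not what TFX does internally. The names `summarize` and `drift_report` and both thresholds are hypothetical, and a real pipeline would more likely rely on a dedicated tool such as TensorFlow Data Validation.

```python
import statistics


def summarize(values):
    """Summary statistics captured once from the training data."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def drift_report(training_summary, serving_values,
                 max_out_of_range=0.05, max_mean_shift=3.0):
    """Flag possible drift with two simple checks (thresholds are assumptions).

    1. Fraction of serving values outside the training [min, max] range.
    2. Shift of the serving mean, measured in training standard deviations.
    """
    outside = sum(
        1 for v in serving_values
        if v < training_summary["min"] or v > training_summary["max"]
    )
    out_of_range_fraction = outside / len(serving_values)

    shift = abs(statistics.fmean(serving_values) - training_summary["mean"])
    stdev = training_summary["stdev"]
    if stdev > 0:
        mean_shift = shift / stdev
    elif shift > 0:
        mean_shift = float("inf")  # constant training feature that has moved
    else:
        mean_shift = 0.0

    return {
        "out_of_range_fraction": out_of_range_fraction,
        "mean_shift_in_stdevs": mean_shift,
        "drift_suspected": (out_of_range_fraction > max_out_of_range
                            or mean_shift > max_mean_shift),
    }


training_summary = summarize([10.0, 20.0, 30.0, 40.0, 50.0])

stable_report = drift_report(training_summary, [15.0, 25.0, 35.0, 45.0])
drifted_report = drift_report(training_summary, [80.0, 90.0, 100.0, 45.0])

print(stable_report["drift_suspected"])   # False
print(drifted_report["drift_suspected"])  # True
```

A flag like this is only a trigger to investigate. Whether to recompute the transform constants and retrain is still a decision to make after confirming the drift is real.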
Today this kind of well-organized, monitored pipeline is often not implemented. That is why the discussions we’re having in this specialization are so important: there is a need for awareness around these subjects, and then it takes time and effort to implement these approaches.

Happy learning