Dealing with the target in training and serving in TFX

I’m new to MLOps and am trying to figure out how to handle the target feature in my data. I read that, for data consistency, the same schema should be used for both the training and validation sets. My question is: how can I mark the target as optional for the validation set? If I receive data from users that has no target feature, how do I use example_validator to compare the schema of the new data (without a label) against the original schema that contains the target? I know it can be done with tfdv.get_feature(schema, 'labels').not_in_environment.append('SERVING'), but as far as I know that is not a solution for a production pipeline. Another thought: delete the target from the validation set in the Transform preprocessing function, but I don’t really understand how context.pipeline.stage works. Example:

if tft.TFTRuntimeContext().context.pipeline.stage == 'train':
    # transform the target for the training set
else:
    # for the validation set, delete the target


I tried to handle this in the preprocessing function, but I need to compare the schema of the original data with the schema of the user data for prediction (without a label) BEFORE the transformation, so this causes an error in example_validator. The full pipeline is: ExampleGen - StatisticsGen - SchemaGen - ExampleValidator - Transform - (and the stages for the model).
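
To make the environment idea from the first post concrete, here is a minimal plain-Python sketch of how schema environments decide which features are expected. It is a toy model, not the TFDV API: the Feature and Schema classes and the expected_features and missing_features helpers are hypothetical names made up for illustration. Only the field names default_environment, in_environment and not_in_environment are taken from the real schema, and the precedence used here (not_in_environment overrides the defaults) is an assumption about how they combine. In TFDV itself, the serving check would go through tfdv.validate_statistics with environment='SERVING'.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class Feature:
    """Toy stand-in for a schema feature (hypothetical, not the TFDV class)."""
    name: str
    in_environment: List[str] = field(default_factory=list)
    not_in_environment: List[str] = field(default_factory=list)


@dataclass
class Schema:
    """Toy stand-in for a schema with default environments (hypothetical)."""
    features: List[Feature]
    default_environment: List[str] = field(default_factory=list)


def expected_features(schema: Schema, environment: Optional[str] = None) -> Set[str]:
    """Return the names of the features expected in the given environment."""
    expected = set()
    for feature in schema.features:
        if environment is None:
            # No environment given: every feature in the schema is expected.
            expected.add(feature.name)
        elif environment in feature.not_in_environment:
            # Explicitly excluded from this environment (assumed to win).
            continue
        elif (environment in feature.in_environment
              or environment in schema.default_environment):
            expected.add(feature.name)
    return expected


def missing_features(schema: Schema, columns: Iterable[str],
                     environment: Optional[str] = None) -> List[str]:
    """List the expected features that are absent from the incoming columns."""
    return sorted(expected_features(schema, environment) - set(columns))


# Example schema: two inputs plus the target, valid in both environments.
schema = Schema(
    features=[Feature("age"), Feature("income"), Feature("labels")],
    default_environment=["TRAINING", "SERVING"],
)

# Mirrors the intent of the call quoted in the question:
# the target is not expected when serving.
target = next(f for f in schema.features if f.name == "labels")
target.not_in_environment.append("SERVING")

serving_columns = ["age", "income"]  # user data without the target
print(missing_features(schema, serving_columns, "SERVING"))   # []
print(missing_features(schema, serving_columns, "TRAINING"))  # ['labels']
```

With one shared schema, the same user data passes the check under SERVING and is flagged for the missing target under TRAINING, so the comparison can happen before Transform without deleting the target from the schema.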

What exactly do you mean by “label feature”?

In machine learning, the word ‘feature’ typically implies an input to the model, while the word ‘label’ refers to an output category.

I mean the matrix Y, the target value; sorry for the confusion. I found a way to mark the target in the schema as “optional” and delete the target from the validation set (that I get from ExampleGen) in the preprocessing function that feeds into the Transform module. But I am not sure that this is a good approach.
I know that all of these transformations can be done more easily with TFDV, but my goal is to understand how to build good pipelines with TFX.

Thanks for the details. I don’t have any further info on this topic.
