How to fix anomalies with TFX, not TFDV?

Hey,
I have a question related to Weeks 1 and 2 of the second course in the specialization.

In the labs, I see how to fix anomalies with TFDV.
Here is an example of the code for doing that:

country_feature = tfdv.get_feature(schema, 'native-country')
country_feature.distribution_constraints.min_domain_mass = 0.9

But when we work with TFX, we do not find anomalies in any of the datasets.
My question is this:
Imagine that we found anomalies in the validation dataset.
How can we fix those anomalies with TFX, not TFDV?

Thanks

Hi Ivan! You will still use TFDV to fix anomalies found in your dataset. The role of TFX is to orchestrate your ML pipeline by managing how artifacts are passed between components. If an issue is found (i.e. your pipeline breaks), you can use other tools to fix it. The first parts of the pipeline mostly use TFDV under the hood, so you can use that library to modify their output artifacts. Afterwards, you can feed the revised artifacts to the pipeline again. You will actually have an exercise on this in Week 3 when you get to iterative schemas. Hope this helps!
