How to fix anomalies with tfx, not tfdv?

ivan_100096 · August 12, 2021, 9:16am

Hey,
I have a question related to weeks 1 and 2 of the 2nd course in the specialization.

In the labs, I see how to fix anomalies with tfdv.
Here is an example of the code for doing it:

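import tensorflow_data_validation as tfdv
# Require at least 90% of the values to come from the feature's domain
country_feature = tfdv.get_feature(schema, 'native-country')
country_feature.distribution_constraints.min_domain_mass = 0.9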

But when we work with tfx, we do not find anomalies in any of the datasets.
My question is this:
imagine a situation where we found anomalies in the validation dataset.
How can we fix those anomalies with tfx, not tfdv?

Thanks

chris.favila · August 13, 2021, 12:52am

Hi Ivan! You will still use TFDV to fix anomalies found in your dataset. The role of TFX is to run your ML pipeline by managing how artifacts are passed between components. If an issue is found (i.e. your pipeline breaks), then you can use other tools to fix it. The first parts of the pipeline mostly use TFDV under the hood, so you can use that library to modify their output artifacts. Afterwards, you can feed the revised artifacts to the pipeline again. You will actually have an exercise on this in Week 3 when you get to iterative schemas. Hope this helps!
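
A minimal sketch of the workflow described above, assuming the schema produced by the pipeline is stored as a text-format protobuf file. The helper name relax_domain_constraint and all paths are hypothetical; only tfdv.load_schema_text, tfdv.get_feature and tfdv.write_schema_text are TFDV calls.

```python
import os


def relax_domain_constraint(schema_path, revised_dir, feature_name, min_domain_mass):
    """Load a schema file, relax one feature's domain constraint, save a copy.

    Hypothetical helper: the name and arguments are illustrative only.
    """
    # Imported here because TFDV is a heavy dependency; a top-level import works too.
    import tensorflow_data_validation as tfdv

    # Read the schema artifact (assumed to be a text-format protobuf file).
    schema = tfdv.load_schema_text(schema_path)

    # Same fix as in the TFDV labs: lower the fraction of values that must
    # come from the feature's domain.
    feature = tfdv.get_feature(schema, feature_name)
    feature.distribution_constraints.min_domain_mass = min_domain_mass

    # Write the revised schema to its own directory so the original
    # pipeline artifact is left untouched.
    os.makedirs(revised_dir, exist_ok=True)
    revised_path = os.path.join(revised_dir, "schema.pbtxt")
    tfdv.write_schema_text(schema, revised_path)
    return revised_path
```

A call could look like relax_domain_constraint('path/to/schema.pbtxt', 'updated_schema', 'native-country', 0.9), where both paths are placeholders. The returned file is the revised artifact that can be fed back to the pipeline, for example through an importer component, before the validation step is run again; which component to use depends on your TFX version.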
