Question about appending values to a schema

Zihao_Geng · December 29, 2021, 6:02pm

I am a little confused about the following code:

tfdv.get_domain(schema, 'feature_column_name').value.append('string')

When we append a value to the schema, what exactly are we appending, and where does the appended value come from?

Thank you!

Matthieu_Lienart · December 30, 2021, 10:47am

@Zihao_Geng my understanding is the following:

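- Appending a value is used for categorical (string) features.
- The value is appended to that feature's domain in the schema, i.e. the list of values the feature is allowed to take.
- The value mainly comes from your domain knowledge, not from the data itself.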

Let’s say that in the training dataset, for whatever reason, a categorical feature only takes the values “red” and “green”. When you generate the schema, it will list only these 2 values as the feature's possible values, and if you later encode the categories as integers, only 2 indices [0, 1] will be prepared.
But since you know the feature can also take the value “blue”, you can append “blue” to the schema so that you won’t run into issues (for example, validation flagging “blue” as an anomaly) when “blue” values show up in new data.
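To make this concrete, here is a toy model in plain Python. The helpers `infer_domain`, `unexpected_values` and `build_vocab` are hypothetical stand-ins I made up for illustration, not the TFDV API. As far as I understand, in TFDV `get_domain` returns the feature's `StringDomain` proto, whose `value` field is a repeated list of strings, which is why `.value.append(...)` works; the toy below just models that list.

```python
# Toy model (hypothetical helpers, NOT the TFDV API) of a categorical
# feature's domain: simply the list of values the feature may take.

def infer_domain(values):
    """Collect the distinct values seen in the data (first-seen order)."""
    domain = []
    for v in values:
        if v not in domain:
            domain.append(v)
    return domain

def unexpected_values(domain, values):
    """Return the values missing from the domain (what validation would flag)."""
    return sorted(set(values) - set(domain))

def build_vocab(domain):
    """Map each allowed category to an integer index."""
    return {v: i for i, v in enumerate(domain)}

train = ["red", "green", "green", "red"]
serving = ["red", "blue", "green"]

domain = infer_domain(train)
print(domain)                              # ['red', 'green']
print(build_vocab(domain))                 # {'red': 0, 'green': 1}
print(unexpected_values(domain, serving))  # ['blue']

# Domain knowledge says "blue" is valid too, so add it by hand, analogous to
# tfdv.get_domain(schema, 'feature_column_name').value.append('blue')
domain.append("blue")
print(build_vocab(domain))                 # {'red': 0, 'green': 1, 'blue': 2}
print(unexpected_values(domain, serving))  # []
```

The appended value never comes from the training data; it is information you add on top of what was inferred.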

I hope this helps
