What is the association between domain and features in ML and pandas dataframe?

Hi

I am a little confused about the definition of domain and features in the course. Can anyone provide a definition or an explanation of what a domain is and how it is associated with features? Thanks!

Hi Chen Chang,

Welcome to the course!

The features are all the input variables X that the ML model uses to predict the output variable Y.

Every feature has a domain property, which describes roughly the range of values the feature covers. In TFX the domain is first defined during training by the SchemaGen component, which infers a schema from the statistics of the training dataset.
For numerical features, the domain is the range between the lowest and the highest value of the feature in the dataset being considered.
For categorical features, the domain is the collection of all categories observed in the dataset.
As you can imagine, the domain depends very much on the dataset, so it can be updated over time and curated manually to widen the range or include more categories.
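
To make the idea concrete, here is a minimal sketch in plain Python. It is not the TFX or SchemaGen API; the toy dataset and the helper names `infer_domain` and `in_domain` are made up for illustration. It only mimics the concept: derive a domain per feature from the data, check a new value against it, then curate the domain by hand.

```python
# Hypothetical toy dataset (column name -> values); not taken from the course.
dataset = {
    "age": [23, 35, 41, 58],
    "payment_type": ["cash", "card", "cash", "card"],
}


def infer_domain(values):
    """Return (min, max) for a numeric feature, or the set of observed categories."""
    if all(isinstance(v, (int, float)) for v in values):
        return (min(values), max(values))
    return set(values)


def in_domain(value, domain):
    """Check one value against a numeric range or a set of categories."""
    if isinstance(domain, tuple):
        low, high = domain
        return low <= value <= high
    return value in domain


# Infer one domain per feature from the dataset.
domains = {name: infer_domain(values) for name, values in dataset.items()}

# A category that never appeared in the dataset falls outside the domain.
unseen_ok_before = in_domain("voucher", domains["payment_type"])

# Manual curation: widen the numeric range and add the new category.
domains["age"] = (18, 100)
domains["payment_type"].add("voucher")
unseen_ok_after = in_domain("voucher", domains["payment_type"])

print(domains)
print(unseen_ok_before, unseen_ok_after)
```

With a pandas dataframe the same idea applies column by column, for example using `min()`, `max()` and `unique()` on each column.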

Hopefully this clarifies.

Regards
maarten

Sometimes the term "domain" makes me think of domain knowledge!
Thanks a lot! Now I understand the term "domain" in this course and in the TensorFlow package!