C2W1_Assignment - modify_domain_of_features

Hi all,

When I was working on the modify_domain_of_features function in the first week’s assignment, I saw that not all the specified features had their domain updated.

My implementation:

for feature in features_list:
    tfdv.set_domain(schema=schema, feature_path=feature, domain=to_domain_name)

The features_list:

domain_change_features = ['repaglinide', 'nateglinide', 'chlorpropamide', 'glimepiride', 
                          'acetohexamide', 'glipizide', 'glyburide', 'tolbutamide', 'pioglitazone', 
                          'rosiglitazone', 'acarbose', 'miglitol', 'troglitazone', 'tolazamide', 
                          'examide', 'citoglipton', 'insulin', 'glyburide-metformin', 'glipizide-metformin', 
                          'glimepiride-pioglitazone', 'metformin-rosiglitazone', 'metformin-pioglitazone']

After running the function and displaying the schema I still get:
‘acetohexamide’ ‘No’, ‘Steady’

This is even though I specified ‘metformin’ (‘Down’, ‘No’, ‘Steady’, ‘Up’) as the to_domain_name. The test code does not check the acetohexamide feature and reports that everything passes. However, I expected acetohexamide to now have ‘Down’, ‘No’, ‘Steady’, ‘Up’ in its domain.

Can someone help me understand this behavior?



Hi @pepijn,

Thanks for your question.

First, to make sure I understand your issue: could you please check whether, in the output of the cell that runs tfdv.display_schema(schema), the feature ‘acetohexamide’ refers to the ‘metformin’ domain, like in the snip below?

If that is OK, then I guess your question is why the second part, the Domain Values Dataframe, still shows only ‘No’, ‘Steady’ as values for the ‘acetohexamide’ domain, right?
This was the first time I noticed this too, so thanks for highlighting it!

From what I can see in the set_domain source code, the selected feature (acetohexamide) gets assigned a new domain (metformin), but the values of the original acetohexamide domain, which was inferred from train_stats, are not overwritten with the values from the metformin domain.
As I am not entirely sure, I’d like a second opinion on the above. @chris.favila or anyone else, please?

In the display_schema source code, after the features table with ‘Feature name’, ‘Type’, ‘Presence’, ‘Valency’, ‘Domain’, it prints every domain with its values, and each domain still has its original values assigned.

Hopefully this answers your question.



Hi Maarten,

Thanks for your reply! My question was indeed about why the Domain Values Dataframe does not show the same values for ‘acetohexamide’ as for ‘metformin’.

It is still not completely clear to me whether this is correct behavior or not. I got the impression that the purpose of modifying the domain was to expand the allowed domain values for that particular feature. Looking at the schema the domain values have stayed the same, but there is indeed a reference to the ‘metformin’ domain.

Does that mean that acetohexamide’s original domain is no longer checked during anomaly detection? And in that case, the Domain Values Dataframe showing the old values is redundant?



Hi Pepijn!

Maarten is right. The acetohexamide at the bottom of the schema display refers to a domain name, and that domain’s values are inferred from train_stats, which contains only the two string values you mentioned. When you inferred the schema earlier, the default behavior was to map the acetohexamide feature to the acetohexamide domain.

Now, when you updated the schema in this section, you changed this setting and mapped the acetohexamide feature to the metformin domain (which has more string values). The acetohexamide domain stays the same because we did not update it; it is still based on the values detected in train_stats. It is also harmless, because the acetohexamide feature is no longer mapped to it and already points to the metformin domain.
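
To make the distinction between a feature and a named domain concrete, here is a minimal toy model in plain Python. It is only a sketch and not how TFDV is implemented: ToySchema, the set_domain function defined here, and allowed_values are hypothetical names made up for illustration, and the domain values are the ones quoted in this thread.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToySchema:
    """Hypothetical stand-in for a schema (not the real TFDV schema object)."""
    # Domain name -> allowed string values (comparable to the domain values table).
    domains: Dict[str, List[str]] = field(default_factory=dict)
    # Feature name -> name of the domain it points to (comparable to the Domain column).
    feature_domain: Dict[str, str] = field(default_factory=dict)


def set_domain(schema: ToySchema, feature: str, domain_name: str) -> None:
    """Re-point a feature at an existing named domain; no values are copied."""
    if domain_name not in schema.domains:
        raise ValueError(f"Unknown domain: {domain_name}")
    schema.feature_domain[feature] = domain_name


def allowed_values(schema: ToySchema, feature: str) -> List[str]:
    """Resolve a feature's allowed values through the domain it currently points to."""
    return schema.domains[schema.feature_domain[feature]]


# State after inferring the schema: each feature points to a domain of the same name.
schema = ToySchema(
    domains={
        "metformin": ["Down", "No", "Steady", "Up"],
        "acetohexamide": ["No", "Steady"],
    },
    feature_domain={"metformin": "metformin", "acetohexamide": "acetohexamide"},
)

# Re-map the acetohexamide feature to the metformin domain.
set_domain(schema, "acetohexamide", "metformin")

print(schema.feature_domain["acetohexamide"])   # metformin
print(schema.domains["acetohexamide"])          # ['No', 'Steady'] (old domain untouched)
print(allowed_values(schema, "acetohexamide"))  # ['Down', 'No', 'Steady', 'Up']
```

In this toy model the old acetohexamide domain is still listed with its two values, but nothing points to it any more, which mirrors what the schema display shows.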

Hope this answers your question!


It does, thanks a lot! @chris.favila @mjsmid