Question about Course 2 Week 1 Exercise 4

Is anyone able to help me figure out the first part of the code in Course 2 Week 1 Exercise 4? I’ve already tried multiple combinations while attempting to follow the instructions, but they are not very clear.

Generate evaluation dataset statistics

HINT: Remember to use the evaluation dataframe and to pass the stats_options (that you defined before) as an argument

eval_stats = tfdv.generate_statistics_from_dataframe(train_stats, stats_options= train_stats)

I think you are close, with a few adjustments to consider:

  1. Back in Section 2, we split the dataset into three dataframes. One of them is the evaluation dataframe, and you might want to consider using it.

  2. Back in Section 3, we instantiated a StatsOptions object and set its feature_whitelist property. You might want to try passing this object when generating eval_stats.

  3. Check the API documentation for tfdv.generate_statistics_from_dataframe (on the TensorFlow TFX site). It tells you what data structure each argument is expected to be.
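
In case it helps to see the shape of the call, here is a minimal sketch rather than the notebook's official solution. The names eval_df and generate_eval_stats are hypothetical, and the optional tfdv parameter is only there so the function can be tried with a stand-in module. The point is that the first argument is a pandas dataframe and stats_options is a StatsOptions object, not previously computed statistics.

```python
def generate_eval_stats(eval_df, stats_options, tfdv=None):
    """Generate TFDV statistics for an evaluation dataframe.

    eval_df: pandas DataFrame holding the evaluation split (hypothetical name).
    stats_options: the tfdv.StatsOptions object defined earlier in the notebook.
    tfdv: optional module override; by default the real library is imported.
    """
    if tfdv is None:
        # Imported lazily so the helper can be exercised with a stand-in module.
        import tensorflow_data_validation as tfdv

    # dataframe must be the raw data, not a statistics object such as train_stats.
    # stats_options must be a StatsOptions instance, not a statistics object.
    return tfdv.generate_statistics_from_dataframe(
        dataframe=eval_df,
        stats_options=stats_options,
    )
```

In the notebook this would be called as eval_stats = generate_eval_stats(eval_df, stats_options), assuming your evaluation dataframe and options object have those names.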

What am I still missing here for Week 2 Exercise 5?


# HINTS: Pass the statistics and schema parameters into the validation function 
anomalies = tfdv.validate_statistics(eval_stats, schema)

# HINTS: Display input anomalies by using the calculated anomalies

I am getting a response of No Anomalies Found when the instructions state we are supposed to get some details about medical_specialty and glimepiride-pioglitazone features.

That will depend on which dataframe was used to generate the stats, which in turn led to the generation of the schema. Here is a post with a similar issue that might provide some clues to your situation.
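
To make that dependency concrete, here is a rough sketch of the flow, assuming the schema is meant to be inferred from the training statistics and the evaluation statistics are then checked against it. The function name validate_eval_against_train is hypothetical, and the optional tfdv parameter is only there so the helper can be tried with a stand-in module. If eval_stats was accidentally generated from the training dataframe, or the schema was inferred from the evaluation statistics, a "No anomalies found" result would not be surprising.

```python
def validate_eval_against_train(train_stats, eval_stats, tfdv=None):
    """Infer a schema from training stats and validate eval stats against it.

    train_stats / eval_stats: statistics generated from the training and
    evaluation dataframes respectively (they must come from different data).
    tfdv: optional module override; by default the real library is imported.
    """
    if tfdv is None:
        # Imported lazily so the helper can be exercised with a stand-in module.
        import tensorflow_data_validation as tfdv

    # The schema describes what the training data looks like.
    schema = tfdv.infer_schema(statistics=train_stats)

    # Anomalies are the places where the evaluation data departs from that schema.
    anomalies = tfdv.validate_statistics(statistics=eval_stats, schema=schema)
    return schema, anomalies
```

In a notebook, the returned anomalies can then be shown with tfdv.display_anomalies(anomalies).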