Exercise 7 C2W1 (Machine Learning in Data Lifecycle in production)
The response is "No anomalies found" when I run:
serving_stats = tfdv.generate_statistics_from_dataframe(serving_df, options)
calculate_and_display_anomalies(serving_stats, schema=schema)
I was able to fix it. Thanks
But it also shows the line below, which is not mentioned in the expected output. Is that OK?
'readmitted'    Column dropped    Column is completely missing
Later the notebook mentioned 'readmitted' as well. I was able to finish and pass. Thanks
Hi Aiml! Glad you were able to resolve the issue. For next time, please choose the course-specific category so the mentors can see your topic sooner.
Originally this was posted in the parent Machine Learning Engineering for Production category. There should be subcategories there for each specific course, and the mentors monitor those. Thank you!
Thank you. Much appreciated.
@aimlgyani @chris.favila I have the same problem. It shows "no anomalies found".
I'm happy that you were able to fix it. Can you please explain what you did to resolve this problem?
Thanks in advance!
Here's how I solved it!
See the function that we filled in below:
def calculate_and_display_anomalies(statistics, schema):
    '''
    Calculate and display anomalies.

    Parameters:
        statistics : Data statistics in statistics_pb2.DatasetFeatureStatisticsList format
        schema : Data schema in schema_pb2.Schema format

    Returns:
        display of calculated anomalies
    '''
    ### START CODE HERE
    # HINTS: Pass the statistics and schema parameters into the validation function
    # Check evaluation data for errors by validating the evaluation dataset statistics using the reference schema
    anomalies = tfdv.validate_statistics(statistics=eval_df, schema=schema)

    # HINTS: Display input anomalies by using the calculated anomalies
    # Visualize anomalies
    tfdv.display_anomalies(anomalies)
Look at the line that defines the anomalies variable. Can you see a problem?
It is processing the evaluation set instead of the data being passed to the function.
I resolved this by changing the anomalies variable to the following:
anomalies = tfdv.validate_statistics(statistics=statistics, schema=schema)
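To illustrate why the original version reported the same result no matter what was passed in, here is a minimal sketch in plain Python. It does not use TFDV: find_missing_columns, buggy_check, fixed_check, and the column lists are hypothetical stand-ins I made up for illustration, assuming the evaluation split keeps the 'readmitted' label and the serving split does not.

```python
# Hypothetical stand-in for a validation call (NOT the TFDV API): returns the
# columns that the schema expects but the statistics do not contain.
def find_missing_columns(statistics, schema):
    return sorted(set(schema) - set(statistics))


# Assumed toy data: column names only, in place of real statistics protos.
schema = ["age", "gender", "readmitted"]
eval_stats = ["age", "gender", "readmitted"]  # evaluation split keeps the label
serving_stats = ["age", "gender"]             # serving split has no label


def buggy_check(statistics, schema):
    # Bug: ignores the `statistics` argument and reads the global eval_stats,
    # so every call validates the evaluation split.
    return find_missing_columns(eval_stats, schema)


def fixed_check(statistics, schema):
    # Fix: validate whatever statistics the caller passed in.
    return find_missing_columns(statistics, schema)


buggy_result = buggy_check(serving_stats, schema)
fixed_result = fixed_check(serving_stats, schema)
print("buggy:", buggy_result)
print("fixed:", fixed_result)
```

In this sketch the buggy version finds nothing missing for the serving data, while the fixed version flags the dropped 'readmitted' column, which matches what I saw in the notebook after the change.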
Thanks, I had the same issue as well. Somehow I was still able to go through the rest of the notebook and get 100%.
I assume you meant to say the error was statistics = eval_stats instead of statistics = eval_df.
Also you should have highlighted statistics=statistics instead of schema=schema. Correct?