Course2 week1 exercise 7 anomalies

aimlgyani · August 22, 2021, 4:21pm

Exercise 7, C2W1 (Machine Learning Data Lifecycle in Production)

The result is "No anomalies found" when I run:

serving_stats = tfdv.generate_statistics_from_dataframe(serving_df, options)
calculate_and_display_anomalies(serving_stats, schema=schema)

aimlgyani · August 22, 2021, 4:30pm

I was able to fix it. Thanks

aimlgyani · August 22, 2021, 4:36pm

However, it also shows the following anomaly, which is not mentioned in the expected output. Is that OK?

Feature name | Anomaly short description | Anomaly long description
'readmitted' | Column dropped | Column is completely missing

aimlgyani · August 23, 2021, 7:38pm

The 'readmitted' anomaly is addressed later in the notebook as well. I was able to finish and pass. Thanks!
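For context: 'readmitted' is presumably the label column, which the serving split does not contain, so validating serving statistics against the training schema reports it as dropped. In TFDV this is usually handled with schema environments: add 'TRAINING' and 'SERVING' to `schema.default_environment`, append 'SERVING' to `tfdv.get_feature(schema, 'readmitted').not_in_environment`, and pass `environment='SERVING'` to `tfdv.validate_statistics`. I'm assuming that is what the later part of the notebook does. The sketch below is plain Python, not TFDV; `find_dropped_columns` and its arguments are hypothetical and only mimic the idea.

```python
# Toy illustration only: NOT the TFDV API. It mimics, with plain sets and
# dicts, why a missing label column is reported as "Column dropped" and how
# a per-environment exclusion suppresses that report.

def find_dropped_columns(schema_features, data_columns, environment=None,
                         not_in_environment=None):
    """Return (feature, short, long) tuples for schema features missing from the data.

    schema_features: set of feature names the schema expects.
    data_columns: set of column names present in the dataset.
    environment: optional environment name, e.g. "SERVING" (hypothetical).
    not_in_environment: hypothetical mapping feature -> set of environments
        in which that feature is not expected.
    """
    not_in_environment = not_in_environment or {}
    dropped = []
    for feature in sorted(schema_features):
        excluded = environment in not_in_environment.get(feature, set())
        if feature not in data_columns and not excluded:
            dropped.append((feature, "Column dropped", "Column is completely missing"))
    return dropped


# Illustrative feature names; only 'readmitted' comes from the thread.
schema_features = {"age", "num_medications", "readmitted"}
serving_columns = {"age", "num_medications"}  # label absent at serving time

# Without environments, the missing label is reported.
print(find_dropped_columns(schema_features, serving_columns))

# With 'readmitted' excluded from SERVING, nothing is reported.
print(find_dropped_columns(schema_features, serving_columns,
                           environment="SERVING",
                           not_in_environment={"readmitted": {"SERVING"}}))
```

The exclusion is keyed by environment rather than deleted from the schema, so the same schema still flags a missing label when validating training data.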

chris.favila · August 24, 2021, 2:26am

Hi Aiml! Glad you were able to resolve the issue. Next time, please choose the course-specific category so the mentors can see your topic sooner:

This was originally posted in the parent Machine Learning Engineering for Production category. That category has subcategories for each specific course, and mentors monitor those. Thank you!

aimlgyani · August 24, 2021, 2:30am

Thank you. Much appreciated.

hdot · August 29, 2021, 12:42pm

@aimlgyani @chris.favila I have the same problem. It shows “no anomalies found”.

I’m happy that you were able to fix it - can you please explain what you did to resolve this problem?

Thanks in advance!

hdot · August 29, 2021, 12:50pm

Here’s how I solved it!

See the function that we filled in below:

def calculate_and_display_anomalies(statistics, schema):
    '''
    Calculate and display anomalies.

            Parameters:
                    statistics : Data statistics in statistics_pb2.DatasetFeatureStatisticsList format
                    schema : Data schema in schema_pb2.Schema format

            Returns:
                    display of calculated anomalies
    '''
    ### START CODE HERE
    # HINTS: Pass the statistics and schema parameters into the validation function 
    # Check evaluation data for errors by validating the evaluation dataset statistics using the reference schema
    anomalies =  tfdv.validate_statistics(statistics=eval_df, schema=schema)


    # HINTS: Display input anomalies by using the calculated anomalies
    # Visualize anomalies
    tfdv.display_anomalies(anomalies)

Look at the line that defines the anomalies variable. Can you see a problem?

It validates the evaluation set instead of the statistics passed into the function.

I resolved this by changing the anomalies variable to the following:

    anomalies =  tfdv.validate_statistics(statistics=statistics, schema=schema)
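Why does the unfixed version print "No anomalies found" instead of failing? If the line actually read `statistics=eval_stats`, the function validates the clean evaluation statistics on every call and never looks at its `statistics` argument, so serving data is never checked. The plain-Python reproduction below assumes that; `validate`, `calculate_anomalies_buggy` and `calculate_anomalies_fixed` are hypothetical stand-ins, not TFDV functions.

```python
# Toy reproduction of the bug: a helper that ignores its `statistics`
# argument and reads a notebook-level variable instead. The "statistics"
# are plain dicts; validate() is a hypothetical stand-in for
# tfdv.validate_statistics.

def validate(statistics, schema):
    """Hypothetical validator: list schema features missing from the stats."""
    return sorted(f for f in schema if f not in statistics["features"])

schema = {"age", "readmitted"}
eval_stats = {"features": {"age", "readmitted"}}  # clean evaluation split
serving_stats = {"features": {"age"}}              # label missing

def calculate_anomalies_buggy(statistics, schema):
    # Bug: validates the global eval_stats, so the argument is never used.
    return validate(statistics=eval_stats, schema=schema)

def calculate_anomalies_fixed(statistics, schema):
    # Fix: validate whatever statistics the caller passed in.
    return validate(statistics=statistics, schema=schema)

print(calculate_anomalies_buggy(serving_stats, schema))  # [] looks like "no anomalies"
print(calculate_anomalies_fixed(serving_stats, schema))  # ['readmitted']
```

Because the global name exists in the notebook, the buggy call raises no error, which is why the rest of the notebook can still run.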

Trevor_Chi-Yuen_Tao · November 17, 2022, 3:03am

Thanks, I had the same issue. Somehow I was still able to get through the rest of the notebook and score 100%.

I assume you meant to say the error was statistics = eval_stats instead of statistics = eval_df.

Also, you meant to highlight statistics=statistics rather than schema=schema, correct?

Related topics

Topic | Category | Replies | Views | Activity
C2W1_Assignment Exercise 5: Detecting Anomalies. No anomalies found. !? | Machine Learning Data Lifecycle in Production | 10 | 642 | May 23, 2022
C2W1_Assignment Exercise_7 | Machine Learning Data Lifecycle in Production, week-1 | 6 | 493 | December 24, 2023
Data validation assignment | Machine Learning Data Lifecycle in Production | 1 | 510 | February 1, 2023
Question about Course 2 Week 1 Exercise 4 | Machine Learning Engineering for Production (MLOps) | 3 | 110 | June 27, 2021
C2W1 Assignment: No anomaly found for serving df | Machine Learning Data Lifecycle in Production | 5 | 644 | June 19, 2021