Course 2 Week 1 Exercise 7: anomalies

Exercise 7, C2W1 (Machine Learning Data Lifecycle in Production)

The response is “No anomalies found” when I run:

serving_stats = tfdv.generate_statistics_from_dataframe(serving_df, options)
calculate_and_display_anomalies(serving_stats, schema=schema)

I was able to fix it. Thanks

But the output also shows the row below, which is not mentioned in the expected output. Is that OK?

‘readmitted’ | Column dropped | Column is completely missing

The notebook mentions ‘readmitted’ later on as well. I was able to finish and pass. Thanks.

Hi Aiml! Glad you were able to resolve the issue. For next time, please choose the course-specific category so the mentors can see your topic sooner:

Originally this was posted in the parent Machine Learning Engineering for Production category. There should be subcategories there to choose the specific course and mentors are monitoring those. Thank you!


Thank you. Much appreciated.

@aimlgyani @chris.favila I have the same problem. It shows “no anomalies found”.

I’m happy that you were able to fix it - can you please explain what you did to resolve this problem?

Thanks in advance!

Here’s how I solved it!

See the function that we filled in below:

def calculate_and_display_anomalies(statistics, schema):
    '''
    Calculate and display anomalies.

            Parameters:
                    statistics : Data statistics in statistics_pb2.DatasetFeatureStatisticsList format
                    schema : Data schema in schema_pb2.Schema format

            Returns:
                    display of calculated anomalies
    '''
    ### START CODE HERE
    # HINTS: Pass the statistics and schema parameters into the validation function 
    # Check evaluation data for errors by validating the evaluation dataset statistics using the reference schema
    anomalies =  tfdv.validate_statistics(statistics=eval_df, schema=schema)


    # HINTS: Display input anomalies by using the calculated anomalies
    # Visualize anomalies
    tfdv.display_anomalies(anomalies)

Look at the line that defines the anomalies variable. Can you see a problem?

It validates the evaluation set every time instead of the statistics passed into the function, so the serving statistics are never actually checked.

I resolved this by changing the anomalies variable to the following:

    anomalies =  tfdv.validate_statistics(statistics=statistics, schema=schema)
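To make the effect of that one-line change concrete, here is a minimal sketch in plain Python. It does not use TFDV at all: `fake_validate`, the toy `schema` list, and the two statistics dictionaries are stand-ins I made up for illustration, assuming only that the serving data lacks the ‘readmitted’ column as reported earlier in this thread.

```python
# Toy stand-in for tfdv.validate_statistics. This is NOT the TFDV API:
# it simply lists schema columns that are absent from the given statistics.
def fake_validate(statistics, schema):
    return [col for col in schema if col not in statistics]

# Hypothetical data, for illustration only.
schema = ["age", "readmitted"]             # reference schema
eval_stats = {"age": 1, "readmitted": 1}   # evaluation statistics (complete)
serving_stats = {"age": 1}                 # serving statistics (no label column)

def buggy_check(statistics, schema):
    # Bug: the `statistics` argument is ignored and the global is used instead.
    return fake_validate(statistics=eval_stats, schema=schema)

def fixed_check(statistics, schema):
    # Fix: validate whatever the caller passed in.
    return fake_validate(statistics=statistics, schema=schema)

print(buggy_check(serving_stats, schema))  # empty list, i.e. "no anomalies"
print(fixed_check(serving_stats, schema))  # reports the missing 'readmitted'
```

The buggy version returns the same result no matter what you pass in, which would explain seeing “No anomalies found” for the serving data.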

Thanks, I had the same issue as well. Somehow I was still able to get through the rest of the notebook and score 100%.

I assume you meant to say the error was statistics = eval_stats instead of statistics = eval_df.

Also, you should have highlighted statistics=statistics instead of schema=schema. Correct?