Imbalanced data error

Hello. I’m having a problem with this week’s assignment.

The assignment states that, according to the plot, the wine quality dataset is imbalanced.

Since there are very few observations with quality equal to 3, 4, 8 and 9, we should drop these observations.

I did just that by keeping only the rows whose quality is > 4 and < 8.

But I get an error when using the utils.test_df_drop(df) function.

Am I missing something?

This is my code:

df = df[(df['quality'] > 4) & (df['quality'] < 8)]
df = df.reset_index(drop=True)

That part of the code looks right, but the test may check other things as well, so I advise you to recheck the cells above this one. The problem is probably in one of them.
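
For a quick sanity check, the filter from the question can be run on a small made-up frame. The data below is hypothetical (it is not the real wine quality dataset) and the variable names are only for illustration; it just confirms that the expression keeps classes 5, 6 and 7 and that the index is renumbered.

```python
import pandas as pd

# Hypothetical toy data standing in for the wine quality dataset
toy = pd.DataFrame({"quality": [3, 4, 5, 6, 7, 8, 9, 5, 6]})

# Same filter as in the question: keep only quality 5, 6 and 7
filtered = toy[(toy["quality"] > 4) & (toy["quality"] < 8)]
filtered = filtered.reset_index(drop=True)

# Classes that survive the filter, and how many rows are left
kept_classes = sorted(filtered["quality"].unique().tolist())
print(kept_classes)
print(len(filtered))
```

If this behaves as expected on the real df too, the filter itself is unlikely to be the cause of the failing test.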

I had the same issue. In case someone else stumbles upon this: make sure that the cell containing df = df.iloc[np.random.permutation(len(df))] is commented out!
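
I don't know what utils.test_df_drop actually checks, so this is only a guess at why the shuffle matters: if the test looks at rows in specific positions, shuffling first would change which row sits where, even though the same rows survive the filter. A minimal sketch on made-up data, with a fixed permutation standing in for np.random.permutation(len(df)) so the result is reproducible (drop_rare is a hypothetical helper, not part of the assignment):

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, not the real wine quality dataset
toy = pd.DataFrame({"quality": [3, 5, 6, 7, 9, 5, 6, 7]})

def drop_rare(df):
    # Same filter as in the question, followed by an index reset
    out = df[(df["quality"] > 4) & (df["quality"] < 8)]
    return out.reset_index(drop=True)

# Fixed permutation (reversed order) used in place of a random one
perm = np.arange(len(toy))[::-1]
shuffled = toy.iloc[perm]

unshuffled_result = drop_rare(toy)
shuffled_result = drop_rare(shuffled)

# Same rows survive either way, but they end up in a different order
same_rows = sorted(unshuffled_result["quality"]) == sorted(shuffled_result["quality"])
same_order = unshuffled_result["quality"].tolist() == shuffled_result["quality"].tolist()
print(same_rows, same_order)
```

So the filtering is still correct after a shuffle; only the row order differs, which would be enough to trip a position-based check.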