C3M4 Graded Lab issue - exercise 1

I’m having an issue with exercise 1. The auto grader tells me df_compact is the wrong type when it is a Series, but changing it to a DataFrame causes a type error when tstat is passed to round(). I tried manually getting around this by wrapping test_results in float(), but when I submit it the auto grader then says “Failed test case: df_compact has an incorrect value. Make sure you are filtering the data correctly.” However, printing df_compact shows that it correctly filtered the 377 rows of Compact cars. I refreshed the notebook and I’m still getting the same issue. Every other exercise is 10/10 correct.

I do not understand what I’m doing wrong (or, rather, the exact answer the auto grader is looking for). Please let me know how to fix this.

{mentor edit: image of code removed}

Please do not post your code on the forum. That’s not allowed by the Code of Conduct.

I’ll edit your post to remove the code.

Hopefully a mentor for this course will be able to assist you.

Hi @ekekoa!
Just to be sure, are you referring to
Exercise 1: Hypothesis Testing ?

If so, what is probably happening is that the autograder is looking for a dataframe. If you filter a dataframe and then also select a single column from the result, you get a Series object.

So, as an example:

df_colour_apples = df[df["fruit_type"] == "apple"]["colour"]

will give you a Series object.
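To make the difference concrete, here is a minimal sketch on made-up data. The dataframe contents and the names df_apples and colour_apples are only illustrative assumptions, not the lab's dataset or variables.

```python
import pandas as pd

# Hypothetical toy data, NOT the lab's dataset
df = pd.DataFrame({
    "fruit_type": ["apple", "apple", "pear"],
    "colour": ["red", "green", "green"],
})

# Boolean filter only: every column is kept, so the result is a DataFrame
df_apples = df[df["fruit_type"] == "apple"]

# Boolean filter plus a single-column selection: the result is a Series
colour_apples = df[df["fruit_type"] == "apple"]["colour"]

print(type(df_apples).__name__)
print(type(colour_apples).__name__)
```

The only difference between the two lines is the trailing ["colour"], and that is what turns the result into a Series.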

If you apply your filter in the t-test, you will keep df_colour_apples as a df.

stats.ttest_ind(df[WHAT YOU WANT TO TEST HERE] (…)

I just tested myself following this reasoning above and I get the 10 points for this question.

Hi Gabi,
Thank you for your quick response! Yes, this is for Exercise 1: Hypothesis Testing in the Analyzing Car CO2 Emissions graded lab in module 4.

In your first block quote, that is exactly how I coded it originally. I filtered the dataframe and got a Series object saved to the variable df_compact.

I’m not understanding the next part though:

“If you apply your filter in the t-test, you will keep df_colour_apples as a df.”

stats.ttest_ind(df[WHAT YOU WANT TO TEST HERE] (…)

Can you elaborate on how to keep it as a df? Because here is what I’ve tried and it’s still not working (using your example):

stats.ttest_ind(df[df_colour_apples], …) – results in KeyError.

stats.ttest_ind(df[df["fruit_type"] == "apple"]["colour"], …) – results in autograder 0/10 Failed test case: df_colour_apples has incorrect type.

I would like to point out that this is how the two sample t-test was taught in the lecture video (regarding diamond cut and prices), so this is how I originally coded it in the lab:

Obviously, just passing the variables to ttest_ind() doesn’t work because they are both Series objects after being filtered, as we have both noted. However, when I do it this way in the lab, I get the results with no issues. Meanwhile, trying to change df_compact and df_mid_size to dataframes causes the following error in the next cell:

(I hope it’s okay that I post a screencap since it is only default code already in the lab.)

Please let me know if you can see where I’m going wrong because I’m completely at a loss. Thank you!

Hi @ekekoa!

You don’t need to convert anything to dataframe/series object.
Maybe my example overcomplicated things, so let’s go by parts:

  1. First, the approach you’re following would work just fine “in the real world”. But since we’re in the course, the autograder runs some checks, and this one is probably not a good check. Thank you for sharing the screenshot, I will create a ticket to report this to the DeepLearning team :slight_smile:

  2. For now, to be able to score the points:

  • If you do something like this: goods = df[df["cut"] == "Good"] , goods will be a dataframe.
  • But if you do something like goods = df[df["cut"] == "Good"]["price"], you will get a Series object, because you’re slicing the original dataframe (selecting only one column of it).
  • For the autograder, this is exactly where the issue lies. The autograder expects df_compact to be a dataframe, not a series object. (again, this would not give you an issue in the real world, but with the autograder, it does).
  • So to comply with the autograder, and keep df_compact (and the other df) as a dataframe, you can apply the slicing inside the stats.ttest_ind function. This way you don’t change df_compact (and the other df), and both stay originally as a df.
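As a rough sketch of the pattern in the bullets above, using the diamonds-style example rather than the assignment's data: the toy values and the name ideals are assumptions for illustration, and any extra arguments the lab passes to stats.ttest_ind are left out.

```python
import pandas as pd
from scipy import stats

# Hypothetical toy data, NOT the lab's dataset
df = pd.DataFrame({
    "cut": ["Good", "Good", "Good", "Ideal", "Ideal", "Ideal"],
    "price": [300.0, 320.0, 340.0, 400.0, 410.0, 450.0],
})

# Filter rows only, so both variables stay DataFrames
goods = df[df["cut"] == "Good"]
ideals = df[df["cut"] == "Ideal"]

# Select the column inside the call, so the test receives two Series
result = stats.ttest_ind(goods["price"], ideals["price"])
tstat, pvalue = result.statistic, result.pvalue

# Both values are scalars here, so round() works on them
print(round(float(tstat), 3), round(float(pvalue), 3))
```

This way goods and ideals are still dataframes after the cell runs, while the t-test itself only sees the single column being compared.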

Check:

A last comment: if you copy my code from the example, you will get all sorts of errors, because I was trying to give you an example, without using the real DFs of the assignment, so you could apply the concepts of what I was trying to explain. I apologise if it made it even more unclear :face_with_peeking_eye: But I believe the above will help you.
I can imagine that it’s also confusing if you’re a beginner, since the example of the lesson will be slightly different here.
But you’re on the right path.

So:

  • keep df_compact and df_medium_size as dataframes (don’t slice them!)
  • apply the slicing (selecting the column you need to check for statistical relevance) inside the stats.ttest_ind function.

I really hope it’s clear now :slight_smile:

Oh my gosh, THANK YOU! I was able to receive full credit from the autograder after I moved the slice (["CO2 emissions"]) behind df_compact inside of ttest_ind() instead of when initializing df_compact.

What a relief! I sincerely appreciate your explanations. They were perfectly clear, I just used your example variable in my message because I didn’t want to accidentally break the code of conduct again.

Thank you!!
