Need clarification: Rideshare_Project_Week4 (notebook)

Direct link to notebook:

In the code snippet below, the goal is to estimate a confidence interval for the holiday days in the dataset:

# Select the data only for holidays
daily_rides_holidays = daily_rides[daily_rides["date"] > "2022-12-17"]

# Compute sample mean and standard deviation for holidays
mean_rides_per_day_holidays = daily_rides_holidays['daily_rides'].mean()
std_rides_per_day_holidays = daily_rides_holidays['daily_rides'].std()

print(f'Mean number of rides per day: {mean_rides_per_day_holidays:.2f} +/- {std_rides_per_day_holidays:.2f}')

# Get the confidence interval for the population mean for the holidays.
critical_value_holidays =  scipy.stats.t.ppf(1 - (1 - confidence)/2, df=len(daily_rides_holidays)-1)
total_days_holidays = daily_rides_holidays['date'].count()
confidence_interval_holidays = critical_value * std_rides_per_day_holidays / np.sqrt(total_days_holidays)
print(f"With a {100 * confidence}% confidence you can say that your error will be no more than {confidence_interval_holidays} rides per day.")

I noticed that the formula for confidence_interval_holidays uses critical_value from the previous code cell, shown below:

critical_value = scipy.stats.t.ppf(1 - (1 - confidence)/2, df=len(daily_rides)-1)

While the formula itself is correct, it uses the critical value computed from the whole year’s data in the previous cell instead of the holiday-specific one.

In my head, calculating the confidence interval for holidays should be:

confidence_interval_holidays = critical_value_holidays * std_rides_per_day_holidays / np.sqrt(total_days_holidays)

That is, using critical_value_holidays instead of critical_value.
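
To illustrate why the choice of critical value matters, here is a minimal sketch comparing the two. The confidence level of 0.95 and the sample sizes (365 days for the full year, 14 days for the holiday window) are assumptions for illustration only, not values taken from the notebook.

```python
import scipy.stats

confidence = 0.95  # assumed confidence level, not taken from the notebook

# Hypothetical sample sizes: a full year of days vs. a short holiday window
n_year, n_holidays = 365, 14

# Same formula as in the notebook, with different degrees of freedom
critical_value = scipy.stats.t.ppf(1 - (1 - confidence) / 2, df=n_year - 1)
critical_value_holidays = scipy.stats.t.ppf(1 - (1 - confidence) / 2, df=n_holidays - 1)

# Fewer degrees of freedom give a larger critical value, hence a wider margin
print(f"Full-year critical value: {critical_value:.3f}")
print(f"Holiday critical value:   {critical_value_holidays:.3f}")

# Factor by which the margin is understated if the full-year value is used
understatement = critical_value_holidays / critical_value
print(f"Margin understated by a factor of {understatement:.3f}")
```

Under these assumptions, using the full-year critical value makes the holiday margin of error look narrower than it should be, because the t-distribution has heavier tails for small samples.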

Is my thinking correct or did I miss something? Thanks in advance for the clarifications!

@lucas.coutinho, can you look into this?

Hi @markredito,

Thanks for bringing this to my attention. I’m looking at it right now and will get back to you soon.

Thanks for tagging me @TMosh.

Hi @markredito,

You are correct: the variable critical_value_holidays is computed but never used. That was a mistake. I have already fixed it, and the fix is live.
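
One way to avoid this kind of mix-up is to derive the sample size, degrees of freedom, and standard deviation from the same sample inside a single function. The sketch below is only an illustration: margin_of_error is a hypothetical helper that does not appear in the notebook, and the ride counts are made-up values.

```python
import numpy as np
import pandas as pd
import scipy.stats


def margin_of_error(sample: pd.Series, confidence: float = 0.95) -> float:
    """Half-width of the t-based confidence interval for the sample mean."""
    n = sample.count()
    # The critical value uses the degrees of freedom of this sample only
    critical = scipy.stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)
    return float(critical * sample.std() / np.sqrt(n))


# Made-up daily ride counts, for illustration only
rides = pd.Series([100.0, 120.0, 90.0, 110.0, 105.0])
moe = margin_of_error(rides)
print(f"Margin of error: {moe:.2f} rides per day")
```

Calling the helper once on daily_rides['daily_rides'] and once on daily_rides_holidays['daily_rides'] would keep each critical value tied to its own subset.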

Thanks!
Lucas
