Need clarification: Rideshare_Project_Week4 (notebook)

markredito · June 1, 2024, 8:28pm

Direct link to notebook:

In the code snippet below, the goal is to estimate a confidence interval for the holiday days in the dataset only:

# Select the data only for holidays
daily_rides_holidays = daily_rides[daily_rides["date"] > "2022-12-17"]

# Compute sample mean and standard deviation for holidays
mean_rides_per_day_holidays = daily_rides_holidays['daily_rides'].mean()
std_rides_per_day_holidays = daily_rides_holidays['daily_rides'].std()

print(f'Mean number of rides per day: {mean_rides_per_day_holidays:.2f} +/- {std_rides_per_day_holidays:.2f}')

# Get the confidence interval for the population mean for the holidays.
critical_value_holidays =  scipy.stats.t.ppf(1 - (1 - confidence)/2, df=len(daily_rides_holidays)-1)
total_days_holidays = daily_rides_holidays['date'].count()
confidence_interval_holidays = critical_value * std_rides_per_day_holidays / np.sqrt(total_days_holidays)
print(f"With a {100 * confidence}% confidence you can say that your error will be no more than {confidence_interval_holidays} rides per day.")

I noticed that the formula for confidence_interval_holidays uses critical_value from the previous code cell, shown below:

critical_value = scipy.stats.t.ppf(1 - (1 - confidence)/2, df=len(daily_rides)-1)

While the form of the formula is correct, it seems to use the critical value computed from the whole year’s data (df=len(daily_rides)-1) instead of the holiday-specific one.

In my head, calculating the confidence interval for holidays should be:

confidence_interval_holidays = critical_value_holidays * std_rides_per_day_holidays / np.sqrt(total_days_holidays)

Where you use critical_value_holidays instead of critical_value.
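To illustrate why the choice of critical value matters, here is a minimal sketch rather than the notebook's actual code. The sample sizes (365 days for the year, 14 for the holidays), the 95% confidence level, and the standard deviation are assumptions made up for the example. The helper t_critical is hypothetical as well.

```python
import numpy as np
import scipy.stats

# All numbers below are assumptions for illustration, not values from the notebook.
confidence = 0.95        # assumed confidence level
n_year = 365             # assumed number of days in the full dataset
n_holidays = 14          # assumed number of holiday days
std_holidays = 1000.0    # hypothetical sample standard deviation (rides per day)


def t_critical(confidence, n):
    """Two-sided critical value of Student's t for a sample of size n."""
    return scipy.stats.t.ppf(1 - (1 - confidence) / 2, df=n - 1)


# Critical value based on the whole year (what the original cell reused)
critical_value = t_critical(confidence, n_year)
# Critical value based on the holiday sample only
critical_value_holidays = t_critical(confidence, n_holidays)

# Margin of error for the holiday mean with each critical value
margin_with_year_df = critical_value * std_holidays / np.sqrt(n_holidays)
margin_with_holiday_df = critical_value_holidays * std_holidays / np.sqrt(n_holidays)

print(f"Critical value (year df):    {critical_value:.4f}")
print(f"Critical value (holiday df): {critical_value_holidays:.4f}")
print(f"Margin with year df:    {margin_with_year_df:.2f}")
print(f"Margin with holiday df: {margin_with_holiday_df:.2f}")
```

Because the holiday sample has far fewer degrees of freedom, its t critical value is larger, so reusing the yearly critical_value makes the holiday interval look narrower than it should be.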

Is my thinking correct or did I miss something? Thanks in advance for the clarifications!

TMosh · June 5, 2024, 6:59pm

@lucas.coutinho, can you look into this?

lucas.coutinho · June 6, 2024, 1:55pm

Hi @markredito,

Thanks for bringing me this. I’m looking at it right now and will get back here soon.

Thanks for tagging me @TMosh.

lucas.coutinho · June 6, 2024, 3:05pm

Hi @markredito,

You are correct: the variable critical_value_holidays is computed but never used. It was a mistake. I have already fixed it, and the fix is live.

Thanks!
Lucas
