Bayes Theorem - Intuition

ai_developer · January 5, 2025, 7:44pm

In the video “Bayes Theorem - Intuition” in “Probability and Statistics for Machine Learning” (the third course in the specialization), week 1, there appears to be an inconsistency between the count data presented at 1:32 and at 2:08. I am not sure whether this was intended, but I explored the differences and decided to post my observations here in case they help others. The two sets of counts imply very different values for the test's precision, also called positive predictive value (PPV): the posterior probability P(disease | tested+).

In the first slide (at 1:32), the confusion matrix was structured as follows, with count data for sick and healthy people tested for an illness:

Confusion matrix (rows: sick, healthy; columns: tested sick, tested healthy):
TP: 99 people    FN: 1 person
FP: 1 person     TN: 99 people

The instructor asked: what is the probability you are sick given that you tested sick? The data give P(sick | tested sick) = TP / (TP + FP) = 99 / (99 + 1) = 0.99, so P(not sick | tested sick) = 0.01. In this case, a person should be quite concerned about a positive test result.
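A minimal Python sketch of the calculation above, using only the four counts from the first slide. The helper `ppv` is a hypothetical name chosen for illustration, not something from the course. Computing the class totals also shows that these counts describe 100 sick and 100 healthy people, so half of this sample is sick.

```python
# Counts read off the first slide (about 1:32 in the video).
TP, FN = 99, 1   # people who are actually sick
FP, TN = 1, 99   # people who are actually healthy


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): P(sick | tested sick) = TP / (TP + FP)."""
    return tp / (tp + fp)


sensitivity = TP / (TP + FN)                    # P(tested sick | sick)
specificity = TN / (TN + FP)                    # P(tested healthy | healthy)
prevalence = (TP + FN) / (TP + FN + FP + TN)    # share of sick people in this sample

print(f"P(sick | tested sick)     = {ppv(TP, FP):.4f}")
print(f"P(not sick | tested sick) = {1 - ppv(TP, FP):.4f}")
print(f"sensitivity={sensitivity:.2f}, specificity={specificity:.2f}, prevalence={prevalence:.2f}")
```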

The in-video poll that appears immediately after this slide shows that most learners (around 67%) think, based on the data at about 1:32, that a positive test result more likely than not indicates actual sickness. If my calculations are correct, the test's positive predictive value supports that conclusion.

At 2:08 in the video, the confusion matrix is laid out differently and contains very different counts, drawn from a much larger population (1,000,000 people in total):

Confusion matrix (rows: healthy, sick; columns: tested sick, tested healthy):
FP: 9,999 people    TN: 989,901 people
TP: 99 people       FN: 1 person

This yields P(sick | tested sick) = TP / (TP + FP) = 99 / (99 + 9,999) = 99 / 10,098 ≈ 0.0098 (as discussed in the video), so P(not sick | tested sick) ≈ 0.9902. This is a very different picture from the one suggested by the earlier data. As the instructor mentioned, these data suggest a person should not be too concerned about a positive test result, because the person has most likely been misdiagnosed.
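A minimal Python sketch that checks the second calculation two ways: directly from the counts, and through Bayes' theorem written in terms of sensitivity P(tested sick | sick), specificity P(tested healthy | healthy) and prevalence P(sick). The helper `posterior_sick` is a hypothetical name for illustration. Working through the second slide's counts shows the test has the same 99% sensitivity and 99% specificity as in the first slide; what differs is the share of sick people (50% in the first slide's sample, 1 in 10,000 here).

```python
def posterior_sick(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Bayes' theorem for P(sick | tested sick).

    P(sick | +) = P(+ | sick) P(sick) / [P(+ | sick) P(sick) + P(+ | healthy) P(healthy)]
    """
    true_pos = sensitivity * prevalence                # P(tested sick and sick)
    false_pos = (1 - specificity) * (1 - prevalence)   # P(tested sick and healthy)
    return true_pos / (true_pos + false_pos)


# Counts read off the second slide (about 2:08 in the video).
TP, FN = 99, 1               # actually sick
FP, TN = 9_999, 989_901      # actually healthy
total = TP + FN + FP + TN    # 1,000,000 people

sensitivity = TP / (TP + FN)     # 0.99, same as the first slide
specificity = TN / (TN + FP)     # 0.99, same as the first slide
prevalence = (TP + FN) / total   # 0.0001, i.e. 1 in 10,000

from_counts = TP / (TP + FP)
from_bayes = posterior_sick(sensitivity, specificity, prevalence)
print(f"From counts: {from_counts:.4f}")
print(f"From Bayes:  {from_bayes:.4f}")

# Same test applied to a sample where half the people are sick (the first slide):
print(f"Prevalence 0.5: {posterior_sick(0.99, 0.99, 0.5):.4f}")
```

Under these assumptions, the two slides are consistent with each other: the same test gives a posterior of 0.99 when half the tested people are sick and about 0.0098 when only 1 in 10,000 is, because a 1% false-positive rate applied to 999,900 healthy people (9,999 false positives) swamps the 99 true positives.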

Is my understanding of the data correct?

I hope this is helpful. Comments/discussion encouraged.

Deepti_Prasad · January 5, 2025, 8:49pm

Notice that in the first slide, the probability of being sick given a positive test is high (0.99), while the probability of not being sick given a positive test is minimal at 0.01. So people should be concerned, right?

In the second slide, the probability of being sick given a positive test is only 0.0098, much lower than the probability of not being sick given a positive test. That is why the instructor says not to worry: the number of false positives is much higher than the number of true positives, meaning many of the people who tested positive were misdiagnosed.

I hope this addresses the confusion!
