In the videos and in the reading portion titled “Calculating the PPV in terms of sensitivity, specificity, and prevalence”, Prevalence is defined as the count of actual positives there are in “the population”. The phrase “the population” makes it sound like we should have data on the occurrence of the disease in a population like a city, or the United States, for example. But in all of the calculation examples in the lessons, we’re calculating Prevalence using the total number of examples we have (rather than a “population”).
In machine learning, does population just mean the group of examples?
Of course you can only do calculations or training on the data that you actually have. But what you have to assume is that your data is statistically representative of the population from which the data was sampled. In a medical context, that population can be very specific and should be noted with the data, e.g. people of both sexes between the ages of 18 and 65 who have not previously been vaccinated for PPV. I just made up that example and have not taken this course, so I don’t know what they do or don’t say on this kind of point in the course materials.
Based on this video: https://www.coursera.org/learn/ai-for-medical-diagnosis/lecture/WzOsv/sampling-from-the-total-population , it looks like p represents the accuracy for the “whole population” (in the example, all patients who have gotten chest x-rays), which is unknown because it is infeasible to review the data of that many patients, while p̂ is the accuracy on our sample, which we can know.
Based on this video, it seems like there are two different types of “population”: the full population and our sample population.
Can anyone confirm?
Hello @littlebird
I will try to explain the word “population” by way of an example.
A prevalence rate is the total number of cases of a disease existing in a population divided by the total population. So, if a measurement of diabetes is taken in a population of 40,000 people, and 1,200 were recently diagnosed with diabetes and 3,500 are already living with diabetes, then the prevalence of diabetes is (1,200 + 3,500) / 40,000 ≈ 0.118.
So the word “population” in the definition basically means the selected group of people chosen to be included in a study, not the population of the world, a city, or a country.
Say a survey is being done on patients visiting a hospital for a diabetes checkup, and 10,000 patients visited. Here 10,000 is counted as the population. If 256 of those patients were diagnosed with diabetes, the prevalence becomes 256/10,000 = 0.0256.
To calculate prevalence, take the number of people in the sample with the characteristic of interest and divide it by the total number of people in the sample.
Here the total number of people in the sample represents the population.
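If it helps, here is a minimal Python sketch of the two calculations above. The function name `prevalence` and its parameter names are just ones I made up for illustration; they do not come from the course materials.

```python
def prevalence(num_positive: int, num_total: int) -> float:
    """Return the fraction of the studied group that has the condition.

    Hypothetical helper for illustration only: `num_total` is whatever
    group we are treating as "the population" (here, the sample).
    """
    if num_total <= 0:
        raise ValueError("num_total must be positive")
    if not 0 <= num_positive <= num_total:
        raise ValueError("num_positive must be between 0 and num_total")
    return num_positive / num_total


# Community example: newly diagnosed plus existing cases, out of 40,000 people.
community = prevalence(1_200 + 3_500, 40_000)

# Hospital survey example: 256 diagnosed out of 10,000 patients who visited.
hospital = prevalence(256, 10_000)

print(round(community, 4))  # 0.1175, i.e. about 0.118
print(round(hospital, 4))   # 0.0256
```

In both cases the denominator is simply the group that was actually measured, which is what “population” means in these calculations.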
Feel free to ask if you have any more doubts.
Regards
DP