I am not sure this section of the week explains missing data appropriately in terms of medical analysis

Deepti_Prasad · October 21, 2023, 7:46am

Hello to the course coordinators,

Although the section addresses the aim of identifying missing data through a hypothetical example, the three most important concepts about missing data, i.e.

Missing Completely At Random (MCAR)
Missing At Random (MAR)
Missing Not At Random (MNAR)

are not explained in a statistically appropriate way.

The instructor uses flipping a coin to illustrate Missing Completely At Random, which does not reflect how missing data should be reasoned about in medical statistical analysis.

Missing completely at random (MCAR). When data are MCAR, the fact that the data are missing is independent of both the observed and the unobserved data. In other words, no systematic differences exist between participants with missing data and those with complete data. (Using the week 2 examples: the BP was not recorded because the researcher who wrote down the findings did not record it correctly, or confused systolic and diastolic pressure.)
For example, some participants may have missing laboratory values because a batch of lab samples was processed improperly. In these instances, the missing data reduce the analyzable population of the study and consequently, the statistical power, but do not introduce bias: when data are MCAR, the data which remain can be considered a simple random sample of the full data set of interest. MCAR is generally regarded as a strong and often unrealistic assumption.
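
To make the MCAR case concrete, here is a small simulation sketch in Python. It is a hypothetical illustration, not course code: the cohort model (systolic BP = 100 + 0.5 * age + noise), the 30% loss rate and the helper name `simulate_patients` are all assumptions. Each reading is dropped with the same probability regardless of age or BP, so the complete-case mean should stay close to the full-data mean, only computed from fewer patients.

```python
import random
import statistics


def simulate_patients(n, seed=0):
    """Hypothetical cohort: age uniform on [20, 80], systolic BP = 100 + 0.5*age + noise."""
    rng = random.Random(seed)
    patients = []
    for _ in range(n):
        age = rng.uniform(20, 80)
        bp = 100 + 0.5 * age + rng.gauss(0, 10)
        patients.append((age, bp))
    return patients


patients = simulate_patients(100_000)
rng = random.Random(1)

# MCAR: every reading has the same (assumed) 30% chance of being lost,
# e.g. a mishandled batch, independent of age and of the BP value itself.
observed_bp = [bp for _, bp in patients if rng.random() >= 0.3]

full_mean = statistics.fmean(bp for _, bp in patients)
complete_case_mean = statistics.fmean(observed_bp)

print(f"full data mean:     {full_mean:.2f}")
print(f"complete-case mean: {complete_case_mean:.2f} (n={len(observed_bp)})")
```

The two means should differ only by sampling noise; the cost of MCAR here is the roughly 30% smaller sample (lost power), not bias.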
Missing at random (MAR). When data are MAR, the fact that the data are missing is systematically related to the observed but not the unobserved data. (In terms of this course, this corresponds to the instructor's example in which no BP was recorded for any patient younger than 40.)
For example, a registry examining depression may encounter data that are MAR if male participants are less likely to complete a survey about depression severity than female participants. That is, if the probability of completing the survey is related to participants' sex (which is fully observed) but not to the severity of their depression, then the data may be regarded as MAR. Complete case analyses, which use only the observations for which all relevant data are present and no fields are missing, may or may not be biased when the data set contains MAR data. If the complete case analysis is biased, however, properly accounting for the known factors (in the above example, sex) can produce unbiased results.
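
The same idea can be sketched for MAR using the BP example instead of the depression registry. As an assumption for illustration (not from the course), BP readings are missing with probability 0.8 for patients under 40 and 0.1 otherwise, so missingness depends only on the fully observed age. The complete-case mean is pulled towards older patients, while averaging the observed BP within each age group and weighting by that group's share of the full cohort (simple stratification on the observed variable) recovers the full-data mean. The cohort model and probabilities are invented.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical cohort of (age, systolic BP) pairs; BP rises with age (assumed model).
patients = []
for _ in range(100_000):
    age = rng.uniform(20, 80)
    patients.append((age, 100 + 0.5 * age + rng.gauss(0, 10)))

# MAR: the chance of a missing BP reading depends only on the observed age
# (assumed 80% for patients under 40, 10% otherwise), never on the BP value.
observed = [(age, bp) for age, bp in patients
            if rng.random() >= (0.8 if age < 40 else 0.1)]

full_mean = statistics.fmean(bp for _, bp in patients)
complete_case_mean = statistics.fmean(bp for _, bp in observed)

# Stratify on the observed variable: average the observed BP within each age
# group, then weight each group by its share of the *full* cohort.
stratified_mean = 0.0
for in_group in (lambda a: a < 40, lambda a: a >= 40):
    share = sum(1 for age, _ in patients if in_group(age)) / len(patients)
    group_mean = statistics.fmean(bp for age, bp in observed if in_group(age))
    stratified_mean += share * group_mean

print(f"full data mean:     {full_mean:.2f}")
print(f"complete-case mean: {complete_case_mean:.2f}")
print(f"age-stratified:     {stratified_mean:.2f}")
```

Running it should show the complete-case mean a few mmHg above the full-data mean, while the stratified estimate lands close to it. Stratification is used only because it is the simplest way to account for the known factor; inverse probability weighting or multiple imputation conditioned on age are common alternatives.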
Missing not at random (MNAR). When data are MNAR, the fact that the data are missing is systematically related to the unobserved data; that is, the missingness is related to events or factors that the researcher does not measure. (In the week 2 reasoning, the instructor describes a doctor who flips a coin for patients younger than 40 and, depending on heads or tails, takes the BP or not. That is not the correct way to explain this concept. A better example is a doctor or researcher who was unable to take the BP because the patient refused for an unknown reason, or a value that was not recorded for an unknown reason other than the observed variable, such as age in this example.)
To extend the previous example, the depression registry may encounter data that are MNAR if participants with severe depression are more likely to refuse to complete the survey about depression severity. As with MAR data, complete case analysis of a data set containing MNAR data may or may not result in bias; if the complete case analysis is biased, however, the fact that the sources of missing data are themselves unmeasured means that (in general) this issue cannot be addressed in analysis and the estimate of effect will likely be biased.
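
For contrast, here is a hypothetical MNAR version of the BP simulation. It assumes that readings above 135 mmHg are lost 90% of the time and other readings 10% of the time, mirroring the depression example where sicker participants are more likely to refuse. Because missingness now depends on the unrecorded BP value itself, stratifying on the observed age group (which would correct MAR missingness) still underestimates the mean. The threshold, probabilities and cohort model are invented for illustration.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical cohort of (age, systolic BP) pairs; BP rises with age (assumed model).
patients = []
for _ in range(100_000):
    age = rng.uniform(20, 80)
    patients.append((age, 100 + 0.5 * age + rng.gauss(0, 10)))

# MNAR: the chance of a missing reading depends on the BP value itself
# (assumed 90% above 135 mmHg, 10% otherwise), which the analyst never sees.
observed = [(age, bp) for age, bp in patients
            if rng.random() >= (0.9 if bp > 135 else 0.1)]

full_mean = statistics.fmean(bp for _, bp in patients)
complete_case_mean = statistics.fmean(bp for _, bp in observed)

# Stratifying on the observed age group cannot undo missingness driven by BP.
stratified_mean = 0.0
for in_group in (lambda a: a < 40, lambda a: a >= 40):
    share = sum(1 for age, _ in patients if in_group(age)) / len(patients)
    group_mean = statistics.fmean(bp for age, bp in observed if in_group(age))
    stratified_mean += share * group_mean

print(f"full data mean:     {full_mean:.2f}")
print(f"complete-case mean: {complete_case_mean:.2f}")
print(f"age-stratified:     {stratified_mean:.2f}")
```

Both estimates come out a few mmHg too low, and nothing in the observed data (age) reveals why, which is the practical meaning of "cannot be addressed in analysis" for MNAR data.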

I was looking for an explanation like this because the course is about statistical analysis in medicine. A learner who watches these videos and later finds medical lab values missing from a dataset may apply the same coin-flipping idea instead of understanding the difference between observed and unobserved data, and will therefore struggle to identify the correct type of missing data.

Regards
DP

getjaidev · November 25, 2023, 4:22pm

Deepti, Hello.

Glad you enjoyed the course. I see you have completed it successfully. Congratulations!

Thanks for the detailed explanation. It makes a good reference for other students. Sometimes it is kind of difficult to compress a complex topic into a few slides/minutes, but we do encourage students to research topics in more detail and discuss them here for everyone’s benefit.

Best Regards.

– Jaidev
