Suppose I have a deep learning program that predicts whether someone has a disease. In one sample, 95 people have the disease and 5 do not. However, regardless of the input, my program always predicts that the person has the disease. As a result, its precision is 0.95 and its recall is 1, so both metrics look high. But in reality, my program only ever outputs 1. In this case, how can we judge the effectiveness of my learning algorithm? Can you help me?
This is covered in Course 3 Week 1. The topic is anomaly detection, and the method is to compute the F1 score.
Thank you for your response. I appreciate it. I apologize for asking the question in the wrong place.
Even though the F1 score is high, 2 * (0.95 * 1) / (0.95 + 1) ≈ 0.974, the fact remains that my program only ever outputs 1.
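As a sanity check, here is a minimal sketch in plain Python (with the disease class labeled 1, as in my description) showing how those numbers come out:

```python
# Toy data set: 95 people with the disease (label 1), 5 without (label 0).
y_true = [1] * 95 + [0] * 5
# The model always predicts "has the disease".
y_pred = [1] * 100

# Count true positives, false positives, and false negatives.
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 95
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 5
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 0

precision = tp / (tp + fp)  # 0.95
recall = tp / (tp + fn)     # 1.0
f1 = 2 * precision * recall / (precision + recall)
print(precision, recall, round(f1, 3))  # 0.95 1.0 0.974
```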
I am just starting to learn this course, so if I have any misunderstandings, please let me know. Thank you.
The standard convention is that the rare cases are marked True (Positive). In your data set, that is backwards.
As Tom has explained, we usually assign the dominant class as Negative (0) and the rare class as Positive (1) before precision or recall can be a meaningful measure for a model trained on imbalanced labels. The reason is clear if you look at the formulas:

precision = tp / (tp + fp)
recall = tp / (tp + fn)

When the dominant class is Negative, correctly predicting the dominant class only yields a high tn. However, since tn never appears in either of the two formulas above, it contributes nothing to the final result. In contrast, with the labels assigned this way, your model never produces a single tp, so it ends up with zero precision and zero recall.
Of course, the metric does not make your model any better; it is only a way to reflect your model's performance, and it does so better than a metric that takes tn into account, such as accuracy.
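To make this concrete, here is a small sketch in plain Python (my own toy labels, mirroring your 95/5 split) showing that once the rare class is marked Positive, the always-predict-one model scores zero on precision and recall even though its accuracy looks high:

```python
# Same 100 people, but now the rare class (no disease, 5 people) is Positive (1).
y_true = [0] * 95 + [1] * 5
# The model still always predicts "has the disease",
# which under this labeling is the Negative class (0).
y_pred = [0] * 100

tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 0
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 0
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 5
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 95

# No positive predictions at all, so precision is conventionally set to 0.
precision = tp / (tp + fp) if (tp + fp) else 0.0  # 0.0
recall = tp / (tp + fn)                           # 0.0
accuracy = (tp + tn) / len(y_true)                # 0.95, which looks good but is misleading
print(precision, recall, accuracy)
```

The contrast with accuracy is the whole point: accuracy counts tn, so a model that blindly predicts the dominant class still scores 0.95, while precision and recall expose it immediately.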
Cheers,
Raymond
Thank you for your reply. I got it.