Rare Disease Classification example


Here I am completely lost on a few things:

  1. Why are they claiming that Print(“Y=0”) is unnecessary?
  2. On what basis do they claim that an algorithm that produces 1% error is superior to one that produces 0.5% error?

And how is this entire argument related to the skewed data set?

Hello @GAURAV_MANCHANDA,

What this video tells us is that we cannot always rely on the accuracy metric, especially when the dataset is skewed.

It should be understood as: “A dumb model that ignores its inputs and always predicts 0 can achieve 99.5% accuracy. But it never diagnoses a real patient, so if we take its word for it, we will not take any action, even though the disease is still there in that real patient. What will happen then?”

Would you rely on a test that always tells you “you are fine”, even though its accuracy is 99.5%?

As for your second question: NOT on the basis of accuracy, but on the basis of precision or recall. If you watch the rest of the video, you will see a more suitable metric for our rare disease example, called recall. A model with high recall diagnoses more of the real patients, at the cost of misdiagnosing some healthy people (lower accuracy). This can save more people’s lives from that disease.
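To make the comparison concrete, here is a small sketch in plain Python. The numbers are made up for illustration and are not from the video: I assume 10,000 patients of whom 50 (0.5%) have the disease, and a hypothetical learned model that catches 40 of the 50 patients while raising 90 false alarms, which gives it 1% error. The helper names (`accuracy`, `recall`, `precision`) are my own.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def recall(y_true, y_pred):
    """Of all patients who really have the disease, what fraction did we catch?"""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(y_true, y_pred):
    """Of all patients we flagged, what fraction really have the disease?"""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    # Convention used here: report 0.0 when the model never predicts 1.
    return tp / (tp + fp) if (tp + fp) else 0.0


# Assumed (hypothetical) skewed dataset: 50 sick patients, 9,950 healthy ones.
y_true = [1] * 50 + [0] * 9950

# The "dumb" model: ignores the input and always predicts y = 0.
always_zero = [0] * 10000

# Hypothetical learned model: 40 true positives, 10 missed patients,
# 90 false alarms, 9,860 true negatives -> 100 errors out of 10,000 (1%).
learned = [1] * 40 + [0] * 10 + [1] * 90 + [0] * 9860

for name, y_pred in [("always_zero", always_zero), ("learned", learned)]:
    print(
        f"{name}: accuracy={accuracy(y_true, y_pred):.3f} "
        f"recall={recall(y_true, y_pred):.3f} "
        f"precision={precision(y_true, y_pred):.3f}"
    )
# always_zero: accuracy=0.995 recall=0.000 precision=0.000
# learned: accuracy=0.990 recall=0.800 precision=0.308
```

Under these assumed numbers, the always-zero model wins on accuracy (99.5% vs 99%) but finds none of the 50 patients, while the learned model finds 40 of them. That is the sense in which the model with the higher error can still be the more useful one on a skewed dataset.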

Raymond
