Definition of AUC

Hi,

In one of the videos (here: https://www.coursera.org/learn/machine-learning-modeling-pipelines-in-production/lecture/hBfzT/measuring-fairness, at 2:58), the lecturer defines AUC as “the percentage of datapoints, which are correctly labeled when each class is given an equal weight independent of the number of samples”.

I used to consider AUC a measure of how well predictions are ranked, and Wikipedia defines AUC as “the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming ‘positive’ ranks higher than ‘negative’).” That seems at least somewhat different from the lecturer’s definition, because the Wikipedia definition doesn’t require a prediction threshold separating the classes to be defined.

Do you think the Wikipedia definition and the one given by the lecturer align well and are consistent with each other?

Hi, @Volodymyr_Medentsiy!

They both describe the same principle, and both explanations are perfectly coherent.

AUC is defined as “the probability that a classifier will rank a randomly chosen positive instance higher than a randomly chosen negative one (assuming ‘positive’ ranks higher than ‘negative’).”

That’s exactly what AUC is, and it is consistent with the lecturer’s definition of AUC as “the percentage of datapoints, which are correctly labeled when each class is given an equal weight independent of the number of samples”, because:

  • AUC is scale-invariant. It measures how well predictions are ranked, rather than their absolute values.
  • AUC is classification-threshold-invariant. It measures the quality of the model’s predictions irrespective of what classification threshold is chosen (see the sketch below).
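For a concrete check, here is a minimal sketch (assuming NumPy and scikit-learn are installed; the toy scores are made up) that computes AUC both as the area under the ROC curve and as the Wikipedia-style pairwise ranking probability, and shows that a monotonic rescaling of the scores leaves it unchanged:

```python
import numpy as np
from itertools import product
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# Toy imbalanced data: 20 positives, 80 negatives.
y_true = np.array([1] * 20 + [0] * 80)
scores = np.concatenate([
    rng.normal(1.0, 1.0, 20),   # positives tend to score higher
    rng.normal(0.0, 1.0, 80),
])

# Wikipedia definition: the probability that a randomly chosen positive
# is ranked above a randomly chosen negative (ties count as 1/2).
pos, neg = scores[y_true == 1], scores[y_true == 0]
ranking_auc = np.mean([1.0 if p > n else 0.5 if p == n else 0.0
                       for p, n in product(pos, neg)])

print(roc_auc_score(y_true, scores))  # area under the ROC curve
print(ranking_auc)                    # pairwise ranking probability: same value

# Scale invariance: a strictly monotonic transform (here a sigmoid)
# preserves the ranking, hence the AUC. No threshold is ever chosen.
print(roc_auc_score(y_true, 1 / (1 + np.exp(-scores))))  # unchanged
```

Note that each of the 20 × 80 positive-negative pairs contributes equally to the average, so the class sizes normalize out. That is one way to read the lecturer’s “each class is given an equal weight independent of the number of samples”.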