ROC curve and AUC (area under the ROC curve)

When working with logistic regression, one widely used concept tells us how well the model can distinguish between classes: the ROC curve. So what is it?

An ROC curve (receiver operating characteristic curve) is a graph showing the performance of a classification model at all classification thresholds. This curve plots two parameters:

  • True Positive Rate
  • False Positive Rate

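True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows:

$$\mathrm{TPR} = \frac{TP}{TP + FN}$$

where TP is the number of true positives and FN is the number of false negatives.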

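False Positive Rate (FPR) is defined as follows:

$$\mathrm{FPR} = \frac{FP}{FP + TN}$$

where FP is the number of false positives and TN is the number of true negatives.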

An ROC curve plots TPR vs. FPR at different classification thresholds. Lowering the classification threshold classifies more items as positive, thus increasing both False Positives and True Positives. The following figure shows a typical ROC curve.
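To make the threshold sweep concrete, here is a minimal sketch in plain Python. The function name `roc_points` and the toy labels and scores are made up for illustration; it assumes labels are 0/1 and that an example is predicted positive when its score is at or above the threshold.

```python
def roc_points(labels, scores):
    """Return one (FPR, TPR) pair per candidate threshold.

    labels: 1 for positive, 0 for negative.
    scores: model scores, higher means "more likely positive".
    An example is predicted positive when score >= threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Start above the highest score so nothing is predicted positive: (0, 0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up toy data, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
points = roc_points(labels, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```

Each step down in the threshold can only add predicted positives, so both coordinates are non-decreasing and the points run from (0, 0) to (1, 1).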

AUC: Area Under the ROC Curve

AUC stands for “Area under the ROC Curve.” That is, AUC measures the entire two-dimensional area underneath the ROC curve (think integral calculus) from (0,0) to (1,1).
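The "integral calculus" remark can be written out. Treating TPR as a function of FPR, the area is an integral over FPR from 0 to 1; with only a finite set of ROC points, it is commonly approximated with the trapezoidal rule. The symbols $x_i$ and $y_i$ below are notation assumed here for the FPR and TPR of the $i$-th point, sorted by increasing FPR:

```latex
\mathrm{AUC} = \int_{0}^{1} \mathrm{TPR}\, d(\mathrm{FPR})
\approx \sum_{i=1}^{n-1} (x_{i+1} - x_i)\,\frac{y_i + y_{i+1}}{2},
\qquad x_i = \mathrm{FPR}_i,\; y_i = \mathrm{TPR}_i .
```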

AUC provides an aggregate measure of performance across all possible classification thresholds. One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example.
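The ranking interpretation can be checked numerically. The sketch below computes AUC two ways on made-up toy data: as the fraction of (positive, negative) pairs in which the positive gets the higher score, and as the trapezoidal area under the ROC points. The function names are illustrative, and counting ties as half a win is a common convention assumed here.

```python
def auc_by_ranking(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher.

    Ties count as half a win (an assumed, commonly used convention).
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def auc_by_area(labels, scores):
    """Trapezoidal area under the ROC curve built by sweeping thresholds."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    xs, ys = [0.0], [0.0]  # the curve starts at (FPR, TPR) = (0, 0)
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= t and y == 0)
        xs.append(fp / n_neg)
        ys.append(tp / n_pos)
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
        for i in range(len(xs) - 1)
    )


# Made-up toy data, for illustration only.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
rank_auc = auc_by_ranking(labels, scores)
area_auc = auc_by_area(labels, scores)
print(rank_auc, area_auc)
```

On this data, 8 of the 9 positive/negative pairs are ordered correctly, and the area under the curve comes out to the same 8/9.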

I think the following plot is very illustrative. A perfect model reaches TPR = 1 and FPR = 0, which is the purple point in the top-left corner of the plot, so for such a perfect condition we do not see a gradual “curve” but only that point. In less-than-perfect cases, the threshold value (the trade-off between true positives and false positives) comes into play and we start to see curves. A better curve passes closer to the purple point and thus has a higher area under the curve (AUC); a worse curve is farther from the purple point and thus has a smaller AUC. The red dashed diagonal represents what a random model can do (AUC = 0.5) and serves as the baseline; a curve below it is doing worse than random guessing.

Source: Wikipedia


How is it different from the CAP curve? Are the CAP curve and the ROC curve the same?