Important considerations are not included in any of these discussions. The procedures discussed above invite inappropriate thresholding and utilize improper accuracy scoring rules (proportions) that are optimized by choosing the wrong features and giving them the wrong weights.
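To make the point about accuracy proportions concrete, here is a minimal sketch. The data and the two sets of forecasts are made up for illustration, and the Brier score is only one example of a proper scoring rule. It shows that accuracy at a cutoff cannot tell a barely-committed forecast from a sharper one, while the Brier score can.

```python
def accuracy(y_true, probs, cutoff=0.5):
    """Proportion classified correctly after thresholding at `cutoff`."""
    hits = sum((p >= cutoff) == bool(y) for y, p in zip(y_true, probs))
    return hits / len(y_true)


def brier_score(y_true, probs):
    """Mean squared difference between forecast and outcome (lower is better)."""
    return sum((p - y) ** 2 for y, p in zip(y_true, probs)) / len(y_true)


# Hypothetical outcomes: the event occurs in 3 of 4 cases.
y = [1, 1, 1, 0]
timid = [0.51, 0.51, 0.51, 0.51]   # barely above the cutoff
sharp = [0.75, 0.75, 0.75, 0.75]   # matches the observed event rate

print(accuracy(y, timid), accuracy(y, sharp))        # identical accuracies
print(brier_score(y, timid), brier_score(y, sharp))  # Brier score separates them
```

Both forecasts get the same accuracy because thresholding throws away how far each probability is from the cutoff.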

Dichotomization of continuous predictions flies in the face of optimal decision theory. ROC curves provide no actionable insights. They have become obligatory without researchers examining the benefits. They have a very large ink:information ratio.

In this example of a confusion matrix, among the 50 data points that are classified, 45 are correctly classified and 5 are misclassified.

By the way, in R I typically use the ROCR package for drawing ROC curves and calculating AUC.

In this figure, the blue area corresponds to the Area Under the curve of the Receiver Operating Characteristic (AUROC). The dashed diagonal line represents the ROC curve of a random predictor: it has an AUROC of 0.5. The random predictor is commonly used as a baseline to see whether the model is useful.

See the Information Loss chapter in Biostatistics for Biomedical Research and other chapters for more information.


I'm a bit late to the party, but here are my 5 cents. @FranckDernoncourt (+1) already mentioned possible interpretations of AUC ROC, and my favorite one is the first on his list (I use different wording, but it's the same):


Optimum decisions don't consider "positives" and "negatives" but rather the estimated probability of the outcome. The utility/cost/loss function, which plays no role in ROC construction (hence the uselessness of ROCs), is used to translate the risk estimate into the optimal (e.g., lowest expected loss) decision.
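As a rough sketch of that translation step: given a risk estimate and a loss table, pick the action with the lowest expected loss. The action names and the loss values below are hypothetical, chosen only to illustrate the arithmetic.

```python
def best_action(p_event, loss):
    """Return (action, expected losses) minimising expected loss.

    `loss[action]` is a pair (loss if the event does not occur,
    loss if the event occurs).
    """
    expected = {
        action: (1 - p_event) * no_event + p_event * event
        for action, (no_event, event) in loss.items()
    }
    return min(expected, key=expected.get), expected


# Hypothetical losses: needless treatment costs 1, a missed event costs 10.
LOSS = {"treat": (1.0, 0.0), "wait": (0.0, 10.0)}

for p in (0.05, 0.20):
    action, expected = best_action(p, LOSS)
    print(p, action, expected)
```

With these made-up losses the break-even risk is 1/11, far from 0.5, and it moves whenever the losses change. No single threshold read off a ROC curve carries that information.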

I have searched high and low and have not been able to find out what AUC, as it relates to prediction, stands for or means.

One example of its application is the ROC curve, in which true positive rates are plotted against false positive rates. An example is below. The closer a model's AUC comes to 1, the better the model is, so models with higher AUCs are preferred over those with lower AUCs.

AUC is an abbreviation for area under the curve. It is used in classification analysis to determine which of the models under consideration predicts the classes best.

Let's try to simulate it: draw random positive and negative examples, then calculate the proportion of cases in which the positive has a greater score than the negative.
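A minimal sketch of such a simulation in Python follows. The score distributions (normal, shifted by one unit), the sample sizes, and the seeds are all assumptions made for illustration. The exact proportion over all positive/negative pairs is computed alongside the sampled one so the two can be compared.

```python
import random


def pairwise_auc(pos_scores, neg_scores):
    """Exact share of (positive, negative) pairs ranked correctly; ties count 1/2."""
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))


def simulated_auc(pos_scores, neg_scores, n_draws=50_000, seed=0):
    """Draw random positive/negative pairs and count how often the positive wins."""
    rng = random.Random(seed)
    hits = sum(
        rng.choice(pos_scores) > rng.choice(neg_scores) for _ in range(n_draws)
    )
    return hits / n_draws


# Hypothetical classifier scores: positives tend to score higher than negatives.
rng = random.Random(42)
pos = [rng.gauss(1.0, 1.0) for _ in range(500)]
neg = [rng.gauss(0.0, 1.0) for _ in range(500)]

print("exact pairwise proportion:", pairwise_auc(pos, neg))
print("simulated proportion:     ", simulated_auc(pos, neg))
```

The two numbers should agree up to sampling noise, and more draws tighten the agreement.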


Please note that there are also methods other than ROC curves, but they too are related to the true positive and false positive rates, e.g. precision-recall, F1-score or Lorenz curves.

Since it is often more convenient to compare two models using a single metric rather than several, we compute two metrics from the confusion matrix, which we will later combine into one:
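Assuming the two metrics meant here are the true positive rate (TPR) and the false positive rate (FPR), which the thread names when it combines them, a small sketch of computing both from confusion-matrix counts could look like this. The counts in the example are made up.

```python
def tpr_fpr(tp, fp, tn, fn):
    """Return (TPR, FPR) from confusion-matrix counts.

    TPR = TP / (TP + FN): share of actual positives predicted positive.
    FPR = FP / (FP + TN): share of actual negatives predicted positive.
    """
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    return tpr, fpr


# Hypothetical counts: 50 actual positives and 50 actual negatives.
tpr, fpr = tpr_fpr(tp=40, fp=5, tn=45, fn=10)
print(tpr, fpr)
```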

Before presenting the ROC curve (= Receiver Operating Characteristic curve), the concept of the confusion matrix must be understood. When we make a binary prediction, there can be 4 types of outcomes:

To combine the FPR and the TPR into one single metric, we first compute these two metrics at many different thresholds (for example $0.00, 0.01, 0.02, \dots, 1.00$) for the logistic regression, then plot them on a single graph, with the FPR values on the abscissa and the TPR values on the ordinate. The resulting curve is called the ROC curve, and the metric we consider is the AUC of this curve, which we call AUROC.
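The threshold sweep can be sketched as follows. The labels and scores are a made-up toy example, the end points (0, 0) and (1, 1) are added explicitly, and the area is taken with the trapezoidal rule. With a fixed threshold grid the result is only approximate when several distinct scores fall between two neighbouring thresholds.

```python
def roc_points(y_true, scores, thresholds):
    """Return sorted (FPR, TPR) points, predicting positive when score >= threshold."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = {(0.0, 0.0), (1.0, 1.0)}  # end points of every ROC curve
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.add((fp / n_neg, tp / n_pos))
    return sorted(points)


def auroc(points):
    """Area under the piecewise-linear ROC curve (trapezoidal rule)."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Hypothetical labels and predicted probabilities.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
thresholds = [i / 100 for i in range(101)]  # 0.00, 0.01, ..., 1.00

curve = roc_points(y_true, scores, thresholds)
print(curve)
print("AUROC:", auroc(curve))
```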

AUC is used most of the time to mean AUROC, which is bad practice since, as Marc Claesen pointed out, AUC is ambiguous (it could be the area under any curve) while AUROC is not.

To get the confusion matrix, we go over all the predictions made by the model and count how many times each of those 4 types of outcomes occurs:
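A minimal sketch of that counting step, assuming labels and predictions are coded as 0/1 and using the usual names TP, FP, TN and FN for the four outcomes (the example data are made up):

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for 0/1 labels and predictions."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            counts["TP"] += 1  # predicted positive, actually positive
        elif pred == 1 and truth == 0:
            counts["FP"] += 1  # predicted positive, actually negative
        elif pred == 0 and truth == 0:
            counts["TN"] += 1  # predicted negative, actually negative
        else:
            counts["FN"] += 1  # predicted negative, actually positive
    return counts


# Hypothetical true labels and thresholded predictions.
y_true = [1, 0, 1, 1, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0]
counts = confusion_counts(y_true, y_pred)
print(counts)
```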

Assume we have a probabilistic, binary classifier such as logistic regression.