ROC and AUC – How to Evaluate Machine Learning Models in No Time

This article was first published on python – Better Data Science, and kindly contributed to python-bloggers.

Model selection should be easy. And it is – if you know how to calculate and interpret ROC curves and AUC scores. That’s what you’ll learn in this article – in 10 minutes if you’re coding along. In 5 if you aren’t.

After reading, you’ll know what ROC curves and AUC scores are and how to use them to compare models in Python.

ROC and AUC demystified

You can use ROC (Receiver Operating Characteristic) curves to evaluate different thresholds for classification machine learning problems. In a nutshell, a ROC curve summarizes the confusion matrix at every threshold: each threshold contributes one point to the curve.

But what are thresholds?

Most classification models can output prediction probabilities. By default, if the probability is greater than 0.5, the instance is classified as positive. Here, 0.5 is the decision threshold. You can adjust it to reduce the number of false positives or false negatives.
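To make the threshold idea concrete, here’s a minimal sketch. The `classify` helper and the probabilities are made up for illustration and don’t come from any model in this article:

```python
def classify(probabilities, threshold=0.5):
    """Label an instance positive (1) when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


# Hypothetical positive-class probabilities for five instances
probs = [0.10, 0.40, 0.55, 0.80, 0.95]

print(classify(probs))                 # [0, 0, 1, 1, 1]
print(classify(probs, threshold=0.7))  # [0, 0, 0, 1, 1]
print(classify(probs, threshold=0.3))  # [0, 1, 1, 1, 1]
```

Raising the threshold makes the model more conservative about calling something positive (fewer false positives, more false negatives). Lowering it does the opposite.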

A ROC curve shows the false positive rate on the X-axis. This metric tells you the proportion of the negative class that was classified as positive (read: COVID negative classified as COVID positive).

On the Y-axis, it shows the true positive rate, also called recall or sensitivity. It tells you the proportion of the positive class that was correctly classified (read: COVID positive and classified as COVID positive).

Refer to the following image for a refresher on the confusion matrix and TPR/FPR calculation:

Image 1 – Confusion matrix and TPR/FPR calculation (image by author)
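If you prefer code to pictures, the same calculation can be sketched in plain Python. The labels and predictions below are invented for illustration, and the helper names are hypothetical:

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for binary labels where 1 is the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


def tpr_fpr(y_true, y_pred):
    """True positive rate = TP / (TP + FN); false positive rate = FP / (FP + TN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return tp / (tp + fn), fp / (fp + tn)


# Made-up example: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tpr, fpr = tpr_fpr(y_true, y_pred)
print(confusion_counts(y_true, y_pred))  # (3, 1, 5, 1)
print(tpr)                               # 0.75
print(round(fpr, 4))                     # 0.1667
```

Each decision threshold produces one such (FPR, TPR) pair, and that pair is one point on the ROC curve.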

Great, but what is AUC? 

AUC represents the area under the ROC curve. The higher the AUC, the better the model is at separating the positive class from the negative one. Ideally, the ROC curve should extend to the top left corner. The AUC score would be 1 in that scenario.
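Here’s a rough sketch of how a ROC curve and its AUC can be computed by hand: sweep the threshold over every distinct score, record the (FPR, TPR) point, and add up the area with the trapezoidal rule. The labels and scores are made up, the function names are hypothetical, and the sketch assumes an instance is predicted positive when its score is greater than or equal to the threshold:

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) points, one per distinct threshold, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]
    # Go from the strictest threshold to the most lenient one
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= threshold and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_auc(points):
    """Area under a piecewise-linear curve given as (x, y) points sorted by x."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Made-up labels and model scores
y_true = [0, 0, 1, 0, 1, 1]
scores = [0.1, 0.3, 0.35, 0.6, 0.7, 0.9]

points = roc_points(y_true, scores)
auc = trapezoid_auc(points)
print(points[-1])     # (1.0, 1.0)
print(round(auc, 4))  # 0.8889
```

In practice you’d reach for a library routine instead, but the hand-rolled version shows there’s nothing mysterious going on.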

Let’s go over a couple of examples. Below you’ll see random data drawn from a normal distribution. Means and variances differ to represent centers for different classes (positive and negative).

For a great model, the distributions are entirely separated:

Image 2 – A model with AUC = 1 (image by author)

You can see that this yields an AUC score of 1, indicating that the model classifies every instance correctly.

Can AUC be 0? Yes – it means the model is swapping the classes. In other words, it’s predicting positives as negatives and vice versa. Take a look at the image below:

Image 3 – A model with AUC = 0 (image by author)

Can you think of a quick way of turning a 0% accurate model into a 100% accurate one? Let me know in the comment section below.

Finally, there’s the scenario where AUC is 0.5. It means the model is no better than random guessing. Think about it: you ask a model whether someone is positive or negative, and it tells you: well, maybe it’s positive, maybe it’s negative (50:50). That’s useless for binary classification tasks.

Here’s what the ROC curve looks like when AUC is 0.5:

Image 4 – A model with AUC = 0.5 (image by author)
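The three scenarios can be checked numerically. The sketch below uses the fact that AUC equals the probability that a randomly chosen positive gets a higher score than a randomly chosen negative, with ties counting as one half. The scores are invented to mimic each scenario, and `pairwise_auc` is a hypothetical helper:

```python
def pairwise_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [0, 0, 0, 1, 1, 1]

separated = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]  # every positive outranks every negative
swapped = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]    # every negative outranks every positive
constant = [0.5, 0.5, 0.5, 0.5, 0.5, 0.5]   # the model can't tell the classes apart

print(pairwise_auc(y_true, separated))  # 1.0
print(pairwise_auc(y_true, swapped))    # 0.0
print(pairwise_auc(y_true, constant))   # 0.5
```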

Now you know the theory. Let’s connect it with practice next.

Using ROC and AUC in Python

You’ll use the White wine quality dataset for the practical part. Here’s how to load it with Python:
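A minimal sketch of that step, assuming the semicolon-separated CSV hosted by the UCI Machine Learning Repository. The URL, the `load_wine` helper, and the tiny in-memory sample (three made-up rows with only three of the columns) are illustrative assumptions, so adjust them to wherever your copy of the file lives:

```python
import io

import pandas as pd

# Assumption: the UCI-hosted copy of the dataset; a local file path works too
DATA_URL = (
    "https://archive.ics.uci.edu/ml/machine-learning-databases/"
    "wine-quality/winequality-white.csv"
)


def load_wine(source):
    """Read the wine quality data; the file uses ';' as the column separator."""
    return pd.read_csv(source, sep=";")


# Offline demonstration on made-up rows in the same format.
# For the real data, call load_wine(DATA_URL) instead.
sample = io.StringIO(
    '"alcohol";"pH";"quality"\n'
    "8.8;3.0;6\n"
    "9.5;3.3;6\n"
    "11.0;3.2;7\n"
)
df = load_wine(sample)
print(df.head())
```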

The first couple of rows look like this:

Image 5 – White wine dataset head (image by author)

Initially, this is not a binary classification dataset, but you can convert it to one. Let’s say the wine is Good if the quality is 7 or above, and Bad otherwise:
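One way to do the conversion is sketched below. It assumes the target column is named `quality`, and it runs on a handful of made-up scores so the snippet works on its own:

```python
import pandas as pd

# Toy stand-in for the wine data; only the target column matters here
df = pd.DataFrame({"quality": [5, 6, 7, 8, 4, 9]})

# Good if the quality is 7 or above, Bad otherwise
df["quality"] = ["Good" if q >= 7 else "Bad" for q in df["quality"]]

print(df["quality"].tolist())  # ['Bad', 'Bad', 'Good', 'Good', 'Bad', 'Good']
```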


There’s your binary classification dataset. Let’s visualize the counts of good and bad wines next. Here’s the code:
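A possible version with matplotlib, shown on a toy `quality` column so it runs by itself. The styling choices and the non-interactive backend are assumptions, not necessarily what produced the chart in this article:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Toy stand-in for the binarized target column
df = pd.DataFrame({"quality": ["Bad", "Bad", "Good", "Bad", "Good", "Bad"]})

# Count how many wines fall into each class
counts = df["quality"].value_counts()

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(list(counts.index), list(counts.values))
ax.set_title("Class distribution")
ax.set_xlabel("Wine quality")
ax.set_ylabel("Count")
```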

And here’s the chart:

Image 6 – Class distribution of the target variable (image by author)

And there’s nothing more to do with regards to preparation. You can make a train/test split next:
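Here’s a sketch of the split with scikit-learn’s `train_test_split`. The random stand-in data, the 25% test size, the fixed `random_state`, and the stratification are all assumptions made for illustration:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Random stand-in for the wine data: two features and a Good/Bad target
rng = np.random.default_rng(42)
df = pd.DataFrame({
    "alcohol": rng.normal(10.5, 1.2, size=200),
    "pH": rng.normal(3.2, 0.15, size=200),
    "quality": rng.choice(["Good", "Bad"], size=200, p=[0.25, 0.75]),
})

X = df.drop("quality", axis=1)
y = df["quality"]

# stratify=y keeps the Good/Bad ratio similar in both parts (an assumption,
# useful because the classes are imbalanced)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)  # (150, 2) (50, 2)
```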

Great! The snippet below shows you how to train logistic regression, decision tree, random forests, and extreme gradient boosting models. It also shows you how to grab probabilities for the positive class. It will come in handy later:
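A sketch of the training step under a few loudly stated assumptions: the data is synthetic (`make_classification`) rather than the wine dataset, hyperparameters are mostly defaults, and scikit-learn’s `GradientBoostingClassifier` stands in for extreme gradient boosting so that no extra package is needed:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data, with Good/Bad string labels
X, y_num = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)
y = np.where(y_num == 1, "Good", "Bad")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(random_state=42),
    # Stand-in for extreme gradient boosting (assumption, see the note above)
    "Gradient boosting": GradientBoostingClassifier(random_state=42),
}

probs = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    # predict_proba has one column per class, ordered as in model.classes_
    good_idx = list(model.classes_).index("Good")
    probs[name] = model.predict_proba(X_test)[:, good_idx]
```

Looking up the column through `model.classes_` avoids guessing which column belongs to the positive class.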

You can visualize the ROC curves and calculate the AUC now. The only requirement is to remap the Good and Bad class names to 1 and 0, respectively. 

The following code snippet visualizes the ROC curve for the four trained models and shows their AUC score on the legend:
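A self-contained sketch of such a plot, using `roc_curve` and `roc_auc_score` from scikit-learn. As assumptions for illustration, the data is synthetic, `GradientBoostingClassifier` stands in for extreme gradient boosting, and the styling is arbitrary, so the numbers it prints won’t match the ones reported in this article:

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the wine data, with Good/Bad string labels
X, y_num = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=42
)
y = np.where(y_num == 1, "Good", "Bad")
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Remap Good and Bad to 1 and 0 for the metric functions
y_test_bin = np.where(y_test == "Good", 1, 0)

models = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "Decision tree": DecisionTreeClassifier(random_state=42),
    "Random forest": RandomForestClassifier(random_state=42),
    "Gradient boosting": GradientBoostingClassifier(random_state=42),
}

fig, ax = plt.subplots(figsize=(8, 6))
auc_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    good_idx = list(model.classes_).index("Good")
    p_good = model.predict_proba(X_test)[:, good_idx]
    fpr, tpr, _ = roc_curve(y_test_bin, p_good)
    auc_scores[name] = roc_auc_score(y_test_bin, p_good)
    ax.plot(fpr, tpr, label=f"{name} (AUC = {auc_scores[name]:.2f})")

# Diagonal reference line: a model that guesses at random
ax.plot([0, 1], [0, 1], linestyle="--", color="grey", label="Baseline (AUC = 0.50)")
ax.set_xlabel("False positive rate")
ax.set_ylabel("True positive rate")
ax.set_title("ROC curves")
ax.legend(loc="lower right")

# Pick the model with the highest AUC
best = max(auc_scores, key=auc_scores.get)
print(best)
```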

Here’s the corresponding visualization:

Image 7 – ROC curves for different machine learning models (image by author)

No perfect models here, but all of them are well above the baseline of a useless model (AUC = 0.5). The random forest algorithm is the best, with a 0.93 AUC score. That’s impressive given how little preparation and feature engineering we did.


In a nutshell, you can use ROC curves and AUC scores to choose the best machine learning model for your dataset. Image 7 shows you how easy it is to interpret the ROC curves, even when there are multiple curves on the same chart.

If you need a completely automated solution, look only at the AUC and select the model with the highest score.

What’s your approach to model selection? Let me know in the comment section.

