LIME vs. SHAP: Which is Better for Explaining Machine Learning Models?

[This article was first published on python – Better Data Science, and kindly contributed to python-bloggers.]

Do you want to use machine learning in production? Good luck explaining predictions to non-technical folks. LIME and SHAP can help.

Explainable machine learning is a term any modern-day data scientist should know. Today you’ll see how the two most popular options compare – LIME and SHAP.

If acronyms LIME and SHAP sound like a foreign language, please refer to the articles below:

SHAP: How to Interpret Machine Learning Models With Python

LIME: How to Interpret Machine Learning Models With Python

These two cover the basic theory and practical implementation. They’re good reads if you’re new to the topic.

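This article covers the following:

  • Training a machine learning model
  • Prediction explanation with LIME
  • Prediction explanation with SHAP
  • The verdict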

Training a machine learning model

Let’s keep this part simple and train a model on one of the simplest datasets available. The wine quality dataset is a perfect candidate, as it requires no preparation. Here’s how to load it with Pandas:

import numpy as np
import pandas as pd

wine = pd.read_csv('wine.csv')
wine.head()

Image 1 – Head of wine quality dataset (image by author)

The dataset is as clean as they come, so you can immediately proceed with the train/test split. Here’s the code:

from sklearn.model_selection import train_test_split

X = wine.drop('quality', axis=1)
y = wine['quality']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

The final step is to train the model. Any tree-based model will work great for explanations:

from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(X_train, y_train)

test_1 = X_test.iloc[1]

The final line of code separates a single instance from the test set. You’ll use it to make explanations with both LIME and SHAP.

Prediction explanation with LIME

If you’ve read the article on LIME, you already know how to make explanations. If not, you should still be able to follow along. You’ll use LimeTabularExplainer to explain why the model classifies a single instance as good or bad.

LIME also needs a function for making predictions. Since this is a classification problem, let’s use probabilities (predict_proba). Here’s the code:

import lime 
from lime import lime_tabular

lime_explainer = lime_tabular.LimeTabularExplainer(
    training_data=np.array(X_train),
    feature_names=X_train.columns,
    class_names=['bad', 'good'],
    mode='classification'
)


lime_exp = lime_explainer.explain_instance(
    data_row=test_1,
    predict_fn=model.predict_proba
)
lime_exp.show_in_notebook(show_table=True)

If you’re following along, you should see this in the notebook:

Image 2 – LIME interpretations (image by author)

The model is almost certain this is a bad wine (96% chance). Take a look at the first three features – they are all increasing the chance of a wine being classified as bad.

That’s LIME in a nutshell. Only one problem with it – the visualizations look horrible.

They’re still interpretable, but I wouldn’t show this visualization to my boss. It’s not necessarily a problem – LIME lets you access prediction probabilities. Here’s how:

lime_exp.predict_proba

Here’s the corresponding output:

Image 3 – LIME probabilities (image by author)

LIME also lets you access the feature weights shown in the middle chart:

lime_exp.as_list()

Image 4 – List of LIME explanations (image by author)

I’m sure you can use this data to make better-looking visualizations.
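As a rough sketch of what that could look like, the snippet below turns a list shaped like the output of lime_exp.as_list() into a sorted horizontal bar chart. The feature rules and weights are made-up placeholders, and prepare_bars and plot_lime_bars are hypothetical helper names, not part of LIME. It also assumes the weights refer to the 'good' class (index 1), which should be the default label for a binary problem.

```python
# Hypothetical stand-in for lime_exp.as_list(): (feature rule, weight) pairs.
# These numbers are invented for illustration, not taken from the article's model.
lime_pairs = [
    ("alcohol <= 9.50", -0.14),
    ("sulphates <= 0.55", -0.09),
    ("volatile acidity > 0.64", -0.07),
    ("total sulfur dioxide <= 22.00", 0.05),
    ("pH > 3.40", 0.02),
]


def prepare_bars(pairs):
    """Sort by absolute weight (largest first) and pick a color per sign."""
    ordered = sorted(pairs, key=lambda pair: abs(pair[1]), reverse=True)
    labels = [label for label, _ in ordered]
    weights = [weight for _, weight in ordered]
    colors = ["tab:green" if w > 0 else "tab:red" for w in weights]
    return labels, weights, colors


def plot_lime_bars(pairs, title="LIME feature weights"):
    """Draw a horizontal bar chart of the weights; requires matplotlib."""
    import matplotlib.pyplot as plt  # imported here so prepare_bars works without it

    labels, weights, colors = prepare_bars(pairs)
    fig, ax = plt.subplots(figsize=(8, 4))
    # Reverse the order so the most influential feature ends up at the top.
    ax.barh(labels[::-1], weights[::-1], color=colors[::-1])
    ax.axvline(0, color="black", linewidth=0.8)
    ax.set_xlabel("Weight (negative pushes toward 'bad', positive toward 'good')")
    ax.set_title(title)
    fig.tight_layout()
    return fig


labels, weights, colors = prepare_bars(lime_pairs)
```

In a notebook you would call plot_lime_bars(lime_exp.as_list()) instead of using the placeholder list.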

To conclude – LIME tells you everything you need but doesn’t produce the best-looking visuals. Still, tweaking them shouldn’t be a problem.

Prediction explanation with SHAP

SHAP is a bit different. It bases the explanations on Shapley values – measures of how much each feature contributes to the model’s prediction.

The idea is still the same – get insights into how the machine learning model works.
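To make the idea of a Shapley value more concrete, here is a minimal sketch that computes exact values for a tiny, made-up scoring function. It is a simplification: toy_model, the feature values, and the all-zeros baseline are assumptions for illustration, and the shap library uses far more efficient algorithms and a background dataset instead of a single baseline.

```python
from itertools import permutations


def toy_model(features):
    """Hypothetical scoring function with one interaction term."""
    return (
        2.0 * features["alcohol"]
        - 3.0 * features["acidity"]
        + 1.0 * features["alcohol"] * features["sulphates"]
    )


def shapley_values(model, instance, baseline):
    """Average each feature's marginal contribution over all feature orderings."""
    names = list(instance)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for ordering in orderings:
        current = dict(baseline)  # start with every feature at its baseline value
        previous_output = model(current)
        for name in ordering:
            current[name] = instance[name]  # switch this feature to its real value
            new_output = model(current)
            totals[name] += new_output - previous_output
            previous_output = new_output
    return {name: total / len(orderings) for name, total in totals.items()}


instance = {"alcohol": 1.0, "acidity": 0.5, "sulphates": 2.0}
baseline = {"alcohol": 0.0, "acidity": 0.0, "sulphates": 0.0}

contributions = shapley_values(toy_model, instance, baseline)
for name, value in contributions.items():
    print(f"{name}: {value:+.2f}")
```

The contributions always add up to the difference between the model output for the instance and for the baseline. The interaction term is split evenly between the two features involved.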

Below you’ll find code for importing the libraries, creating the explainer, calculating SHAP values, and visualizing the interpretation of a single prediction. For convenience, you’ll interpret the prediction for the same data point as with LIME:

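import shap
shap.initjs()

shap_explainer = shap.TreeExplainer(model)
# SHAP values for the whole dataset (used later for the summary plot)
shap_values = shap_explainer.shap_values(X)
# SHAP values for the test set, so row 1 lines up with test_1 = X_test.iloc[1]
shap_values_test = shap_explainer.shap_values(X_test)

shap.force_plot(shap_explainer.expected_value, shap_values_test[1, :], test_1)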

Here’s the corresponding visualization:

Image 5 – SHAP explanations (image by author)

The visualization looks great, sure, but isn’t as interpretable as the one made by LIME. The red marks push the prediction higher than the base value, and the blue marks push it lower.

For the wine dataset – if the blue marks outweigh the red ones, the prediction lands below the base value and the wine leans toward bad, and vice versa.
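The arithmetic behind a force plot can be sketched in a few lines: the model output is the base value plus the sum of all SHAP values for that row. The numbers below are invented for illustration. The sketch also assumes the values are in log-odds, which should be the default for a tree explainer on a boosted binary classifier, so a sigmoid turns the result into a probability. Which class counts as "positive" depends on how the labels were encoded.

```python
import math

# Hypothetical numbers for one wine; not taken from the article's model.
base_value = -0.40  # average model output (log-odds) over the data
shap_row = {
    "alcohol": -1.20,           # blue: pushes the output lower
    "sulphates": -0.90,         # blue
    "volatile acidity": -0.75,  # blue
    "pH": 0.15,                 # red: pushes the output higher
}


def sigmoid(log_odds):
    """Convert log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-log_odds))


# Base value plus every feature's contribution gives the model output.
model_output = base_value + sum(shap_row.values())
probability_positive = sigmoid(model_output)

print(f"model output (log-odds): {model_output:.2f}")
print(f"probability of the positive class: {probability_positive:.3f}")
```

Here the blue contributions clearly outweigh the single red one, so the output ends up well below the base value.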

The story doesn’t end here. SHAP comes with summary plots – a neat way to visualize every feature’s importance and their impact on prediction. Let’s make one next:

shap.summary_plot(shap_values, X)

Image 6 – SHAP summary chart (image by author)

To interpret:

  • High alcohol value increases the predicted wine quality
  • Low volatile acidity increases the predicted wine quality
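The summary plot also ranks features by overall impact. As far as I know, that ranking comes from the mean absolute SHAP value per feature, which you can reproduce by hand. The sketch below uses a small made-up matrix in plain Python. The numbers and the mean_abs_importance helper are assumptions for illustration; with real data you would pass in the array returned by the explainer.

```python
# Hypothetical SHAP values: one row per wine, one column per feature.
feature_names = ["alcohol", "volatile acidity", "pH"]
shap_matrix = [
    [0.80, -0.30, 0.05],
    [-0.60, 0.40, -0.10],
    [0.70, -0.50, 0.00],
    [-0.90, 0.20, 0.05],
]


def mean_abs_importance(matrix, names):
    """Return (feature, mean |SHAP|) pairs, most important first."""
    n_rows = len(matrix)
    scores = {
        name: sum(abs(row[col]) for row in matrix) / n_rows
        for col, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranking = mean_abs_importance(shap_matrix, feature_names)
for name, score in ranking:
    print(f"{name:>18}: {score:.3f}")
```

Taking the absolute value matters: a feature that pushes some predictions up and others down would otherwise average out to roughly zero and look unimportant.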

To conclude – SHAP doesn’t look as intuitive as LIME out of the box, but it comes with summary charts that make it easy to understand an entire machine learning model.

The verdict

So, which one should you use for machine learning projects? 

Why not both? I use LIME to get a better grasp of a single prediction. On the other hand, I use SHAP mostly for summary plots and dependence plots.

Maybe using both will help you to squeeze out some additional information. But in general:

  • Use LIME to explain a single prediction
  • Use SHAP to explain an entire model (or a single variable)

Which one do you prefer? For which use cases? Let me know in the comment section below.
