Do you want to use machine learning in production? Good luck explaining predictions to non-technical folks. LIME and SHAP can help.
Explainable machine learning is a term any modern-day data scientist should know. Today you’ll see how the two most popular options compare – LIME and SHAP.
If the acronyms LIME and SHAP sound like a foreign language, refer to the articles below:
These two cover the basic theory and practical implementation. It’s a good read if you’re new to the topic.
This article covers the following:
- Training a machine learning model
- Prediction explanation with LIME
- Prediction explanation with SHAP
- The verdict
Training a machine learning model
Let’s keep this part simple and train a model on one of the simplest datasets available. The wine quality dataset is a perfect candidate, as it requires no preparation. Here’s how to load it with Pandas:
```python
import numpy as np
import pandas as pd

wine = pd.read_csv('wine.csv')
wine.head()
```
The dataset is as clean as they come, so you can immediately proceed with the train/test split. Here’s the code:
```python
from sklearn.model_selection import train_test_split

X = wine.drop('quality', axis=1)
y = wine['quality']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)
```
The final step is to train the model. Any tree-based model will work great for explanations:
```python
from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(X_train, y_train)

test_1 = X_test.iloc[1]
```
The final line of code separates a single instance from the test set. You’ll use it to make explanations with both LIME and SHAP.
Prediction explanation with LIME
If you’ve read the article on LIME, you already know how to make explanations. If not, you should still be able to follow along. You’ll use LimeTabularExplainer to classify a single instance as good or bad. LIME also needs a function for making predictions. Since this is a classification problem, let’s use probabilities (predict_proba). Here’s the code:
```python
import lime
from lime import lime_tabular

lime_explainer = lime_tabular.LimeTabularExplainer(
    training_data=np.array(X_train),
    feature_names=X_train.columns,
    class_names=['bad', 'good'],
    mode='classification'
)
lime_exp = lime_explainer.explain_instance(
    data_row=test_1,
    predict_fn=model.predict_proba
)
lime_exp.show_in_notebook(show_table=True)
```
If you’re following along, you should see this in the notebook:
The model is almost certain this is a bad wine (96% chance). Take a look at the first three features – they are all increasing the chance of a wine being classified as bad.
That’s LIME in a nutshell. There’s only one problem with it – the default visualizations look rough.
They’re still interpretable, but I wouldn’t show them to my boss. It’s not necessarily a deal-breaker – LIME lets you access the prediction probabilities directly. Here’s how:
Here’s the corresponding output:
LIME also lets you access the feature weights shown in the middle chart:
I’m sure you can use this data to make better-looking visualizations.
To conclude – LIME tells you everything you need but doesn’t produce the best-looking visuals. Still, tweaking them shouldn’t be a problem.
Prediction explanation with SHAP
SHAP is a bit different. It bases its explanations on Shapley values – a game-theoretic measure of each feature’s contribution to the model’s prediction.
The idea is still the same – get insights into how the machine learning model works.
Below you’ll find code for importing the library, creating the explainer instance, calculating SHAP values, and visualizing the interpretation of a single prediction. For convenience’s sake, you’ll interpret the prediction for the same data point as with LIME:
```python
import shap

shap.initjs()

shap_explainer = shap.TreeExplainer(model)
shap_values = shap_explainer.shap_values(X_test)
shap.force_plot(shap_explainer.expected_value, shap_values[1, :], test_1)
```
Here’s the corresponding visualization:
The visualization looks great, sure, but it isn’t as immediately interpretable as the one made by LIME. Features in red push the prediction above the base value, and features in blue push it below.
For the wine dataset – if the blue contributions outweigh the red ones, the wine is classified as bad, and vice versa.
The story doesn’t end here. SHAP comes with summary plots – a neat way to visualize every feature’s importance and their impact on prediction. Let’s make one next:
- High alcohol value increases the predicted wine quality
- Low volatile acidity increases the predicted wine quality
To conclude – SHAP isn’t as intuitive as LIME out of the box, but its summary charts make understanding an entire machine learning model easy.
The verdict
So, which one should you use for machine learning projects?
Why not both? I use LIME to get a better grasp of a single prediction. On the other hand, I use SHAP mostly for summary plots and dependence plots.
Maybe using both will help you squeeze out some additional information. But in general:
- Use LIME for single prediction explanation
- Use SHAP for entire model (or single variable) explanation
Which one do you prefer? For which use cases? Let me know in the comment section below.
The post LIME vs. SHAP: Which is Better for Explaining Machine Learning Models? appeared first on Better Data Science.