Introducing LightSHAP

This article was first published on Python – Michael's and Christian's Blog, and kindly contributed to python-bloggers.

LightSHAP is here – a new, lightweight SHAP implementation for tabular data. While heavily inspired by the famous shap package, it does not depend on it. LightSHAP simplifies working with dataframes (pandas, polars) and categorical data.

Key Features

  • Tree Models: TreeSHAP wrappers for XGBoost, LightGBM, and CatBoost via explain_tree()
  • Model-Agnostic: Permutation SHAP and Kernel SHAP via explain_any()
  • Visualization: Flexible plots (bar, beeswarm, dependence, and waterfall plots)

Highlights of the agnostic explainer:

  1. Exact and sampling versions of permutation SHAP and Kernel SHAP
  2. Sampling versions iterate until convergence and provide standard errors
  3. Parallel processing via joblib
  4. Supports multi-output models
  5. Supports case weights
  6. Accepts numpy, pandas, and polars input, and categorical features
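To make the "exact permutation SHAP" highlight more concrete, here is a minimal NumPy sketch of the textbook algorithm for a single row, including optional case weights on the background data. It is not LightSHAP's implementation: the function name `exact_permutation_shap`, the fixed background matrix `bg`, and the caching strategy are assumptions for illustration only. Each feature's SHAP value is its marginal contribution (switching it from background values to the row's value), averaged over all p! feature orderings.

```python
import itertools

import numpy as np


def exact_permutation_shap(predict, x, bg, bg_weights=None):
    """Exact permutation SHAP values of one row x (illustrative sketch).

    predict:    function mapping an (n, p) array to n predictions
    x:          the row to explain, shape (p,)
    bg:         background data, shape (n_bg, p)
    bg_weights: optional case weights of the background rows (assumption)
    """
    x = np.asarray(x, dtype=float)
    bg = np.asarray(bg, dtype=float)
    p = x.shape[0]
    cache = {}

    def value(on):
        # Features in the coalition ("on") come from x, the others from bg;
        # the coalition's value is the (weighted) mean prediction.
        key = on.tobytes()
        if key not in cache:
            data = bg.copy()
            data[:, on] = x[on]
            cache[key] = np.average(predict(data), weights=bg_weights)
        return cache[key]

    shap = np.zeros(p)
    perms = list(itertools.permutations(range(p)))
    for perm in perms:
        on = np.zeros(p, dtype=bool)
        v_prev = value(on)
        for j in perm:
            on[j] = True  # switch feature j on
            v_new = value(on)
            shap[j] += v_new - v_prev  # marginal contribution of feature j
            v_prev = v_new
    return shap / len(perms)  # average over all p! orderings


# Example: linear model with three features and random background data
rng = np.random.default_rng(0)
bg = rng.normal(size=(50, 3))
beta = np.array([1.0, -2.0, 0.5])
x = np.array([1.0, 1.0, 1.0])


def predict_linear(X):
    return X @ beta


phi = exact_permutation_shap(predict_linear, x, bg)
```

For an additive linear model, the result matches the closed form beta_j * (x_j - mean of feature j in bg). Because the value of a coalition only depends on which features are switched on, the cache reduces the work from p!·(p+1) to at most 2^p distinct model calls per row; sampling variants instead draw random orderings rather than enumerating all of them.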

Some methods of the explanation object:

  • plot.bar(): Feature importance bar plot
  • plot.beeswarm(): Summary beeswarm plot
  • plot.scatter(): Dependence plots
  • plot.waterfall(): Waterfall plot for individual explanations
  • importance(): Returns feature importance values
  • set_X(): Update explanation data, e.g., to replace a numpy array with a DataFrame
  • set_feature_names(): Set or update feature names
  • select_output(): Select a specific output for multi-output models
  • filter(): Subset explanations by condition or indices
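In the shap package, bar-plot importance is the mean absolute SHAP value per feature. Assuming importance() follows the same convention (an assumption, not checked against LightSHAP's source), it boils down to the following NumPy sketch. The function name `shap_importance` and its optional `weights` argument (mirroring the case-weight support) are hypothetical, and the SHAP matrix is made-up toy data.

```python
import numpy as np


def shap_importance(shap_values, feature_names, weights=None):
    """Mean absolute SHAP value per feature, sorted in decreasing order.

    shap_values: array of shape (n_rows, n_features) for a single output
    weights:     optional case weights, one per row (hypothetical parameter)
    """
    shap_values = np.asarray(shap_values, dtype=float)
    imp = np.average(np.abs(shap_values), axis=0, weights=weights)
    order = np.argsort(-imp)  # most important feature first
    return {feature_names[i]: float(imp[i]) for i in order}


# Toy SHAP matrix with 3 rows and 2 features (made-up numbers)
S = np.array([[0.5, -0.1], [-0.3, 0.2], [0.1, 0.0]])
imp = shap_importance(S, ["log_carat", "clarity"])
```

Taking absolute values first means that large positive and large negative contributions both count as "important" instead of cancelling out.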

Usage

Let’s demonstrate the two workhorses, explain_tree() and explain_any(), with small examples.

Prepare diamonds data

import catboost
import numpy as np
import seaborn as sns
import statsmodels.formula.api as smf

# pip install lightshap
from lightshap import explain_any, explain_tree

# Prepare data
df0 = sns.load_dataset("diamonds")

df = df0.assign(
    log_carat=lambda x: np.log(x.carat),
    log_price=lambda x: np.log(x.price),
)

# Features only
X = df[["log_carat", "clarity", "color", "cut"]]

Fit and explain boosted trees model

Let’s (naively) build a small CatBoost model and explain it using a sample of 1000 observations.

# Fit naively without validation strategy for simplicity
gbt = catboost.CatBoostRegressor(
    iterations=100, depth=4, cat_features=["clarity", "color", "cut"], verbose=0
)
_ = gbt.fit(X, y=df.log_price)

# SHAP analysis
X_explain = X.sample(1000, random_state=0)
gbt_explanation = explain_tree(gbt, X_explain)

gbt_explanation.plot.bar()
gbt_explanation.plot.beeswarm()
gbt_explanation.plot.scatter(sharey=False)
gbt_explanation.plot.waterfall(row_id=0)
Figure 1: SHAP importance bar plot for the CatBoost model
Figure 2: SHAP beeswarm plot for the CatBoost model
Figure 3: SHAP dependence plots for the CatBoost model
Figure 4: Explaining an individual prediction via SHAP waterfall plot for the CatBoost model
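A waterfall plot like the one in Figure 4 starts at the baseline (the average prediction) and adds one SHAP value at a time, typically sorted by absolute size, ending at the model's prediction for that row. The following NumPy sketch computes those bars. It only mirrors the idea behind plot.waterfall(), not LightSHAP's plotting code; the function name `waterfall_steps` is hypothetical and the baseline and SHAP values are made-up numbers.

```python
import numpy as np


def waterfall_steps(baseline, shap_row, feature_names):
    """Return (feature, start, end) for each bar of a waterfall plot."""
    shap_row = np.asarray(shap_row, dtype=float)
    order = np.argsort(-np.abs(shap_row))  # largest contributions first
    steps = []
    current = float(baseline)
    for i in order:
        start, current = current, current + shap_row[i]
        steps.append((feature_names[i], start, float(current)))
    return steps


# Made-up baseline and SHAP values on the log-price scale
steps = waterfall_steps(
    7.8, [0.9, -0.2, 0.05, 0.1], ["log_carat", "clarity", "color", "cut"]
)
prediction = steps[-1][2]  # equals baseline + sum of the SHAP values
```

The last bar ends exactly at baseline + sum of SHAP values, which is the additivity property that makes SHAP waterfall plots add up to the prediction.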

Fit and explain any model

To demonstrate the model-agnostic SHAP cruncher explain_any(), let’s fit a linear regression model with interactions and a natural cubic spline.

lm = smf.ols("log_price ~ cr(log_carat, df=4) + clarity * color + cut", data=df)
lm = lm.fit()

# SHAP analysis - automatically picking exact permutation SHAP
# due to the small number of features
X_explain = X.sample(1000, random_state=0)
lm_explanation = explain_any(lm.predict, X_explain)  # 5s on laptop

lm_explanation.plot.bar()
lm_explanation.plot.beeswarm()
lm_explanation.plot.scatter(sharey=False)
lm_explanation.plot.waterfall(row_id=0)
Figure 5: SHAP importance plot for the linear regression
Figure 6: SHAP beeswarm plot for the linear regression
Figure 7: SHAP dependence plots for the linear regression
Figure 8: SHAP waterfall plot to explain a single prediction of the linear regression
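The code comment above says exact permutation SHAP is picked because there are only four features. A rough back-of-the-envelope sketch shows why the feature count matters: if each of the 2^p coalitions of switched-on features is scored once against a background of n_bg rows, the model has to predict on up to n_rows · 2^p · n_bg rows. The function name and the background size of 200 rows are hypothetical values for illustration, not LightSHAP's actual defaults.

```python
import math


def exact_shap_model_rows(p, n_rows, n_bg):
    """Upper bound on rows the model scores when all 2**p coalitions are
    evaluated against n_bg background rows for each of n_rows rows."""
    return n_rows * 2**p * n_bg


# Growth of the work with the number of features p (n_bg=200 is hypothetical)
for p in (4, 8, 16):
    rows = exact_shap_model_rows(p, n_rows=1000, n_bg=200)
    print(f"p={p:2d}  p!={math.factorial(p):>14,}  2^p={2**p:>6,}  model rows={rows:,}")
```

With four features, exact computation stays cheap; with many more features, the exponential growth is what makes the sampling versions necessary.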

How to contribute?

  1. Test, test, test: The more people are using and testing the current beta version of the package, the better it will get.
  2. Open issues: If you see problems or gaps, please open an issue. We will then discuss whether and by whom it will be addressed.

Future plans

At this early stage, the project is still a “one-man show”. As it grows, the aim is to move it to a bigger organisation, e.g., a university.

Jupyter notebook
