In this post, I use the following Python packages:
- BCN: for adjusting a Boosted Configuration Networks regression model (see post 1 and post 2 for more details on BCNs, and this notebook for classification examples) to sklearn's diabetes dataset.
- the-teller: for interpreting BCNs and obtaining prediction intervals. So far, as of October 2022, the-teller contains two methods for constructing prediction intervals: split conformal and local conformal. For more details on both of these methods, the interested reader can consult [1]. Despite their potential drawbacks (listed in [1]), they are model-agnostic; in particular, they do not require the user to adjust a quantile regression model.
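To fix intuition before using the-teller, here is a minimal, self-contained sketch of the split conformal idea on synthetic data, with a Ridge regressor standing in for any point predictor (the-teller wraps this bookkeeping for you; the data and model here are made up for illustration):

```python
# Minimal sketch of the split conformal idea on synthetic data
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=500)

# split the training data into a proper training set and a calibration set
X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)

model = Ridge().fit(X_tr, y_tr)

# absolute residuals on the held-out calibration set
resid = np.abs(y_cal - model.predict(X_cal))

# conformal quantile for a 95% interval
alpha = 0.05
n = len(resid)
q = np.quantile(resid, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0))

# constant-width interval around any new prediction
x_new = rng.normal(size=(1, 3))
pred = model.predict(x_new)
lower, upper = pred - q, pred + q
```

Note that the interval has the same width `2 * q` everywhere; this is the main difference with the local conformal method used later in this post.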
Content
- 0 – Install packages
- 1 – Import packages + load data
- 2 – Adjust BCN model to training set, evaluation on test set
- 3 – Adjust the teller’s Explainer to test set
- 4 – Prediction intervals on test set
  - 4 – 1 Split conformal
  - 4 – 2 Local conformal
- 5 – Plot prediction intervals
0 – Install packages
Package implementing Boosted Configuration Networks
pip install BCN
Package for Machine Learning Explainability on tabular data
pip install the-teller
Other packages
pip install scikit-learn numpy
1 – Import packages + load data
import BCN as bcn # takes a long time to run, ONLY the first time it's run
import teller as tr
import numpy as np

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn import metrics
from time import time
# import diabetes dataset
dataset = load_diabetes()
X = dataset.data
y = dataset.target

# split data into training set and test set
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=13)
2 – Adjust BCN model to training set, evaluation on test set
start = time()

regr_bcn = bcn.BCNRegressor(**{'B': 122,
                               'nu': 0.7862040612242153,
                               'col_sample': 0.6797268557618982,
                               'lam': 0.9244772287770061,
                               'r': 0.18915458607547728,
                               'tol': 9.670499926559012e-10})

regr_bcn.fit(X_train, y_train)
print(f"\nElapsed {time() - start}")

preds = regr_bcn.predict(X_test)

# Test set RMSE
print(np.sqrt(np.mean(np.square(y_test - preds))))
Elapsed 0.641953706741333
54.50979209778048
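As an aside, the `metrics` module imported earlier computes the same RMSE as the manual formula above; a quick sanity check on toy numbers (the values here are made up for illustration):

```python
# RMSE computed manually vs. with scikit-learn's metrics module
import numpy as np
from sklearn import metrics

y_true = np.array([100.0, 150.0, 200.0])
y_hat = np.array([110.0, 140.0, 190.0])

rmse_manual = np.sqrt(np.mean(np.square(y_true - y_hat)))
rmse_sklearn = np.sqrt(metrics.mean_squared_error(y_true, y_hat))

assert np.isclose(rmse_manual, rmse_sklearn)  # both give 10.0
```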
3 – Adjust the teller’s Explainer to test set
start = time()

# creating an Explainer for the fitted object `regr_bcn`
expr = tr.Explainer(obj=regr_bcn)

# confidence int. and tests on covariates' effects (Jackknife)
expr.fit(X_test, y_test, X_names=dataset.feature_names, method="ci")

# summary of results
expr.summary()

# timing
print(f"\n Elapsed: {time()-start}")
Score (rmse): 54.51

Residuals:
        Min         1Q    Median         3Q         Max
 -119.79528 -30.293474  9.749065  45.169791 124.891462

Tests on marginal effects (Jackknife):
        Estimate  Std. Error   95% lbound   95% ubound  Pr(>|t|)
bmi   534.170652   10.538064   513.228463    555.11284  0.0       ***
s5    460.090298     11.7154   436.808403   483.372194  0.0       ***
bp    245.510926    7.073121   231.454584   259.567267  0.0       ***
s6    117.899033    8.865569   100.280578   135.517488  0.0       ***
s4     71.20425     4.885591    61.495165    80.913336  0.0       ***
age     3.144209    7.616405   -11.991795    18.280213  0.680742  -
s1    -31.76086     8.823018   -49.294754   -14.226967  0.000526  ***
s2   -104.139443    8.596589  -121.223357   -87.055529  0.0       ***
sex  -167.461194     3.92965  -175.270546  -159.651841  0.0       ***
s3   -210.749954    5.580957  -221.840934  -199.658975  0.0       ***

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘-’ 1

Multiple R-squared: 0.394, Adjusted R-squared: 0.316

Elapsed: 0.1125936508178711
For more details on this output, you can read this previous post, in which the-teller is applied to explaining xgboost.
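For intuition on the Jackknife column in the summary above, here is a rough sketch of one way such marginal-effect estimates and standard errors can be obtained: numerical derivatives of the predictions, averaged over observations, with leave-one-out replications for the standard error. The synthetic data, the kernel ridge model and the helper name `marginal_effects` are illustrative choices; the-teller's actual internals may differ.

```python
# Sketch: marginal effects via numerical derivatives + Jackknife std. error
import numpy as np
from sklearn.kernel_ridge import KernelRidge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

model = KernelRidge(kernel='rbf', alpha=0.1).fit(X, y)

def marginal_effects(model, X, j, eps=1e-4):
    """Numerical derivative of predictions w.r.t. feature j, per observation."""
    X_plus = X.copy()
    X_plus[:, j] += eps
    return (model.predict(X_plus) - model.predict(X)) / eps

effects = marginal_effects(model, X, j=0)
n = len(effects)

# Jackknife: mean effect recomputed with each observation left out
loo_means = np.array([np.delete(effects, i).mean() for i in range(n)])
estimate = effects.mean()
std_err = np.sqrt((n - 1) / n * np.sum((loo_means - loo_means.mean()) ** 2))
```

With a nonlinear model the per-observation effects vary, so the Jackknife standard error is informative; for a purely linear model the effects would be constant and the standard error would collapse to zero.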
4 – Prediction intervals on test set
4 – 1 Split conformal
pi = tr.PredictionInterval(regr_bcn, method="splitconformal", level=0.95)
pi.fit(X_train, y_train)
preds = pi.predict(X_test, return_pi=True)

pred = preds[0]
y_lower = preds[1]
y_upper = preds[2]
print(len(pred))

# compute and display the average coverage
print(np.mean((y_test >= y_lower) & (y_test <= y_upper)))

# prediction interval's length
length_pi = y_upper - y_lower
print(np.mean(length_pi))
89
0.9438202247191011
209.55455918396666
4 – 2 Local conformal
pi2 = tr.PredictionInterval(regr_bcn, method="localconformal", level=0.95)
pi2.fit(X_train, y_train)
preds2 = pi2.predict(X_test, return_pi=True)

pred2 = preds2[0]
y_lower2 = preds2[1]
y_upper2 = preds2[2]
print(len(pred2))

# compute and display the average coverage
print(np.mean((y_test >= y_lower2) & (y_test <= y_upper2)))

# prediction interval's length
length_pi2 = y_upper2 - y_lower2
print(np.mean(length_pi2))
89
0.9550561797752809
229.03014949955275
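For intuition on why local conformal intervals can adapt their width, here is a rough sketch of the general idea from [1]: calibration residuals are normalized by a local difficulty estimate before taking the conformal quantile. The synthetic heteroscedastic data, the Ridge predictor and the KNN difficulty model are illustrative choices, not the-teller's exact implementation.

```python
# Sketch: locally-adaptive conformal intervals on heteroscedastic data
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(600, 1))
# noise grows with |x|: the response is harder to predict for large |x|
y = 2.0 * X[:, 0] + rng.normal(scale=0.1 + 0.5 * np.abs(X[:, 0]))

X_tr, X_cal, y_tr, y_cal = train_test_split(X, y, test_size=0.5, random_state=0)
model = Ridge().fit(X_tr, y_tr)

# model the absolute residual as a local scale estimate sigma(x)
abs_res_tr = np.abs(y_tr - model.predict(X_tr))
sigma = KNeighborsRegressor(n_neighbors=25).fit(X_tr, abs_res_tr)

# conformal quantile of the sigma-normalized calibration scores
scores = np.abs(y_cal - model.predict(X_cal)) / np.maximum(sigma.predict(X_cal), 1e-8)
alpha = 0.05
n = len(scores)
q = np.quantile(scores, min(np.ceil((n + 1) * (1 - alpha)) / n, 1.0))

# adaptive interval: wider where sigma(x) is large
x_new = np.array([[1.5], [0.0]])
half_width = q * sigma.predict(x_new)
lower = model.predict(x_new) - half_width
upper = model.predict(x_new) + half_width
```

Here the interval at x = 1.5 comes out wider than at x = 0, mirroring the larger noise scale; this adaptivity is what the split conformal method, with its constant width, cannot provide.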
5 – Plot prediction intervals
import warnings
warnings.filterwarnings('ignore')

import matplotlib.pyplot as plt
import numpy as np

split_color = 'tomato'
local_color = 'gray'

%matplotlib inline

np.random.seed(1)

def plot_func(x, y,
              y_u=None, y_l=None,
              pred=None,
              shade_color="",
              method_name="",
              title=""):

    fig = plt.figure()

    plt.plot(x, y, 'k.', alpha=.3, markersize=10,
             fillstyle='full', label=u'Test set observations')

    if (y_u is not None) and (y_l is not None):
        plt.fill(np.concatenate([x, x[::-1]]),
                 np.concatenate([y_u, y_l[::-1]]),
                 alpha=.3, fc=shade_color, ec='None',
                 label = method_name + ' Prediction interval')

    if pred is not None:
        plt.plot(x, pred, 'k--', lw=2, alpha=0.9,
                 label=u'Predicted value')

    #plt.ylim([-2.5, 7])
    plt.xlabel('$X$')
    plt.ylabel('$Y$')
    plt.legend(loc='upper right')
    plt.title(title)
    plt.show()
Split conformal
max_idx = 25
plot_func(x = range(max_idx),
          y = y_test[0:max_idx],
          y_u = y_upper[0:max_idx],
          y_l = y_lower[0:max_idx],
          pred = pred[0:max_idx],
          shade_color=split_color,
          title = f"Split conformal ({max_idx} first points in test set)")
Local conformal
max_idx = 25
plot_func(x = range(max_idx),
          y = y_test[0:max_idx],
          y_u = y_upper2[0:max_idx],
          y_l = y_lower2[0:max_idx],
          pred = pred2[0:max_idx],
          shade_color=local_color,
          title = f"Local conformal ({max_idx} first points in test set)")
[1] Romano, Y., Patterson, E., & Candes, E. (2019). Conformalized quantile regression. Advances in neural information processing systems, 32.