LSBoost: Explainable ‘AI’ using Gradient Boosted randomized networks (with examples in R and Python)
Want to share your content on python-bloggers? click here.
Disclaimer: I have no affiliation with The Next Web (cf. online article)
A few weeks ago I read this interesting and accessible article about explainable AI, discussing more specifically self-explainable AI issues. I’m not sure – anymore – if there’s a mandatory need for AI models that explain themselves, as there are model-agnostic tools such as the teller – among many others – for helping them in doing just that.
With that being said, the new LSBoost
algorithm implemented in mlsauce does, explain itself. LSBoost
is a cousin of the LS_Boost
algorithm introduced in
GREEDY FUNCTION APPROXIMATION: A GRADIENT BOOSTING MACHINE (GFAGBM). GFAGBM’s LS_Boost
is outlined below:
So, what makes the new LSBoost
different? Would you be legitimately entitled to ask. Well, about the seemingly new name: I actually misspelled LS_Boost
in my code in the first place! So, it’ll remain named as it is now and forever. Otherwise, in the new LSBoost
we have:
- Page 1203, section 5 of GFAGBM is used:
LSBoost
contains a learning rate which could accelerate or slow down the convergence of residuals towards 0. Overfitting, fast or slow. - Function h (referring to Algorithm 2 in GFAGBM) returns a columnwise concatenation of x and a – so called – neuron or node:
- a (referring to Algorithm 2 in GFAGBM) contains elements of a matrix of simulated uniform random numbers whose size can be controlled, in a randomized networks’ fashion.
- Both columns and rows of X (containing x’s) can be subsampled, in order to increase the diversity of the weak learners h fitting the successive residuals.
- Instead of optimizing least squares at line 4 of Algorithm 2, penalized least squares are used. Currently, ridge regression is implemented, and its bias has the effect of slowing down the convergence of residuals towards 0.
- An early stopping criterion is implemented, and is based on the magnitude of successive residuals.
Besides this, we can also remark that LSBoost
is explainable as a linear model, while being a highly nonlinear one. Indeed by using some calculus, it’s possible to compute derivatives of F (still referring to Algorithm 2 outlined before) relative to x, wherever the function h does admit a derivative.
In the following Python+R examples appearing after the short survey (both tested on Linux and macOS so far), we’ll use LSBoost
with default hyperparameters, for solving regression and classification problems. There’s still some room for improvement of models performance.
I – Python version
I – 0 – Install and import packages
Install mlsauce (command line)
pip install mlsauce --upgrade
Import packages
import numpy as np from sklearn.datasets import load_boston, load_diabetes from sklearn.model_selection import train_test_split, GridSearchCV, cross_val_score from time import time from os import chdir from sklearn import metrics import mlsauce as ms
I – 1 – Classification
I – 1 – 1 Breast cancer dataset
# data 1 breast_cancer = load_breast_cancer() X = breast_cancer.data y = breast_cancer.target # split data into training test and test set np.random.seed(15029) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) print("dataset 1 -- breast cancer -----") print(X.shape) obj = ms.LSBoostClassifier() # using default parameters print(obj.get_params()) start = time() obj.fit(X_train, y_train) print(time()-start) start = time() print(obj.score(X_test, y_test)) print(time()-start) # classification report y_pred = obj.predict(X_test) print(classification_report(y_test, y_pred))
dataset 1 -- breast cancer ----- (569, 30) {'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1} 0.16006875038146973 0.9473684210526315 0.015897750854492188 precision recall f1-score support 0 1.00 0.86 0.92 42 1 0.92 1.00 0.96 72 accuracy 0.95 114 macro avg 0.96 0.93 0.94 114 weighted avg 0.95 0.95 0.95 114
I – 1 – 2 Wine dataset
# data 2 wine = load_wine() Z = wine.data t = wine.target np.random.seed(879423) X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2) print("dataset 2 -- wine -----") print(Z.shape) obj = ms.LSBoostClassifier() # using default parameters print(obj.get_params()) start = time() obj.fit(X_train, y_train) print(time()-start) start = time() print(obj.score(X_test, y_test)) print(time()-start) # classification report y_pred = obj.predict(X_test) print(classification_report(y_test, y_pred))
dataset 2 -- wine ----- (178, 13) {'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1} 0.1548290252685547 0.9722222222222222 0.021778583526611328 precision recall f1-score support 0 1.00 0.93 0.97 15 1 0.92 1.00 0.96 12 2 1.00 1.00 1.00 9 accuracy 0.97 36 macro avg 0.97 0.98 0.98 36 weighted avg 0.97 0.97 0.97 36
I – 1 – 3 iris dataset
# data 3 iris = load_iris() Z = iris.data t = iris.target np.random.seed(734563) X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2) print("dataset 3 -- iris -----") print(Z.shape) obj = ms.LSBoostClassifier() # using default parameters print(obj.get_params()) start = time() obj.fit(X_train, y_train) print(time()-start) start = time() print(obj.score(X_test, y_test)) print(time()-start) # classification report y_pred = obj.predict(X_test) print(classification_report(y_test, y_pred))
dataset 3 -- iris ----- (150, 4) {'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1} 100%|██████████| 100/100 [00:00<00:00, 1157.03it/s] 0.0932917594909668 0.9666666666666667 0.007458209991455078 precision recall f1-score support 0 1.00 1.00 1.00 13 1 1.00 0.90 0.95 10 2 0.88 1.00 0.93 7 accuracy 0.97 30 macro avg 0.96 0.97 0.96 30 weighted avg 0.97 0.97 0.97 30
I – 2 – Regression
I – 2 – 1 Boston dataset
# data 1 boston = load_boston() X = boston.data y = boston.target # split data into training test and test set np.random.seed(15029) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) print("dataset 4 -- boston -----") print(X.shape) obj = ms.LSBoostRegressor() # using default parameters print(obj.get_params()) start = time() obj.fit(X_train, y_train) print(time()-start) start = time() print(np.sqrt(np.mean(np.square(obj.predict(X_test) - y_test)))) print(time()-start)
dataset 4 -- boston ----- (506, 13) {'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1} 100%|██████████| 100/100 [00:00<00:00, 896.24it/s] 0%| | 0/100 [00:00<?, ?it/s] 0.1198277473449707 3.4934156173105206 0.01007080078125
I – 2 – 2 Diabetes dataset
# data 2 diabetes = load_diabetes() X = diabetes.data y = diabetes.target # split data into training test and test set np.random.seed(15029) X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2) print("dataset 5 -- diabetes -----") print(X.shape) obj = ms.LSBoostRegressor() # using default parameters print(obj.get_params()) start = time() obj.fit(X_train, y_train) print(time()-start) start = time() print(np.sqrt(np.mean(np.square(obj.predict(X_test) - y_test)))) print(time()-start)
dataset 5 -- diabetes ----- (442, 10) {'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1} 100%|██████████| 100/100 [00:00<00:00, 1000.60it/s] 0.10351037979125977 55.867989174555625 0.012843847274780273
II – R version
I – 0 – Install and import packages
library(devtools) devtools::install_github("thierrymoudiki/mlsauce/R-package") library(mlsauce)
II – 1 – Classification
library(datasets) X <- as.matrix(iris[, 1:4]) y <- as.integer(iris[, 5]) - 1L n <- dim(X)[1] p <- dim(X)[2] set.seed(21341) train_index <- sample(x = 1:n, size = floor(0.8*n), replace = TRUE) test_index <- -train_index X_train <- as.matrix(X[train_index, ]) y_train <- as.integer(y[train_index]) X_test <- as.matrix(X[test_index, ]) y_test <- as.integer(y[test_index]) # using default parameters obj <- mlsauce::LSBoostClassifier() start <- proc.time()[3] obj$fit(X_train, y_train) print(proc.time()[3] - start) start <- proc.time()[3] print(obj$score(X_test, y_test)) print(proc.time()[3] - start)
elapsed 0.051 0.9253731 elapsed 0.011
II – 2 – Regression
library(datasets) X <- as.matrix(datasets::mtcars[, -1]) y <- as.integer(datasets::mtcars[, 1]) n <- dim(X)[1] p <- dim(X)[2] set.seed(21341) train_index <- sample(x = 1:n, size = floor(0.8*n), replace = TRUE) test_index <- -train_index X_train <- as.matrix(X[train_index, ]) y_train <- as.double(y[train_index]) X_test <- as.matrix(X[test_index, ]) y_test <- as.double(y[test_index]) # using default parameters obj <- mlsauce::LSBoostRegressor() start <- proc.time()[3] obj$fit(X_train, y_train) print(proc.time()[3] - start) start <- proc.time()[3] print(sqrt(mean((obj$predict(X_test) - y_test)**2))) print(proc.time()[3] - start)
elapsed 0.044 6.482376 elapsed 0.01
Want to share your content on python-bloggers? click here.