
LSBoost: Explainable ‘AI’ using Gradient Boosted randomized networks (with examples in R and Python)

This article was first published on T. Moudiki's Webpage - Python, and kindly contributed to python-bloggers.

Disclaimer: I have no affiliation with The Next Web (cf. online article)

A few weeks ago, I read this interesting and accessible article about explainable AI, which discusses self-explainable AI in particular. I'm not sure anymore that there's a mandatory need for AI models that explain themselves, since model-agnostic tools such as the teller (among many others) can help them do just that.

With that being said, the new LSBoost algorithm implemented in mlsauce does explain itself. LSBoost is a cousin of the LS_Boost algorithm introduced in Greedy Function Approximation: A Gradient Boosting Machine (GFAGBM). GFAGBM's LS_Boost (Algorithm 2 in the paper) is outlined below:
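
Transcribed from the paper, with $h(x; a)$ a base learner with parameters $a$, $\rho$ its weight, and $N$ the number of training observations:

$$F_0(x) = \bar{y}$$

For $m = 1$ to $M$:

$$\tilde{y}_i = y_i - F_{m-1}(x_i), \quad i = 1, \dots, N$$

$$(\rho_m, a_m) = \arg\min_{a,\, \rho} \sum_{i=1}^{N} \left[\tilde{y}_i - \rho\, h(x_i; a)\right]^2$$

$$F_m(x) = F_{m-1}(x) + \rho_m\, h(x; a_m)$$

End For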

So, what makes the new LSBoost different, you would be legitimately entitled to ask. First, about the seemingly new name: I actually misspelled LS_Boost in my code in the first place! So, it'll remain named as it is, now and forever. Otherwise, in the new LSBoost, the base learners h are randomized networks: regularized (ridge) least squares fits on the input features augmented with randomized hidden features, with optional row subsampling, column subsampling, and dropout as additional regularization devices (cf. the default hyperparameters n_hidden_features, direct_link, reg_lambda, row_sample, col_sample, and dropout printed in the examples below).
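
To fix ideas, here is a minimal sketch of what one boosting iteration's base learner could look like. This is my own illustration, not mlsauce's actual implementation: the ReLU activation and the uniform sampling of the random matrix W are assumptions.

import numpy as np

def fit_ridge_base_learner(X, residuals, n_hidden_features=5,
                           reg_lambda=0.1, seed=123):
    # fixed (not learned) random projection onto hidden features
    rng = np.random.default_rng(seed)
    W = rng.uniform(-1.0, 1.0, size=(X.shape[1], n_hidden_features))
    # direct_link=1: original features concatenated with activated hidden ones
    H = np.hstack([X, np.maximum(X @ W, 0.0)])
    # closed-form ridge solution: (H'H + lambda*I)^{-1} H'r
    beta = np.linalg.solve(H.T @ H + reg_lambda * np.eye(H.shape[1]),
                           H.T @ residuals)
    return W, beta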

Besides this, we can also remark that LSBoost is explainable in the same way a linear model is, while being highly nonlinear. Indeed, with some calculus, it's possible to compute the derivatives of F (still referring to Algorithm 2 outlined before) with respect to x, wherever the function h admits a derivative.
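
As an example, these derivatives can be approximated numerically on any fitted model. Below is a minimal finite-difference sketch (my own illustration, not an mlsauce or teller API), averaging the partial derivative of F with respect to each feature over the rows of X:

import numpy as np

def feature_sensitivities(model, X, eps=1e-4):
    # central finite differences: average of dF/dx_j over the rows of X
    grads = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        X_plus, X_minus = X.copy(), X.copy()
        X_plus[:, j] += eps
        X_minus[:, j] -= eps
        grads[j] = np.mean((model.predict(X_plus) - model.predict(X_minus))
                           / (2.0 * eps))
    return grads

Each entry can then be read like a linear regression coefficient: the average marginal effect of feature j on the predicted response.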

In the following Python and R examples (both tested on Linux and macOS so far), we'll use LSBoost with default hyperparameters for solving regression and classification problems. There's still some room for improving model performance; see the tuning sketch at the end of section I.

I – Python version

I – 0 – Install and import packages

Install mlsauce (command line)

pip install mlsauce --upgrade


Import packages



import numpy as np
from sklearn.datasets import load_breast_cancer, load_wine, load_iris, load_boston, load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report
from time import time

import mlsauce as ms


I – 1 – Classification

I – 1 – 1 Breast cancer dataset



# data 1
breast_cancer = load_breast_cancer()
X = breast_cancer.data
y = breast_cancer.target
# split data into training set and test set
np.random.seed(15029)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2)


print("dataset 1 -- breast cancer -----")

print(X.shape)
obj = ms.LSBoostClassifier()
# using default parameters
print(obj.get_params())

start = time()
obj.fit(X_train, y_train)
print(time()-start)
start = time()
print(obj.score(X_test, y_test))
print(time()-start)

# classification report
y_pred = obj.predict(X_test)
print(classification_report(y_test, y_pred))	


dataset 1 -- breast cancer -----


(569, 30)


{'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1}


0.16006875038146973
0.9473684210526315
0.015897750854492188


              precision    recall  f1-score   support

           0       1.00      0.86      0.92        42
           1       0.92      1.00      0.96        72

    accuracy                           0.95       114
   macro avg       0.96      0.93      0.94       114
weighted avg       0.95      0.95      0.95       114

I – 1 – 2 Wine dataset



# data 2
wine = load_wine()
Z = wine.data
t = wine.target
np.random.seed(879423)
X_train, X_test, y_train, y_test = train_test_split(Z, t, 
                                                    test_size=0.2)


print("dataset 2 -- wine -----")

print(Z.shape)
obj = ms.LSBoostClassifier()
# using default parameters
print(obj.get_params())

start = time()
obj.fit(X_train, y_train)
print(time()-start)
start = time()
print(obj.score(X_test, y_test))
print(time()-start)

# classification report
y_pred = obj.predict(X_test)
print(classification_report(y_test, y_pred))


dataset 2 -- wine -----


(178, 13)


{'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1}


0.1548290252685547
0.9722222222222222
0.021778583526611328


              precision    recall  f1-score   support

           0       1.00      0.93      0.97        15
           1       0.92      1.00      0.96        12
           2       1.00      1.00      1.00         9

    accuracy                           0.97        36
   macro avg       0.97      0.98      0.98        36
weighted avg       0.97      0.97      0.97        36

I – 1 – 3 Iris dataset



# data 3
iris = load_iris()
Z = iris.data
t = iris.target
np.random.seed(734563)
X_train, X_test, y_train, y_test = train_test_split(Z, t, 
                                                    test_size=0.2)

print("dataset 3 -- iris -----")

print(Z.shape)
obj = ms.LSBoostClassifier()
# using default parameters
print(obj.get_params())

start = time()
obj.fit(X_train, y_train)
print(time()-start)
start = time()
print(obj.score(X_test, y_test))
print(time()-start)

# classification report
y_pred = obj.predict(X_test)
print(classification_report(y_test, y_pred))


dataset 3 -- iris -----


(150, 4)


{'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1}


100%|██████████| 100/100 [00:00<00:00, 1157.03it/s]

0.0932917594909668
0.9666666666666667
0.007458209991455078


              precision    recall  f1-score   support

           0       1.00      1.00      1.00        13
           1       1.00      0.90      0.95        10
           2       0.88      1.00      0.93         7

    accuracy                           0.97        30
   macro avg       0.96      0.97      0.96        30
weighted avg       0.97      0.97      0.97        30


I – 2 – Regression

I – 2 – 1 Boston dataset



# data 1
boston = load_boston()
X = boston.data
y = boston.target
# split data into training set and test set
np.random.seed(15029)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2)

print("dataset 4 -- boston -----")

print(X.shape)
obj = ms.LSBoostRegressor()
# using default parameters
print(obj.get_params())

start = time()
obj.fit(X_train, y_train)
print(time()-start)
start = time()
print(np.sqrt(np.mean(np.square(obj.predict(X_test) - y_test))))
print(time()-start)


dataset 4 -- boston -----


(506, 13)


{'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1}


100%|██████████| 100/100 [00:00<00:00, 896.24it/s]

0.1198277473449707
3.4934156173105206
0.01007080078125

I – 2 – 2 Diabetes dataset



# data 2
diabetes = load_diabetes()
X = diabetes.data
y = diabetes.target
# split data into training set and test set
np.random.seed(15029)
X_train, X_test, y_train, y_test = train_test_split(X, y, 
                                                    test_size=0.2)

print("dataset 5 -- diabetes -----")

print(X.shape)
obj = ms.LSBoostRegressor()
# using default parameters
print(obj.get_params())

start = time()
obj.fit(X_train, y_train)
print(time()-start)
start = time()
print(np.sqrt(np.mean(np.square(obj.predict(X_test) - y_test))))
print(time()-start)

dataset 5 -- diabetes -----


(442, 10)


{'backend': 'cpu', 'col_sample': 1, 'direct_link': 1, 'dropout': 0, 'learning_rate': 0.1, 'n_estimators': 100, 'n_hidden_features': 5, 'reg_lambda': 0.1, 'row_sample': 1, 'seed': 123, 'tolerance': 0.0001, 'verbose': 1}


100%|██████████| 100/100 [00:00<00:00, 1000.60it/s]

0.10351037979125977
55.867989174555625
0.012843847274780273
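
Before moving on to R: all the scores above were obtained with default hyperparameters, hence the room for improvement mentioned earlier. Since mlsauce's estimators expose the scikit-learn interface (fit, score, get_params, as seen above), a standard grid search can be applied. The sketch below assumes full scikit-learn compatibility, and the grid values are arbitrary illustrations, not recommendations:

import mlsauce as ms
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                    random_state=15029)

# illustrative grid, not a recommendation
grid = {"n_estimators": [50, 100, 200],
        "learning_rate": [0.05, 0.1, 0.2],
        "n_hidden_features": [5, 10, 25]}

search = GridSearchCV(ms.LSBoostClassifier(verbose=0), grid, cv=5)
search.fit(X_train, y_train)
print(search.best_params_)
print(search.score(X_test, y_test))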

II – R version

II – 0 – Install and import packages



library(devtools)
devtools::install_github("thierrymoudiki/mlsauce/R-package")
library(mlsauce)	


II – 1 – Classification



library(datasets)

X <- as.matrix(iris[, 1:4])
y <- as.integer(iris[, 5]) - 1L

n <- dim(X)[1]
p <- dim(X)[2]
set.seed(21341)
# note: sampling with replacement, so train_index may contain duplicated rows
train_index <- sample(x = 1:n, size = floor(0.8*n), replace = TRUE)
test_index <- -train_index
X_train <- as.matrix(X[train_index, ])
y_train <- as.integer(y[train_index])
X_test <- as.matrix(X[test_index, ])
y_test <- as.integer(y[test_index])

# using default parameters
obj <- mlsauce::LSBoostClassifier()

start <- proc.time()[3]
obj$fit(X_train, y_train)	
print(proc.time()[3] - start)

start <- proc.time()[3]
print(obj$score(X_test, y_test))
print(proc.time()[3] - start)


elapsed 
  0.051 
 0.9253731
 elapsed 
  0.011 

II – 2 – Regression



library(datasets)

X <- as.matrix(datasets::mtcars[, -1])
y <- as.integer(datasets::mtcars[, 1])

n <- dim(X)[1]
p <- dim(X)[2]
set.seed(21341)
train_index <- sample(x = 1:n, size = floor(0.8*n), replace = TRUE)
test_index <- -train_index
X_train <- as.matrix(X[train_index, ])
y_train <- as.double(y[train_index])
X_test <- as.matrix(X[test_index, ])
y_test <- as.double(y[test_index])

# using default parameters
obj <- mlsauce::LSBoostRegressor()

start <- proc.time()[3]
obj$fit(X_train, y_train)
print(proc.time()[3] - start)

start <- proc.time()[3]
print(sqrt(mean((obj$predict(X_test) - y_test)**2)))
print(proc.time()[3] - start)


elapsed 
  0.044 
6.482376
elapsed 
   0.01 