Python-bloggers

Gradient-Boosting anything (alert: high performance): Part 3, Histogram-based boosting

This article was first published on T. Moudiki's Webpage - Python, and kindly contributed to python-bloggers. (You can report an issue about the content on this page here.)
Want to share your content on python-bloggers? click here.

A few weeks ago, I introduced a model-agnostic gradient boosting procedure that can use any base learner (available in the R and Python package mlsauce).
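To make the idea of model-agnostic boosting concrete, here is a minimal sketch. It is not mlsauce's implementation: it assumes squared-error boosting on one-hot encoded class labels with a fixed learning rate, uses scikit-learn's `Ridge` as an example base learner, and the helpers `fit_generic_booster` and `predict_generic_booster` are hypothetical names.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split


def fit_generic_booster(X, y, base_learner, n_estimators=50, learning_rate=0.1):
    """Hypothetical sketch (not mlsauce's GenericBooster): L2 boosting with any regressor."""
    classes = np.unique(y)
    Y = (y[:, None] == classes[None, :]).astype(float)  # one-hot targets, shape (n, K)
    init = Y.mean(axis=0)                                # constant initial score per class
    F = np.tile(init, (X.shape[0], 1))
    learners = []
    for _ in range(n_estimators):
        residuals = Y - F                                # negative gradient of squared error
        learner = clone(base_learner).fit(X, residuals)  # multi-output fit on the residuals
        F += learning_rate * learner.predict(X)          # shrunken update of the scores
        learners.append(learner)
    return classes, init, learners, learning_rate


def predict_generic_booster(model, X):
    classes, init, learners, learning_rate = model
    F = np.tile(init, (X.shape[0], 1))
    for learner in learners:
        F += learning_rate * learner.predict(X)
    return classes[np.argmax(F, axis=1)]                 # class with the highest score


X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)
model = fit_generic_booster(X_train, y_train, Ridge(alpha=1.0))
accuracy = np.mean(predict_generic_booster(model, X_test) == y_test)
print(f"Test accuracy: {accuracy:.2f}")
```

Because the base learner is only cloned, fitted and asked to predict, any scikit-learn regressor that accepts multi-output targets can replace `Ridge`. Single-output regressors would need one model per class, which may be what the `MultiTask(...)` entries in the result tables refer to (an assumption, not confirmed by the post).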

This post introduces a histogram-based variant of that procedure (`hist=True` in the code below). Its rationale differs from that of other histogram-based gradient boosting algorithms: here, histograms are used only for feature engineering of continuous features. So far, I don’t see huge differences from the original implementation of the GenericBooster, but this is still a work in progress. I plan to try it out on a data set that contains a richer mix of continuous and categorical features (categorical features are not histogram-engineered).
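The post does not detail how the histograms are turned into features. Purely as an illustration of the general idea, and not a description of mlsauce's internals, the sketch below bins each continuous column with NumPy's `histogram_bin_edges` (edges learned on the training set only), one-hot encodes the bin membership and appends it to the original columns, while skipping columns flagged as categorical. The function `histogram_features` and its `n_bins` and `categorical_cols` parameters are hypothetical.

```python
import numpy as np


def histogram_features(X_train, X_test, n_bins=10, categorical_cols=()):
    """Hypothetical sketch: append one-hot histogram-bin indicators for continuous columns."""
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    train_blocks, test_blocks = [X_train], [X_test]
    for j in range(X_train.shape[1]):
        if j in categorical_cols:
            continue  # categorical columns are left as-is (no histogram engineering)
        # Equal-width bin edges, computed on the training data only to avoid leakage
        edges = np.histogram_bin_edges(X_train[:, j], bins=n_bins)
        inner_edges = edges[1:-1]  # test values outside the training range go to the end bins
        for X_part, blocks in ((X_train, train_blocks), (X_test, test_blocks)):
            bin_idx = np.digitize(X_part[:, j], inner_edges)  # integers in 0..n_bins-1
            blocks.append(np.eye(n_bins)[bin_idx])            # one-hot bin membership
    return np.hstack(train_blocks), np.hstack(test_blocks)


# Usage on random data: columns 0 and 1 are continuous, column 2 is categorical
rng = np.random.default_rng(0)
X_tr = rng.normal(size=(100, 3))
X_te = rng.normal(size=(20, 3))
X_tr[:, 2] = rng.integers(0, 3, size=100)  # categorical column coded 0, 1, 2
X_te[:, 2] = rng.integers(0, 3, size=20)
X_tr_hist, X_te_hist = histogram_features(X_tr, X_te, n_bins=5, categorical_cols=(2,))
print(X_tr_hist.shape, X_te_hist.shape)  # (100, 13) (20, 13)
```

Equal-width bins are only one possible choice; quantile-based edges (for example computed with `np.quantile`) would give bins with roughly equal counts.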

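Here are a few results on four scikit-learn datasets (breast cancer, iris, wine and digits) that give an idea of the algorithm's performance. Each dataset is fitted twice, with (`hist=True`) and without (`hist=False`) histogram-based feature engineering: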

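```python
# Install the development version of mlsauce (notebook syntax)
!pip install git+https://github.com/Techtonique/mlsauce.git --verbose --upgrade --no-cache-dir

from time import time

import mlsauce as ms
from IPython.display import display
from sklearn.datasets import load_breast_cancer, load_iris, load_wine, load_digits
from sklearn.model_selection import train_test_split

# Dataset loaders to benchmark on
load_datasets = [load_breast_cancer, load_iris, load_wine, load_digits]

for load_dataset in load_datasets:

    data = load_dataset()
    X = data.data
    y = data.target

    # Same 80/20 split for both runs, so the comparison is like-for-like
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

    clf = ms.LazyBoostingClassifier(verbose=0, ignore_warnings=True,  # n_jobs=2,
                                    custom_metric=None, preprocess=False)

    start = time()
    # hist=True: histogram-based feature engineering of continuous features
    models, predictions = clf.fit(X_train, X_test, y_train, y_test, hist=True)
    # hist=False: original GenericBooster, for comparison
    models2, predictions2 = clf.fit(X_train, X_test, y_train, y_test, hist=False)
    # The elapsed time covers both fits
    print(f"\nElapsed: {time() - start} seconds\n")

    display(models)   # leaderboard with hist=True
    display(models2)  # leaderboard with hist=False
```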
Output for the breast cancer dataset:

```text
2it [00:00,  2.27it/s]
100%|██████████| 38/38 [00:41<00:00,  1.09s/it]
2it [00:00,  5.14it/s]
100%|██████████| 38/38 [00:43<00:00,  1.14s/it]

Elapsed: 85.95083284378052 seconds
```
Breast cancer, `hist=True`:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| GenericBooster(MultiTask(TweedieRegressor)) | 0.99 | 0.99 | 0.99 | 0.99 | 1.73 |
| GenericBooster(LinearRegression) | 0.99 | 0.99 | 0.99 | 0.99 | 0.37 |
| GenericBooster(TransformedTargetRegressor) | 0.99 | 0.99 | 0.99 | 0.99 | 0.40 |
| GenericBooster(RidgeCV) | 0.99 | 0.99 | 0.99 | 0.99 | 1.28 |
| GenericBooster(Ridge) | 0.99 | 0.99 | 0.99 | 0.99 | 0.27 |
| XGBClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.50 |
| RandomForestClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.37 |
| GenericBooster(ExtraTreeRegressor) | 0.94 | 0.94 | 0.94 | 0.94 | 0.40 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.93 | 0.93 | 0.94 | 4.97 |
| GenericBooster(KNeighborsRegressor) | 0.87 | 0.89 | 0.89 | 0.87 | 0.70 |
| GenericBooster(DecisionTreeRegressor) | 0.87 | 0.88 | 0.88 | 0.87 | 2.24 |
| GenericBooster(MultiTaskElasticNet) | 0.87 | 0.79 | 0.79 | 0.86 | 0.11 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.86 | 0.79 | 0.79 | 0.85 | 1.28 |
| GenericBooster(MultiTaskLasso) | 0.85 | 0.76 | 0.76 | 0.84 | 0.06 |
| GenericBooster(ElasticNet) | 0.85 | 0.76 | 0.76 | 0.84 | 0.16 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.82 | 0.72 | 0.72 | 0.80 | 10.42 |
| GenericBooster(Lasso) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(LassoLars) | 0.82 | 0.71 | 0.71 | 0.79 | 0.08 |
| GenericBooster(MultiTask(LinearSVR)) | 0.81 | 0.69 | 0.69 | 0.78 | 14.75 |
| GenericBooster(DummyRegressor) | 0.68 | 0.50 | 0.50 | 0.56 | 0.01 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.50 | 0.46 | 0.46 | 0.51 | 1.67 |


Breast cancer, `hist=False`:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| GenericBooster(MultiTask(TweedieRegressor)) | 0.99 | 0.99 | 0.99 | 0.99 | 1.67 |
| GenericBooster(LinearRegression) | 0.99 | 0.99 | 0.99 | 0.99 | 0.30 |
| GenericBooster(TransformedTargetRegressor) | 0.99 | 0.99 | 0.99 | 0.99 | 0.74 |
| GenericBooster(RidgeCV) | 0.99 | 0.99 | 0.99 | 0.99 | 2.77 |
| GenericBooster(Ridge) | 0.99 | 0.99 | 0.99 | 0.99 | 0.28 |
| XGBClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.13 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.93 | 0.93 | 0.94 | 7.81 |
| GenericBooster(ExtraTreeRegressor) | 0.94 | 0.94 | 0.94 | 0.94 | 0.23 |
| RandomForestClassifier | 0.92 | 0.93 | 0.93 | 0.92 | 0.25 |
| GenericBooster(KNeighborsRegressor) | 0.87 | 0.89 | 0.89 | 0.87 | 0.42 |
| GenericBooster(DecisionTreeRegressor) | 0.87 | 0.88 | 0.88 | 0.87 | 0.97 |
| GenericBooster(MultiTaskElasticNet) | 0.87 | 0.79 | 0.79 | 0.86 | 0.11 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.86 | 0.79 | 0.79 | 0.85 | 1.20 |
| GenericBooster(MultiTaskLasso) | 0.85 | 0.76 | 0.76 | 0.84 | 0.06 |
| GenericBooster(ElasticNet) | 0.85 | 0.76 | 0.76 | 0.84 | 0.09 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.82 | 0.72 | 0.72 | 0.80 | 10.57 |
| GenericBooster(LassoLars) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(Lasso) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(MultiTask(LinearSVR)) | 0.81 | 0.69 | 0.69 | 0.78 | 14.20 |
| GenericBooster(DummyRegressor) | 0.68 | 0.50 | 0.50 | 0.56 | 0.01 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.50 | 0.46 | 0.46 | 0.51 | 1.33 |
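Comparing two 20-row leaderboards by eye is error-prone. A small pandas helper can line up the `hist=True` and `hist=False` tables by model name and report the differences. This sketch assumes, based on the tables shown here, that each leaderboard is a DataFrame indexed by model name with `Accuracy` and `Time Taken` columns; `compare_leaderboards` is a hypothetical name, and in the benchmark loop it would be called as `compare_leaderboards(models, models2)`. The demo rows are copied from the breast cancer tables.

```python
import pandas as pd


def compare_leaderboards(hist_on, hist_off, metric="Accuracy"):
    """Hypothetical helper: join two leaderboards on model name, hist=True minus hist=False."""
    joined = hist_on[[metric, "Time Taken"]].join(
        hist_off[[metric, "Time Taken"]], lsuffix=" (hist)", rsuffix=" (no hist)", how="inner"
    )
    joined[f"{metric} diff"] = joined[f"{metric} (hist)"] - joined[f"{metric} (no hist)"]
    joined["Time diff"] = joined["Time Taken (hist)"] - joined["Time Taken (no hist)"]
    return joined.sort_values(f"{metric} diff", ascending=False)  # biggest gains first


# A few rows copied from the breast cancer tables
names = pd.Index(["RandomForestClassifier", "GenericBooster(RidgeCV)",
                  "GenericBooster(DecisionTreeRegressor)"], name="Model")
hist_on = pd.DataFrame({"Accuracy": [0.96, 0.99, 0.87], "Time Taken": [0.37, 1.28, 2.24]},
                       index=names)
hist_off = pd.DataFrame({"Accuracy": [0.92, 0.99, 0.87], "Time Taken": [0.25, 2.77, 0.97]},
                        index=names)

result = compare_leaderboards(hist_on, hist_off)
print(result)
```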


Output for the iris dataset:

```text
2it [00:00,  6.46it/s]
100%|██████████| 38/38 [00:12<00:00,  3.11it/s]
2it [00:00, 10.38it/s]
100%|██████████| 38/38 [00:11<00:00,  3.18it/s]

Elapsed: 24.71835470199585 seconds
```
Iris, `hist=True`:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.18 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.14 |
| GenericBooster(LinearRegression) | 0.97 | 0.97 | None | 0.97 | 0.13 |
| GenericBooster(DecisionTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.18 |
| GenericBooster(TransformedTargetRegressor) | 0.97 | 0.97 | None | 0.97 | 0.23 |
| GenericBooster(ExtraTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.14 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.05 |
| RandomForestClassifier | 0.93 | 0.95 | None | 0.93 | 0.26 |
| GenericBooster(KNeighborsRegressor) | 0.93 | 0.95 | None | 0.93 | 0.27 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.90 | 0.92 | None | 0.90 | 0.75 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.90 | 0.92 | None | 0.90 | 1.61 |
| GenericBooster(MultiTask(LinearSVR)) | 0.80 | 0.85 | None | 0.80 | 2.15 |
| GenericBooster(MultiTaskElasticNet) | 0.80 | 0.85 | None | 0.80 | 0.07 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.63 | 0.72 | None | 0.57 | 2.42 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.57 | 0.67 | None | 0.45 | 1.05 |
| GenericBooster(Lars) | 0.50 | 0.46 | None | 0.48 | 0.59 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.43 | 0.33 | None | 0.26 | 2.19 |
| GenericBooster(LassoLars) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(MultiTaskLasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(Lasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(ElasticNet) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(DummyRegressor) | 0.27 | 0.33 | None | 0.11 | 0.01 |


Iris, `hist=False`:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.15 |
| GenericBooster(LinearRegression) | 0.97 | 0.97 | None | 0.97 | 0.13 |
| GenericBooster(DecisionTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.16 |
| GenericBooster(TransformedTargetRegressor) | 0.97 | 0.97 | None | 0.97 | 0.24 |
| GenericBooster(ExtraTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.14 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.04 |
| GenericBooster(KNeighborsRegressor) | 0.93 | 0.95 | None | 0.93 | 0.28 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.90 | 0.92 | None | 0.90 | 0.78 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.90 | 0.92 | None | 0.90 | 1.35 |
| GenericBooster(MultiTask(LinearSVR)) | 0.80 | 0.85 | None | 0.80 | 2.15 |
| GenericBooster(MultiTaskElasticNet) | 0.80 | 0.85 | None | 0.80 | 0.07 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.63 | 0.72 | None | 0.57 | 1.81 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.57 | 0.67 | None | 0.45 | 1.21 |
| GenericBooster(Lars) | 0.50 | 0.46 | None | 0.48 | 0.58 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.43 | 0.33 | None | 0.26 | 2.63 |
| GenericBooster(LassoLars) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(MultiTaskLasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(Lasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(ElasticNet) | 0.27 | 0.33 | None | 0.11 | 0.02 |
| GenericBooster(DummyRegressor) | 0.27 | 0.33 | None | 0.11 | 0.01 |


Output for the wine dataset:

```text
2it [00:00,  5.45it/s]
100%|██████████| 38/38 [00:14<00:00,  2.63it/s]
2it [00:00,  9.26it/s]
100%|██████████| 38/38 [00:14<00:00,  2.58it/s]

Elapsed: 29.76035761833191 seconds
```
Wine, `hist=True`:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 1.00 | 1.00 | None | 1.00 | 0.30 |
| GenericBooster(ExtraTreeRegressor) | 1.00 | 1.00 | None | 1.00 | 0.17 |
| GenericBooster(TransformedTargetRegressor) | 1.00 | 1.00 | None | 1.00 | 0.26 |
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.23 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| GenericBooster(LinearRegression) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| XGBClassifier | 0.97 | 0.96 | None | 0.97 | 0.06 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.10 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.18 |
| GenericBooster(MultiTask(LinearSVR)) | 0.97 | 0.98 | None | 0.97 | 3.71 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.97 | 0.98 | None | 0.97 | 1.86 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.39 |
| GenericBooster(Lars) | 0.94 | 0.94 | None | 0.95 | 0.93 |
| GenericBooster(KNeighborsRegressor) | 0.92 | 0.93 | None | 0.92 | 0.19 |
| GenericBooster(DecisionTreeRegressor) | 0.92 | 0.92 | None | 0.92 | 0.22 |
| GenericBooster(MultiTaskElasticNet) | 0.69 | 0.61 | None | 0.61 | 0.03 |
| GenericBooster(ElasticNet) | 0.61 | 0.53 | None | 0.53 | 0.05 |
| GenericBooster(MultiTaskLasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(LassoLars) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(Lasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(DummyRegressor) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.25 | 0.33 | None | 0.10 | 2.73 |


Wine, `hist=False`:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 1.00 | 1.00 | None | 1.00 | 0.15 |
| GenericBooster(ExtraTreeRegressor) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(TransformedTargetRegressor) | 1.00 | 1.00 | None | 1.00 | 0.24 |
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.22 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(LinearRegression) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| XGBClassifier | 0.97 | 0.96 | None | 0.97 | 0.06 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.97 | 0.98 | None | 0.97 | 0.84 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.18 |
| GenericBooster(MultiTask(LinearSVR)) | 0.97 | 0.98 | None | 0.97 | 3.41 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.97 | 0.98 | None | 0.97 | 2.15 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.91 |
| GenericBooster(Lars) | 0.94 | 0.94 | None | 0.95 | 0.93 |
| GenericBooster(KNeighborsRegressor) | 0.92 | 0.93 | None | 0.92 | 0.20 |
| GenericBooster(DecisionTreeRegressor) | 0.92 | 0.92 | None | 0.92 | 0.23 |
| GenericBooster(MultiTaskElasticNet) | 0.69 | 0.61 | None | 0.61 | 0.03 |
| GenericBooster(ElasticNet) | 0.61 | 0.53 | None | 0.53 | 0.04 |
| GenericBooster(MultiTaskLasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(LassoLars) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(Lasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(DummyRegressor) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.25 | 0.33 | None | 0.10 | 2.78 |


Output for the digits dataset:

```text
2it [00:01,  1.90it/s]
100%|██████████| 38/38 [09:30<00:00, 15.02s/it]
2it [00:01,  1.03it/s]
100%|██████████| 38/38 [09:27<00:00, 14.94s/it]

Elapsed: 1141.7054164409637 seconds
```
Digits, `hist=True`:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.56 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.50 |
| GenericBooster(ExtraTreeRegressor) | 0.96 | 0.96 | None | 0.96 | 1.75 |
| GenericBooster(KNeighborsRegressor) | 0.95 | 0.95 | None | 0.95 | 4.34 |
| GenericBooster(LinearRegression) | 0.94 | 0.94 | None | 0.94 | 4.47 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.94 | None | 0.94 | 51.97 |
| GenericBooster(TransformedTargetRegressor) | 0.94 | 0.94 | None | 0.94 | 2.54 |
| GenericBooster(RidgeCV) | 0.94 | 0.94 | None | 0.94 | 4.55 |
| GenericBooster(Ridge) | 0.94 | 0.94 | None | 0.94 | 0.63 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.93 | 0.93 | None | 0.93 | 13.86 |
| GenericBooster(DecisionTreeRegressor) | 0.88 | 0.88 | None | 0.88 | 6.14 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.79 | 0.79 | None | 0.80 | 13.46 |
| GenericBooster(MultiTask(LinearSVR)) | 0.37 | 0.39 | None | 0.26 | 297.07 |
| GenericBooster(Lars) | 0.20 | 0.20 | None | 0.21 | 19.23 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.12 | 0.10 | None | 0.03 | 140.91 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.10 | 0.10 | None | 0.06 | 9.46 |
| GenericBooster(LassoLars) | 0.07 | 0.10 | None | 0.01 | 0.05 |
| GenericBooster(Lasso) | 0.07 | 0.10 | None | 0.01 | 0.07 |
| GenericBooster(MultiTaskLasso) | 0.07 | 0.10 | None | 0.01 | 0.04 |
| GenericBooster(ElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(DummyRegressor) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(MultiTaskElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.05 |


Digits, `hist=False`:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.67 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 1.27 |
| GenericBooster(ExtraTreeRegressor) | 0.96 | 0.96 | None | 0.96 | 1.69 |
| GenericBooster(KNeighborsRegressor) | 0.95 | 0.95 | None | 0.95 | 4.76 |
| GenericBooster(LinearRegression) | 0.94 | 0.94 | None | 0.94 | 2.01 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.94 | None | 0.94 | 46.87 |
| GenericBooster(TransformedTargetRegressor) | 0.94 | 0.94 | None | 0.94 | 5.40 |
| GenericBooster(RidgeCV) | 0.94 | 0.94 | None | 0.94 | 3.93 |
| GenericBooster(Ridge) | 0.94 | 0.94 | None | 0.94 | 0.60 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.93 | 0.93 | None | 0.93 | 14.96 |
| GenericBooster(DecisionTreeRegressor) | 0.88 | 0.88 | None | 0.88 | 4.12 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.79 | 0.79 | None | 0.80 | 12.68 |
| GenericBooster(MultiTask(LinearSVR)) | 0.37 | 0.39 | None | 0.26 | 294.88 |
| GenericBooster(Lars) | 0.20 | 0.20 | None | 0.21 | 19.40 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.12 | 0.10 | None | 0.03 | 145.91 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.10 | 0.10 | None | 0.06 | 10.30 |
| GenericBooster(LassoLars) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(Lasso) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(MultiTaskLasso) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(ElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(DummyRegressor) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(MultiTaskElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |
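One detail helps when reading all four pairs of tables: the `GenericBooster(DummyRegressor)` rows show a balanced accuracy of 0.50 on breast cancer (2 classes), 0.33 on iris and wine (3 classes) and 0.10 on digits (10 classes). This is consistent with a constant prediction, which recalls one class perfectly and every other class not at all, so the per-class recall averages to 1/K for K classes. Its plain accuracy, by contrast, is just the share of the predicted class in the test set. A quick check with scikit-learn's `accuracy_score` and `balanced_accuracy_score`, on made-up labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score, balanced_accuracy_score

# Made-up test labels for K = 3 classes with unequal class sizes
y_true = np.array([0] * 8 + [1] * 12 + [2] * 10)
# A constant predictor that always outputs class 0
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)               # share of class 0 in y_true: 8/30
bal_acc = balanced_accuracy_score(y_true, y_pred)  # mean per-class recall: (1 + 0 + 0) / 3
print(f"accuracy={acc:.2f}, balanced accuracy={bal_acc:.2f}")  # accuracy=0.27, balanced accuracy=0.33
```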


To leave a comment for the author, please follow the link and comment on their blog: T. Moudiki's Webpage - Python.
