Gradient-Boosting anything (alert: high performance): Part 3, Histogram-based boosting

This article was first published on T. Moudiki's Webpage - Python and kindly contributed to python-bloggers.

A few weeks ago, I introduced a model-agnostic gradient boosting procedure that can use any base learner (available in the R and Python package mlsauce).

The rationale is different from other histogram-based gradient boosting algorithms: here, histograms are only used for feature engineering of continuous features. So far, I don't see huge differences with the original implementation of the GenericBooster, but it's still a work in progress. I plan to try it out on a data set that contains a richer mix of continuous and categorical features (categorical features are not histogram-engineered).
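To make the idea more concrete, here is a minimal sketch of what histogram-based feature engineering of continuous features could look like. It is an illustration under stated assumptions, not the code used in mlsauce: the function name `histogram_features` and the `n_bins` parameter are hypothetical, and each continuous column is simply replaced by the index of the histogram bin it falls into, with bin edges learned on the training set only.

```python
import numpy as np


def histogram_features(X_train, X_test, n_bins=10):
    """Replace each continuous column by its histogram bin index.

    Hypothetical helper (not mlsauce's API): bin edges are computed on
    the training set and reused as-is on the test set.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    train_binned = np.empty_like(X_train)
    test_binned = np.empty_like(X_test)
    for j in range(X_train.shape[1]):
        # n_bins + 1 equally spaced edges spanning the training column
        edges = np.histogram_bin_edges(X_train[:, j], bins=n_bins)
        # Keep interior edges only, so that indices fall in 0..n_bins-1;
        # test values outside the training range land in the end bins
        inner = edges[1:-1]
        train_binned[:, j] = np.digitize(X_train[:, j], inner)
        test_binned[:, j] = np.digitize(X_test[:, j], inner)
    return train_binned, test_binned


# Usage on synthetic continuous data
rng = np.random.default_rng(13)
demo_train = rng.normal(size=(100, 4))
demo_test = rng.normal(size=(25, 4))
demo_train_binned, demo_test_binned = histogram_features(
    demo_train, demo_test, n_bins=5
)
print(demo_train_binned[:3])
```

Any base learner can then be fitted on the binned features instead of the raw ones; categorical columns would be left untouched.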

Here are a few results that can give you an idea of the performance of the algorithm:

!pip install git+https://github.com/Techtonique/mlsauce.git --verbose --upgrade --no-cache-dir

from time import time

import mlsauce as ms
from sklearn.datasets import load_breast_cancer, load_iris, load_wine, load_digits
from sklearn.model_selection import train_test_split

# Data set loaders used for the benchmark
load_datasets = [load_breast_cancer, load_iris, load_wine, load_digits]

for load_dataset in load_datasets:

    data = load_dataset()
    X = data.data
    y = data.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=13)

    clf = ms.LazyBoostingClassifier(verbose=0, ignore_warnings=True, #n_jobs=2,
                                    custom_metric=None, preprocess=False)

    start = time()
    # Fit all base learners with, then without, histogram-engineered features
    models, predictions = clf.fit(X_train, X_test, y_train, y_test, hist=True)
    models2, predictions2 = clf.fit(X_train, X_test, y_train, y_test, hist=False)
    # The elapsed time covers both fits
    print(f"\nElapsed: {time() - start} seconds\n")

    display(models)   # results with hist=True
    display(models2)  # results with hist=False
2it [00:00,  2.27it/s]
100%|██████████| 38/38 [00:41<00:00,  1.09s/it]
2it [00:00,  5.14it/s]
100%|██████████| 38/38 [00:43<00:00,  1.14s/it]


Elapsed: 85.95083284378052 seconds
Breast cancer data set, hist=True:

Model                                                  Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
GenericBooster(MultiTask(TweedieRegressor))                0.99               0.99     0.99      0.99        1.73
GenericBooster(LinearRegression)                           0.99               0.99     0.99      0.99        0.37
GenericBooster(TransformedTargetRegressor)                 0.99               0.99     0.99      0.99        0.40
GenericBooster(RidgeCV)                                    0.99               0.99     0.99      0.99        1.28
GenericBooster(Ridge)                                      0.99               0.99     0.99      0.99        0.27
XGBClassifier                                              0.96               0.96     0.96      0.96        0.50
RandomForestClassifier                                     0.96               0.96     0.96      0.96        0.37
GenericBooster(ExtraTreeRegressor)                         0.94               0.94     0.94      0.94        0.40
GenericBooster(MultiTask(BayesianRidge))                   0.94               0.93     0.93      0.94        4.97
GenericBooster(KNeighborsRegressor)                        0.87               0.89     0.89      0.87        0.70
GenericBooster(DecisionTreeRegressor)                      0.87               0.88     0.88      0.87        2.24
GenericBooster(MultiTaskElasticNet)                        0.87               0.79     0.79      0.86        0.11
GenericBooster(MultiTask(PassiveAggressiveRegressor))      0.86               0.79     0.79      0.85        1.28
GenericBooster(MultiTaskLasso)                             0.85               0.76     0.76      0.84        0.06
GenericBooster(ElasticNet)                                 0.85               0.76     0.76      0.84        0.16
GenericBooster(MultiTask(QuantileRegressor))               0.82               0.72     0.72      0.80       10.42
GenericBooster(Lasso)                                      0.82               0.71     0.71      0.79        0.09
GenericBooster(LassoLars)                                  0.82               0.71     0.71      0.79        0.08
GenericBooster(MultiTask(LinearSVR))                       0.81               0.69     0.69      0.78       14.75
GenericBooster(DummyRegressor)                             0.68               0.50     0.50      0.56        0.01
GenericBooster(MultiTask(SGDRegressor))                    0.50               0.46     0.46      0.51        1.67


Breast cancer data set, hist=False:

Model                                                  Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
GenericBooster(MultiTask(TweedieRegressor))                0.99               0.99     0.99      0.99        1.67
GenericBooster(LinearRegression)                           0.99               0.99     0.99      0.99        0.30
GenericBooster(TransformedTargetRegressor)                 0.99               0.99     0.99      0.99        0.74
GenericBooster(RidgeCV)                                    0.99               0.99     0.99      0.99        2.77
GenericBooster(Ridge)                                      0.99               0.99     0.99      0.99        0.28
XGBClassifier                                              0.96               0.96     0.96      0.96        0.13
GenericBooster(MultiTask(BayesianRidge))                   0.94               0.93     0.93      0.94        7.81
GenericBooster(ExtraTreeRegressor)                         0.94               0.94     0.94      0.94        0.23
RandomForestClassifier                                     0.92               0.93     0.93      0.92        0.25
GenericBooster(KNeighborsRegressor)                        0.87               0.89     0.89      0.87        0.42
GenericBooster(DecisionTreeRegressor)                      0.87               0.88     0.88      0.87        0.97
GenericBooster(MultiTaskElasticNet)                        0.87               0.79     0.79      0.86        0.11
GenericBooster(MultiTask(PassiveAggressiveRegressor))      0.86               0.79     0.79      0.85        1.20
GenericBooster(MultiTaskLasso)                             0.85               0.76     0.76      0.84        0.06
GenericBooster(ElasticNet)                                 0.85               0.76     0.76      0.84        0.09
GenericBooster(MultiTask(QuantileRegressor))               0.82               0.72     0.72      0.80       10.57
GenericBooster(LassoLars)                                  0.82               0.71     0.71      0.79        0.09
GenericBooster(Lasso)                                      0.82               0.71     0.71      0.79        0.09
GenericBooster(MultiTask(LinearSVR))                       0.81               0.69     0.69      0.78       14.20
GenericBooster(DummyRegressor)                             0.68               0.50     0.50      0.56        0.01
GenericBooster(MultiTask(SGDRegressor))                    0.50               0.46     0.46      0.51        1.33


2it [00:00,  6.46it/s]
100%|██████████| 38/38 [00:12<00:00,  3.11it/s]
2it [00:00, 10.38it/s]
100%|██████████| 38/38 [00:11<00:00,  3.18it/s]


Elapsed: 24.71835470199585 seconds
Iris data set, hist=True:

Model                                                  Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
GenericBooster(RidgeCV)                                    1.00               1.00     None      1.00        0.18
GenericBooster(Ridge)                                      1.00               1.00     None      1.00        0.14
GenericBooster(LinearRegression)                           0.97               0.97     None      0.97        0.13
GenericBooster(DecisionTreeRegressor)                      0.97               0.97     None      0.97        0.18
GenericBooster(TransformedTargetRegressor)                 0.97               0.97     None      0.97        0.23
GenericBooster(ExtraTreeRegressor)                         0.97               0.97     None      0.97        0.14
XGBClassifier                                              0.97               0.97     None      0.97        0.05
RandomForestClassifier                                     0.93               0.95     None      0.93        0.26
GenericBooster(KNeighborsRegressor)                        0.93               0.95     None      0.93        0.27
GenericBooster(MultiTask(SGDRegressor))                    0.90               0.92     None      0.90        0.75
GenericBooster(MultiTask(TweedieRegressor))                0.90               0.92     None      0.90        1.61
GenericBooster(MultiTask(LinearSVR))                       0.80               0.85     None      0.80        2.15
GenericBooster(MultiTaskElasticNet)                        0.80               0.85     None      0.80        0.07
GenericBooster(MultiTask(BayesianRidge))                   0.63               0.72     None      0.57        2.42
GenericBooster(MultiTask(PassiveAggressiveRegressor))      0.57               0.67     None      0.45        1.05
GenericBooster(Lars)                                       0.50               0.46     None      0.48        0.59
GenericBooster(MultiTask(QuantileRegressor))               0.43               0.33     None      0.26        2.19
GenericBooster(LassoLars)                                  0.27               0.33     None      0.11        0.01
GenericBooster(MultiTaskLasso)                             0.27               0.33     None      0.11        0.01
GenericBooster(Lasso)                                      0.27               0.33     None      0.11        0.01
GenericBooster(ElasticNet)                                 0.27               0.33     None      0.11        0.01
GenericBooster(DummyRegressor)                             0.27               0.33     None      0.11        0.01


Iris data set, hist=False:

Model                                                  Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
GenericBooster(RidgeCV)                                    1.00               1.00     None      1.00        0.16
GenericBooster(Ridge)                                      1.00               1.00     None      1.00        0.16
RandomForestClassifier                                     0.97               0.97     None      0.97        0.15
GenericBooster(LinearRegression)                           0.97               0.97     None      0.97        0.13
GenericBooster(DecisionTreeRegressor)                      0.97               0.97     None      0.97        0.16
GenericBooster(TransformedTargetRegressor)                 0.97               0.97     None      0.97        0.24
GenericBooster(ExtraTreeRegressor)                         0.97               0.97     None      0.97        0.14
XGBClassifier                                              0.97               0.97     None      0.97        0.04
GenericBooster(KNeighborsRegressor)                        0.93               0.95     None      0.93        0.28
GenericBooster(MultiTask(SGDRegressor))                    0.90               0.92     None      0.90        0.78
GenericBooster(MultiTask(TweedieRegressor))                0.90               0.92     None      0.90        1.35
GenericBooster(MultiTask(LinearSVR))                       0.80               0.85     None      0.80        2.15
GenericBooster(MultiTaskElasticNet)                        0.80               0.85     None      0.80        0.07
GenericBooster(MultiTask(BayesianRidge))                   0.63               0.72     None      0.57        1.81
GenericBooster(MultiTask(PassiveAggressiveRegressor))      0.57               0.67     None      0.45        1.21
GenericBooster(Lars)                                       0.50               0.46     None      0.48        0.58
GenericBooster(MultiTask(QuantileRegressor))               0.43               0.33     None      0.26        2.63
GenericBooster(LassoLars)                                  0.27               0.33     None      0.11        0.01
GenericBooster(MultiTaskLasso)                             0.27               0.33     None      0.11        0.01
GenericBooster(Lasso)                                      0.27               0.33     None      0.11        0.01
GenericBooster(ElasticNet)                                 0.27               0.33     None      0.11        0.02
GenericBooster(DummyRegressor)                             0.27               0.33     None      0.11        0.01


2it [00:00,  5.45it/s]
100%|██████████| 38/38 [00:14<00:00,  2.63it/s]
2it [00:00,  9.26it/s]
100%|██████████| 38/38 [00:14<00:00,  2.58it/s]


Elapsed: 29.76035761833191 seconds
Wine data set, hist=True:

Model                                                  Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
RandomForestClassifier                                     1.00               1.00     None      1.00        0.30
GenericBooster(ExtraTreeRegressor)                         1.00               1.00     None      1.00        0.17
GenericBooster(TransformedTargetRegressor)                 1.00               1.00     None      1.00        0.26
GenericBooster(RidgeCV)                                    1.00               1.00     None      1.00        0.23
GenericBooster(Ridge)                                      1.00               1.00     None      1.00        0.15
GenericBooster(LinearRegression)                           1.00               1.00     None      1.00        0.15
XGBClassifier                                              0.97               0.96     None      0.97        0.06
GenericBooster(MultiTask(SGDRegressor))                    0.97               0.98     None      0.97        1.10
GenericBooster(MultiTask(PassiveAggressiveRegressor))      0.97               0.98     None      0.97        1.18
GenericBooster(MultiTask(LinearSVR))                       0.97               0.98     None      0.97        3.71
GenericBooster(MultiTask(BayesianRidge))                   0.97               0.98     None      0.97        1.86
GenericBooster(MultiTask(TweedieRegressor))                0.97               0.98     None      0.97        1.39
GenericBooster(Lars)                                       0.94               0.94     None      0.95        0.93
GenericBooster(KNeighborsRegressor)                        0.92               0.93     None      0.92        0.19
GenericBooster(DecisionTreeRegressor)                      0.92               0.92     None      0.92        0.22
GenericBooster(MultiTaskElasticNet)                        0.69               0.61     None      0.61        0.03
GenericBooster(ElasticNet)                                 0.61               0.53     None      0.53        0.05
GenericBooster(MultiTaskLasso)                             0.42               0.33     None      0.25        0.01
GenericBooster(LassoLars)                                  0.42               0.33     None      0.25        0.01
GenericBooster(Lasso)                                      0.42               0.33     None      0.25        0.01
GenericBooster(DummyRegressor)                             0.42               0.33     None      0.25        0.01
GenericBooster(MultiTask(QuantileRegressor))               0.25               0.33     None      0.10        2.73


Wine data set, hist=False:

Model                                                  Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
RandomForestClassifier                                     1.00               1.00     None      1.00        0.15
GenericBooster(ExtraTreeRegressor)                         1.00               1.00     None      1.00        0.16
GenericBooster(TransformedTargetRegressor)                 1.00               1.00     None      1.00        0.24
GenericBooster(RidgeCV)                                    1.00               1.00     None      1.00        0.22
GenericBooster(Ridge)                                      1.00               1.00     None      1.00        0.16
GenericBooster(LinearRegression)                           1.00               1.00     None      1.00        0.15
XGBClassifier                                              0.97               0.96     None      0.97        0.06
GenericBooster(MultiTask(SGDRegressor))                    0.97               0.98     None      0.97        0.84
GenericBooster(MultiTask(PassiveAggressiveRegressor))      0.97               0.98     None      0.97        1.18
GenericBooster(MultiTask(LinearSVR))                       0.97               0.98     None      0.97        3.41
GenericBooster(MultiTask(BayesianRidge))                   0.97               0.98     None      0.97        2.15
GenericBooster(MultiTask(TweedieRegressor))                0.97               0.98     None      0.97        1.91
GenericBooster(Lars)                                       0.94               0.94     None      0.95        0.93
GenericBooster(KNeighborsRegressor)                        0.92               0.93     None      0.92        0.20
GenericBooster(DecisionTreeRegressor)                      0.92               0.92     None      0.92        0.23
GenericBooster(MultiTaskElasticNet)                        0.69               0.61     None      0.61        0.03
GenericBooster(ElasticNet)                                 0.61               0.53     None      0.53        0.04
GenericBooster(MultiTaskLasso)                             0.42               0.33     None      0.25        0.01
GenericBooster(LassoLars)                                  0.42               0.33     None      0.25        0.01
GenericBooster(Lasso)                                      0.42               0.33     None      0.25        0.01
GenericBooster(DummyRegressor)                             0.42               0.33     None      0.25        0.01
GenericBooster(MultiTask(QuantileRegressor))               0.25               0.33     None      0.10        2.78


2it [00:01,  1.90it/s]
100%|██████████| 38/38 [09:30<00:00, 15.02s/it]
2it [00:01,  1.03it/s]
100%|██████████| 38/38 [09:27<00:00, 14.94s/it]


Elapsed: 1141.7054164409637 seconds
Digits data set, hist=True:

Model                                                  Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
RandomForestClassifier                                     0.97               0.97     None      0.97        0.56
XGBClassifier                                              0.97               0.97     None      0.97        0.50
GenericBooster(ExtraTreeRegressor)                         0.96               0.96     None      0.96        1.75
GenericBooster(KNeighborsRegressor)                        0.95               0.95     None      0.95        4.34
GenericBooster(LinearRegression)                           0.94               0.94     None      0.94        4.47
GenericBooster(MultiTask(BayesianRidge))                   0.94               0.94     None      0.94       51.97
GenericBooster(TransformedTargetRegressor)                 0.94               0.94     None      0.94        2.54
GenericBooster(RidgeCV)                                    0.94               0.94     None      0.94        4.55
GenericBooster(Ridge)                                      0.94               0.94     None      0.94        0.63
GenericBooster(MultiTask(TweedieRegressor))                0.93               0.93     None      0.93       13.86
GenericBooster(DecisionTreeRegressor)                      0.88               0.88     None      0.88        6.14
GenericBooster(MultiTask(PassiveAggressiveRegressor))      0.79               0.79     None      0.80       13.46
GenericBooster(MultiTask(LinearSVR))                       0.37               0.39     None      0.26      297.07
GenericBooster(Lars)                                       0.20               0.20     None      0.21       19.23
GenericBooster(MultiTask(QuantileRegressor))               0.12               0.10     None      0.03      140.91
GenericBooster(MultiTask(SGDRegressor))                    0.10               0.10     None      0.06        9.46
GenericBooster(LassoLars)                                  0.07               0.10     None      0.01        0.05
GenericBooster(Lasso)                                      0.07               0.10     None      0.01        0.07
GenericBooster(MultiTaskLasso)                             0.07               0.10     None      0.01        0.04
GenericBooster(ElasticNet)                                 0.07               0.10     None      0.01        0.03
GenericBooster(DummyRegressor)                             0.07               0.10     None      0.01        0.02
GenericBooster(MultiTaskElasticNet)                        0.07               0.10     None      0.01        0.05


Digits data set, hist=False:

Model                                                  Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
RandomForestClassifier                                     0.97               0.97     None      0.97        0.67
XGBClassifier                                              0.97               0.97     None      0.97        1.27
GenericBooster(ExtraTreeRegressor)                         0.96               0.96     None      0.96        1.69
GenericBooster(KNeighborsRegressor)                        0.95               0.95     None      0.95        4.76
GenericBooster(LinearRegression)                           0.94               0.94     None      0.94        2.01
GenericBooster(MultiTask(BayesianRidge))                   0.94               0.94     None      0.94       46.87
GenericBooster(TransformedTargetRegressor)                 0.94               0.94     None      0.94        5.40
GenericBooster(RidgeCV)                                    0.94               0.94     None      0.94        3.93
GenericBooster(Ridge)                                      0.94               0.94     None      0.94        0.60
GenericBooster(MultiTask(TweedieRegressor))                0.93               0.93     None      0.93       14.96
GenericBooster(DecisionTreeRegressor)                      0.88               0.88     None      0.88        4.12
GenericBooster(MultiTask(PassiveAggressiveRegressor))      0.79               0.79     None      0.80       12.68
GenericBooster(MultiTask(LinearSVR))                       0.37               0.39     None      0.26      294.88
GenericBooster(Lars)                                       0.20               0.20     None      0.21       19.40
GenericBooster(MultiTask(QuantileRegressor))               0.12               0.10     None      0.03      145.91
GenericBooster(MultiTask(SGDRegressor))                    0.10               0.10     None      0.06       10.30
GenericBooster(LassoLars)                                  0.07               0.10     None      0.01        0.02
GenericBooster(Lasso)                                      0.07               0.10     None      0.01        0.03
GenericBooster(MultiTaskLasso)                             0.07               0.10     None      0.01        0.03
GenericBooster(ElasticNet)                                 0.07               0.10     None      0.01        0.03
GenericBooster(DummyRegressor)                             0.07               0.10     None      0.01        0.02
GenericBooster(MultiTaskElasticNet)                        0.07               0.10     None      0.01        0.03
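
To check the "no huge differences" observation without reading the tables side by side, the two result tables of a data set can be joined on the model name. The following is a minimal sketch, assuming that the first objects returned by `clf.fit` are pandas DataFrames indexed by model name with an `Accuracy` column, as the displayed output suggests. `compare_leaderboards` is a hypothetical helper, not part of mlsauce, and the small example only reuses two rows from the first pair of tables above.

```python
import pandas as pd


def compare_leaderboards(with_hist, without_hist, metric="Accuracy"):
    """Join two result tables on the model name and compute the metric gap.

    Hypothetical helper: a positive 'diff' means the histogram-engineered
    run scored higher on the chosen metric.
    """
    joined = with_hist[[metric]].join(
        without_hist[[metric]],
        lsuffix="_hist",
        rsuffix="_no_hist",
        how="inner",  # keep only the models present in both tables
    )
    joined["diff"] = joined[f"{metric}_hist"] - joined[f"{metric}_no_hist"]
    return joined.sort_values("diff", ascending=False)


# Two rows taken from the first pair of tables (hist=True, then hist=False)
model_index = pd.Index(["RandomForestClassifier", "XGBClassifier"], name="Model")
hist_true = pd.DataFrame({"Accuracy": [0.96, 0.96]}, index=model_index)
hist_false = pd.DataFrame({"Accuracy": [0.92, 0.96]}, index=model_index)

print(compare_leaderboards(hist_true, hist_false))
```

In the loop above, this would be called as `compare_leaderboards(models, models2)` for each data set.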

