Gradient-Boosting anything (alert: high performance): Part 3, Histogram-based boosting
This article was first published on T. Moudiki's Webpage - Python, and kindly contributed to python-bloggers.
A few weeks ago, I introduced a model-agnostic gradient boosting procedure that can use any base learner (available in the R and Python package mlsauce):
- https://thierrymoudiki.github.io/blog/2024/10/06/python/r/genericboosting
- https://thierrymoudiki.github.io/blog/2024/10/14/r/genericboosting-r
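To make the idea of "boosting any base learner" concrete, here is a minimal sketch of least-squares gradient boosting wrapped around an arbitrary scikit-learn regressor. It is not mlsauce's actual implementation: the class name `TinyGenericBooster`, its parameters, and the synthetic data are made up for illustration, and only the general principle (repeatedly fitting the base learner to the current residuals) is assumed.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split


class TinyGenericBooster:
    """Toy least-squares boosting around any scikit-learn regressor (hypothetical class)."""

    def __init__(self, base_learner, n_estimators=50, learning_rate=0.1):
        self.base_learner = base_learner
        self.n_estimators = n_estimators
        self.learning_rate = learning_rate

    def fit(self, X, y):
        # Start from the mean response, then fit each new learner to the residuals
        self.init_ = float(np.mean(y))
        self.learners_ = []
        residuals = y - self.init_
        for _ in range(self.n_estimators):
            learner = clone(self.base_learner).fit(X, residuals)
            residuals = residuals - self.learning_rate * learner.predict(X)
            self.learners_.append(learner)
        return self

    def predict(self, X):
        # Sum the shrunken contributions of all fitted learners
        pred = np.full(X.shape[0], self.init_)
        for learner in self.learners_:
            pred += self.learning_rate * learner.predict(X)
        return pred


# Synthetic regression data, used only to exercise the sketch
X, y = make_regression(n_samples=300, n_features=10, noise=5.0, random_state=13)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

booster = TinyGenericBooster(Ridge(alpha=1.0)).fit(X_train, y_train)
mse_boost = np.mean((y_test - booster.predict(X_test)) ** 2)
mse_mean = np.mean((y_test - y_train.mean()) ** 2)
print(f"Test MSE, boosted Ridge: {mse_boost:.2f}")
print(f"Test MSE, mean predictor: {mse_mean:.2f}")
```

Swapping `Ridge` for any other regressor with `fit` and `predict` methods is all it takes to change the base learner in this sketch.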
The rationale is different from other histogram-based gradient boosting algorithms, as histograms are only used here for feature engineering of continuous features. So far, I don’t see huge differences with the original implementation of the GenericBooster, but it’s still a work in progress. I plan to try it out on a data set with a richer mix of continuous and categorical features (categorical features are not histogram-engineered).
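One way to picture histogram-based feature engineering of continuous features is sketched below: bin edges are learned per column on the training set, and each value is replaced by the index of the bin it falls into. This is an assumption about what such a transformation can look like, not the code mlsauce runs when `hist=True`; the helper names `fit_hist_edges` and `hist_transform` and the choice of 10 bins are hypothetical.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split


def fit_hist_edges(X, n_bins=10):
    """Per-column histogram bin edges, learned on the training set only."""
    return [np.histogram_bin_edges(X[:, j], bins=n_bins) for j in range(X.shape[1])]


def hist_transform(X, edges):
    """Replace each value by the index of the histogram bin it falls into."""
    cols = []
    for j, col_edges in enumerate(edges):
        # Use interior edges only, so that indices range over 0..n_bins-1;
        # unseen values outside the training range land in the first or last bin
        cols.append(np.digitize(X[:, j], col_edges[1:-1]))
    return np.column_stack(cols).astype(float)


X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

edges = fit_hist_edges(X_train, n_bins=10)
X_train_hist = hist_transform(X_train, edges)
X_test_hist = hist_transform(X_test, edges)

# Raw values next to their bin indices for the first observation
print(X_train[0, :3], "->", X_train_hist[0, :3])
```

The binned matrices can then be passed to any booster in place of the raw features; categorical columns would be left out of `fit_hist_edges` and kept as they are.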
Here are a few results that can give you an idea of the performance of the algorithm:
!pip install git+https://github.com/Techtonique/mlsauce.git --verbose --upgrade --no-cache-dir

import os
import mlsauce as ms
from sklearn.datasets import load_breast_cancer, load_iris, load_wine, load_digits
from sklearn.model_selection import train_test_split
from time import time

# Data set loaders, processed in this order
load_datasets = [load_breast_cancer, load_iris, load_wine, load_digits]

for load_dataset in load_datasets:

    data = load_dataset()
    X = data.data
    y = data.target

    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=13)

    clf = ms.LazyBoostingClassifier(verbose=0, ignore_warnings=True,  # n_jobs=2,
                                    custom_metric=None, preprocess=False)

    start = time()
    # Same benchmark with (hist=True) and without (hist=False) histogram-engineered features
    models, predictions = clf.fit(X_train, X_test, y_train, y_test, hist=True)
    models2, predictions2 = clf.fit(X_train, X_test, y_train, y_test, hist=False)
    print(f"\nElapsed: {time() - start} seconds\n")  # total time for both fits

    display(models)   # results with hist=True
    display(models2)  # results with hist=False
Breast cancer data set (first table: hist=True, second table: hist=False):
2it [00:00, 2.27it/s]
100%|██████████| 38/38 [00:41<00:00, 1.09s/it]
2it [00:00, 5.14it/s]
100%|██████████| 38/38 [00:43<00:00, 1.14s/it]
Elapsed: 85.95083284378052 seconds

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| GenericBooster(MultiTask(TweedieRegressor)) | 0.99 | 0.99 | 0.99 | 0.99 | 1.73 |
| GenericBooster(LinearRegression) | 0.99 | 0.99 | 0.99 | 0.99 | 0.37 |
| GenericBooster(TransformedTargetRegressor) | 0.99 | 0.99 | 0.99 | 0.99 | 0.40 |
| GenericBooster(RidgeCV) | 0.99 | 0.99 | 0.99 | 0.99 | 1.28 |
| GenericBooster(Ridge) | 0.99 | 0.99 | 0.99 | 0.99 | 0.27 |
| XGBClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.50 |
| RandomForestClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.37 |
| GenericBooster(ExtraTreeRegressor) | 0.94 | 0.94 | 0.94 | 0.94 | 0.40 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.93 | 0.93 | 0.94 | 4.97 |
| GenericBooster(KNeighborsRegressor) | 0.87 | 0.89 | 0.89 | 0.87 | 0.70 |
| GenericBooster(DecisionTreeRegressor) | 0.87 | 0.88 | 0.88 | 0.87 | 2.24 |
| GenericBooster(MultiTaskElasticNet) | 0.87 | 0.79 | 0.79 | 0.86 | 0.11 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.86 | 0.79 | 0.79 | 0.85 | 1.28 |
| GenericBooster(MultiTaskLasso) | 0.85 | 0.76 | 0.76 | 0.84 | 0.06 |
| GenericBooster(ElasticNet) | 0.85 | 0.76 | 0.76 | 0.84 | 0.16 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.82 | 0.72 | 0.72 | 0.80 | 10.42 |
| GenericBooster(Lasso) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(LassoLars) | 0.82 | 0.71 | 0.71 | 0.79 | 0.08 |
| GenericBooster(MultiTask(LinearSVR)) | 0.81 | 0.69 | 0.69 | 0.78 | 14.75 |
| GenericBooster(DummyRegressor) | 0.68 | 0.50 | 0.50 | 0.56 | 0.01 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.50 | 0.46 | 0.46 | 0.51 | 1.67 |

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| GenericBooster(MultiTask(TweedieRegressor)) | 0.99 | 0.99 | 0.99 | 0.99 | 1.67 |
| GenericBooster(LinearRegression) | 0.99 | 0.99 | 0.99 | 0.99 | 0.30 |
| GenericBooster(TransformedTargetRegressor) | 0.99 | 0.99 | 0.99 | 0.99 | 0.74 |
| GenericBooster(RidgeCV) | 0.99 | 0.99 | 0.99 | 0.99 | 2.77 |
| GenericBooster(Ridge) | 0.99 | 0.99 | 0.99 | 0.99 | 0.28 |
| XGBClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.13 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.93 | 0.93 | 0.94 | 7.81 |
| GenericBooster(ExtraTreeRegressor) | 0.94 | 0.94 | 0.94 | 0.94 | 0.23 |
| RandomForestClassifier | 0.92 | 0.93 | 0.93 | 0.92 | 0.25 |
| GenericBooster(KNeighborsRegressor) | 0.87 | 0.89 | 0.89 | 0.87 | 0.42 |
| GenericBooster(DecisionTreeRegressor) | 0.87 | 0.88 | 0.88 | 0.87 | 0.97 |
| GenericBooster(MultiTaskElasticNet) | 0.87 | 0.79 | 0.79 | 0.86 | 0.11 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.86 | 0.79 | 0.79 | 0.85 | 1.20 |
| GenericBooster(MultiTaskLasso) | 0.85 | 0.76 | 0.76 | 0.84 | 0.06 |
| GenericBooster(ElasticNet) | 0.85 | 0.76 | 0.76 | 0.84 | 0.09 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.82 | 0.72 | 0.72 | 0.80 | 10.57 |
| GenericBooster(LassoLars) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(Lasso) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(MultiTask(LinearSVR)) | 0.81 | 0.69 | 0.69 | 0.78 | 14.20 |
| GenericBooster(DummyRegressor) | 0.68 | 0.50 | 0.50 | 0.56 | 0.01 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.50 | 0.46 | 0.46 | 0.51 | 1.33 |
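
Since `models` (from `hist=True`) and `models2` (from `hist=False`) are displayed as tables indexed by model name, they can be compared side by side. The sketch below assumes they behave like pandas DataFrames and, to stay self-contained, hand-copies four accuracy values from the two breast cancer tables above instead of calling mlsauce; the column names `Accuracy_hist`, `Accuracy_no_hist` and `Accuracy_diff` are made up for illustration.

```python
import pandas as pd

model_names = pd.Index(
    [
        "GenericBooster(Ridge)",
        "XGBClassifier",
        "RandomForestClassifier",
        "GenericBooster(ExtraTreeRegressor)",
    ],
    name="Model",
)

# Accuracy values copied from the breast cancer tables (hist=True, then hist=False)
hist_true = pd.DataFrame({"Accuracy": [0.99, 0.96, 0.96, 0.94]}, index=model_names)
hist_false = pd.DataFrame({"Accuracy": [0.99, 0.96, 0.92, 0.94]}, index=model_names)

# Align both tables on the model name and compute the accuracy gap
comparison = hist_true.join(hist_false, lsuffix="_hist", rsuffix="_no_hist")
comparison["Accuracy_diff"] = (
    comparison["Accuracy_hist"] - comparison["Accuracy_no_hist"]
).round(2)

print(comparison.sort_values("Accuracy_diff", ascending=False))
```

On these four rows, only `RandomForestClassifier` differs between the two runs, which is in line with the absence of large differences noted earlier.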

Iris data set (first table: hist=True, second table: hist=False):
2it [00:00, 6.46it/s]
100%|██████████| 38/38 [00:12<00:00, 3.11it/s]
2it [00:00, 10.38it/s]
100%|██████████| 38/38 [00:11<00:00, 3.18it/s]
Elapsed: 24.71835470199585 seconds

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.18 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.14 |
| GenericBooster(LinearRegression) | 0.97 | 0.97 | None | 0.97 | 0.13 |
| GenericBooster(DecisionTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.18 |
| GenericBooster(TransformedTargetRegressor) | 0.97 | 0.97 | None | 0.97 | 0.23 |
| GenericBooster(ExtraTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.14 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.05 |
| RandomForestClassifier | 0.93 | 0.95 | None | 0.93 | 0.26 |
| GenericBooster(KNeighborsRegressor) | 0.93 | 0.95 | None | 0.93 | 0.27 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.90 | 0.92 | None | 0.90 | 0.75 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.90 | 0.92 | None | 0.90 | 1.61 |
| GenericBooster(MultiTask(LinearSVR)) | 0.80 | 0.85 | None | 0.80 | 2.15 |
| GenericBooster(MultiTaskElasticNet) | 0.80 | 0.85 | None | 0.80 | 0.07 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.63 | 0.72 | None | 0.57 | 2.42 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.57 | 0.67 | None | 0.45 | 1.05 |
| GenericBooster(Lars) | 0.50 | 0.46 | None | 0.48 | 0.59 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.43 | 0.33 | None | 0.26 | 2.19 |
| GenericBooster(LassoLars) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(MultiTaskLasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(Lasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(ElasticNet) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(DummyRegressor) | 0.27 | 0.33 | None | 0.11 | 0.01 |

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.15 |
| GenericBooster(LinearRegression) | 0.97 | 0.97 | None | 0.97 | 0.13 |
| GenericBooster(DecisionTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.16 |
| GenericBooster(TransformedTargetRegressor) | 0.97 | 0.97 | None | 0.97 | 0.24 |
| GenericBooster(ExtraTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.14 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.04 |
| GenericBooster(KNeighborsRegressor) | 0.93 | 0.95 | None | 0.93 | 0.28 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.90 | 0.92 | None | 0.90 | 0.78 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.90 | 0.92 | None | 0.90 | 1.35 |
| GenericBooster(MultiTask(LinearSVR)) | 0.80 | 0.85 | None | 0.80 | 2.15 |
| GenericBooster(MultiTaskElasticNet) | 0.80 | 0.85 | None | 0.80 | 0.07 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.63 | 0.72 | None | 0.57 | 1.81 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.57 | 0.67 | None | 0.45 | 1.21 |
| GenericBooster(Lars) | 0.50 | 0.46 | None | 0.48 | 0.58 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.43 | 0.33 | None | 0.26 | 2.63 |
| GenericBooster(LassoLars) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(MultiTaskLasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(Lasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(ElasticNet) | 0.27 | 0.33 | None | 0.11 | 0.02 |
| GenericBooster(DummyRegressor) | 0.27 | 0.33 | None | 0.11 | 0.01 |

Wine data set (first table: hist=True, second table: hist=False):
2it [00:00, 5.45it/s]
100%|██████████| 38/38 [00:14<00:00, 2.63it/s]
2it [00:00, 9.26it/s]
100%|██████████| 38/38 [00:14<00:00, 2.58it/s]
Elapsed: 29.76035761833191 seconds

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 1.00 | 1.00 | None | 1.00 | 0.30 |
| GenericBooster(ExtraTreeRegressor) | 1.00 | 1.00 | None | 1.00 | 0.17 |
| GenericBooster(TransformedTargetRegressor) | 1.00 | 1.00 | None | 1.00 | 0.26 |
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.23 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| GenericBooster(LinearRegression) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| XGBClassifier | 0.97 | 0.96 | None | 0.97 | 0.06 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.10 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.18 |
| GenericBooster(MultiTask(LinearSVR)) | 0.97 | 0.98 | None | 0.97 | 3.71 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.97 | 0.98 | None | 0.97 | 1.86 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.39 |
| GenericBooster(Lars) | 0.94 | 0.94 | None | 0.95 | 0.93 |
| GenericBooster(KNeighborsRegressor) | 0.92 | 0.93 | None | 0.92 | 0.19 |
| GenericBooster(DecisionTreeRegressor) | 0.92 | 0.92 | None | 0.92 | 0.22 |
| GenericBooster(MultiTaskElasticNet) | 0.69 | 0.61 | None | 0.61 | 0.03 |
| GenericBooster(ElasticNet) | 0.61 | 0.53 | None | 0.53 | 0.05 |
| GenericBooster(MultiTaskLasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(LassoLars) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(Lasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(DummyRegressor) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.25 | 0.33 | None | 0.10 | 2.73 |

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 1.00 | 1.00 | None | 1.00 | 0.15 |
| GenericBooster(ExtraTreeRegressor) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(TransformedTargetRegressor) | 1.00 | 1.00 | None | 1.00 | 0.24 |
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.22 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(LinearRegression) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| XGBClassifier | 0.97 | 0.96 | None | 0.97 | 0.06 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.97 | 0.98 | None | 0.97 | 0.84 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.18 |
| GenericBooster(MultiTask(LinearSVR)) | 0.97 | 0.98 | None | 0.97 | 3.41 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.97 | 0.98 | None | 0.97 | 2.15 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.91 |
| GenericBooster(Lars) | 0.94 | 0.94 | None | 0.95 | 0.93 |
| GenericBooster(KNeighborsRegressor) | 0.92 | 0.93 | None | 0.92 | 0.20 |
| GenericBooster(DecisionTreeRegressor) | 0.92 | 0.92 | None | 0.92 | 0.23 |
| GenericBooster(MultiTaskElasticNet) | 0.69 | 0.61 | None | 0.61 | 0.03 |
| GenericBooster(ElasticNet) | 0.61 | 0.53 | None | 0.53 | 0.04 |
| GenericBooster(MultiTaskLasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(LassoLars) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(Lasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(DummyRegressor) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.25 | 0.33 | None | 0.10 | 2.78 |

Digits data set (first table: hist=True, second table: hist=False):
2it [00:01, 1.90it/s]
100%|██████████| 38/38 [09:30<00:00, 15.02s/it]
2it [00:01, 1.03it/s]
100%|██████████| 38/38 [09:27<00:00, 14.94s/it]
Elapsed: 1141.7054164409637 seconds

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.56 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.50 |
| GenericBooster(ExtraTreeRegressor) | 0.96 | 0.96 | None | 0.96 | 1.75 |
| GenericBooster(KNeighborsRegressor) | 0.95 | 0.95 | None | 0.95 | 4.34 |
| GenericBooster(LinearRegression) | 0.94 | 0.94 | None | 0.94 | 4.47 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.94 | None | 0.94 | 51.97 |
| GenericBooster(TransformedTargetRegressor) | 0.94 | 0.94 | None | 0.94 | 2.54 |
| GenericBooster(RidgeCV) | 0.94 | 0.94 | None | 0.94 | 4.55 |
| GenericBooster(Ridge) | 0.94 | 0.94 | None | 0.94 | 0.63 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.93 | 0.93 | None | 0.93 | 13.86 |
| GenericBooster(DecisionTreeRegressor) | 0.88 | 0.88 | None | 0.88 | 6.14 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.79 | 0.79 | None | 0.80 | 13.46 |
| GenericBooster(MultiTask(LinearSVR)) | 0.37 | 0.39 | None | 0.26 | 297.07 |
| GenericBooster(Lars) | 0.20 | 0.20 | None | 0.21 | 19.23 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.12 | 0.10 | None | 0.03 | 140.91 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.10 | 0.10 | None | 0.06 | 9.46 |
| GenericBooster(LassoLars) | 0.07 | 0.10 | None | 0.01 | 0.05 |
| GenericBooster(Lasso) | 0.07 | 0.10 | None | 0.01 | 0.07 |
| GenericBooster(MultiTaskLasso) | 0.07 | 0.10 | None | 0.01 | 0.04 |
| GenericBooster(ElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(DummyRegressor) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(MultiTaskElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.05 |

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.67 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 1.27 |
| GenericBooster(ExtraTreeRegressor) | 0.96 | 0.96 | None | 0.96 | 1.69 |
| GenericBooster(KNeighborsRegressor) | 0.95 | 0.95 | None | 0.95 | 4.76 |
| GenericBooster(LinearRegression) | 0.94 | 0.94 | None | 0.94 | 2.01 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.94 | None | 0.94 | 46.87 |
| GenericBooster(TransformedTargetRegressor) | 0.94 | 0.94 | None | 0.94 | 5.40 |
| GenericBooster(RidgeCV) | 0.94 | 0.94 | None | 0.94 | 3.93 |
| GenericBooster(Ridge) | 0.94 | 0.94 | None | 0.94 | 0.60 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.93 | 0.93 | None | 0.93 | 14.96 |
| GenericBooster(DecisionTreeRegressor) | 0.88 | 0.88 | None | 0.88 | 4.12 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.79 | 0.79 | None | 0.80 | 12.68 |
| GenericBooster(MultiTask(LinearSVR)) | 0.37 | 0.39 | None | 0.26 | 294.88 |
| GenericBooster(Lars) | 0.20 | 0.20 | None | 0.21 | 19.40 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.12 | 0.10 | None | 0.03 | 145.91 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.10 | 0.10 | None | 0.06 | 10.30 |
| GenericBooster(LassoLars) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(Lasso) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(MultiTaskLasso) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(ElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(DummyRegressor) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(MultiTaskElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |
