A few weeks ago, I introduced a model-agnostic gradient boosting procedure that can use any base learner (available in the R and Python package mlsauce).
The rationale is different from other histogram-based gradient boosting algorithms, as histograms are only used here for feature engineering of continuous features. So far, I don’t see huge differences with the original implementation of the GenericBooster, but it’s still a work in progress. I plan to try it out on a data set that contains a larger mix of continuous and categorical features (as categorical features are not histogram-engineered).
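To make "histograms as feature engineering" more concrete, here is a minimal sketch of one way it could work: each continuous column is replaced by the index of the histogram bin it falls into, with bin edges learned on the training set only. This is not the mlsauce implementation; the function name `histogram_encode`, the `n_bins` parameter and the choice of equal-width bins are all assumptions made for illustration.

```python
import numpy as np


def histogram_encode(X_train, X_test, n_bins=10):
    """Replace each continuous column by the index of its histogram bin.

    Hypothetical helper (not part of mlsauce): equal-width bin edges are
    computed on the training set and reused unchanged on the test set.
    """
    X_train = np.asarray(X_train, dtype=float)
    X_test = np.asarray(X_test, dtype=float)
    train_binned = np.empty(X_train.shape, dtype=int)
    test_binned = np.empty(X_test.shape, dtype=int)
    for j in range(X_train.shape[1]):
        # Equal-width bin edges estimated from the training column only
        edges = np.histogram_bin_edges(X_train[:, j], bins=n_bins)
        # Keep interior edges so that bin codes fall in 0..n_bins-1;
        # test values outside the training range go to the first or last bin
        interior = edges[1:-1]
        train_binned[:, j] = np.digitize(X_train[:, j], interior)
        test_binned[:, j] = np.digitize(X_test[:, j], interior)
    return train_binned, test_binned


# Toy usage on synthetic continuous features
rng = np.random.default_rng(13)
X_tr = rng.normal(size=(100, 3))
X_te = rng.normal(size=(20, 3))
X_tr_hist, X_te_hist = histogram_encode(X_tr, X_te, n_bins=5)
```

Categorical columns would simply be left out of such a transformation, which is consistent with the remark above that categorical features are not histogram-engineered.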
Here are a few results that can give you an idea of the performance of the algorithm:
!pip install git+https://github.com/Techtonique/mlsauce.git --verbose --upgrade --no-cache-dir

import mlsauce as ms
from sklearn.datasets import load_breast_cancer, load_iris, load_wine, load_digits
from sklearn.model_selection import train_test_split
from time import time

# Data set loaders, in the order in which the results are displayed below
load_datasets = [load_breast_cancer, load_iris, load_wine, load_digits]

for load_dataset in load_datasets:
    data = load_dataset()
    X = data.data
    y = data.target
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.2, random_state=13)
    clf = ms.LazyBoostingClassifier(verbose=0, ignore_warnings=True,  # n_jobs=2,
                                    custom_metric=None, preprocess=False)
    start = time()
    # First with histogram-engineered features, then without
    models, predictions = clf.fit(X_train, X_test, y_train, y_test, hist=True)
    models2, predictions2 = clf.fit(X_train, X_test, y_train, y_test, hist=False)
    # Elapsed time covers both fits
    print(f"\nElapsed: {time() - start} seconds\n")
    display(models)   # hist=True
    display(models2)  # hist=False
2it [00:00, 2.27it/s]
100%|██████████| 38/38 [00:41<00:00, 1.09s/it]
2it [00:00, 5.14it/s]
100%|██████████| 38/38 [00:43<00:00, 1.14s/it]
Elapsed: 85.95083284378052 seconds
Breast cancer, hist=True:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| GenericBooster(MultiTask(TweedieRegressor)) | 0.99 | 0.99 | 0.99 | 0.99 | 1.73 |
| GenericBooster(LinearRegression) | 0.99 | 0.99 | 0.99 | 0.99 | 0.37 |
| GenericBooster(TransformedTargetRegressor) | 0.99 | 0.99 | 0.99 | 0.99 | 0.40 |
| GenericBooster(RidgeCV) | 0.99 | 0.99 | 0.99 | 0.99 | 1.28 |
| GenericBooster(Ridge) | 0.99 | 0.99 | 0.99 | 0.99 | 0.27 |
| XGBClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.50 |
| RandomForestClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.37 |
| GenericBooster(ExtraTreeRegressor) | 0.94 | 0.94 | 0.94 | 0.94 | 0.40 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.93 | 0.93 | 0.94 | 4.97 |
| GenericBooster(KNeighborsRegressor) | 0.87 | 0.89 | 0.89 | 0.87 | 0.70 |
| GenericBooster(DecisionTreeRegressor) | 0.87 | 0.88 | 0.88 | 0.87 | 2.24 |
| GenericBooster(MultiTaskElasticNet) | 0.87 | 0.79 | 0.79 | 0.86 | 0.11 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.86 | 0.79 | 0.79 | 0.85 | 1.28 |
| GenericBooster(MultiTaskLasso) | 0.85 | 0.76 | 0.76 | 0.84 | 0.06 |
| GenericBooster(ElasticNet) | 0.85 | 0.76 | 0.76 | 0.84 | 0.16 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.82 | 0.72 | 0.72 | 0.80 | 10.42 |
| GenericBooster(Lasso) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(LassoLars) | 0.82 | 0.71 | 0.71 | 0.79 | 0.08 |
| GenericBooster(MultiTask(LinearSVR)) | 0.81 | 0.69 | 0.69 | 0.78 | 14.75 |
| GenericBooster(DummyRegressor) | 0.68 | 0.50 | 0.50 | 0.56 | 0.01 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.50 | 0.46 | 0.46 | 0.51 | 1.67 |

Breast cancer, hist=False:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| GenericBooster(MultiTask(TweedieRegressor)) | 0.99 | 0.99 | 0.99 | 0.99 | 1.67 |
| GenericBooster(LinearRegression) | 0.99 | 0.99 | 0.99 | 0.99 | 0.30 |
| GenericBooster(TransformedTargetRegressor) | 0.99 | 0.99 | 0.99 | 0.99 | 0.74 |
| GenericBooster(RidgeCV) | 0.99 | 0.99 | 0.99 | 0.99 | 2.77 |
| GenericBooster(Ridge) | 0.99 | 0.99 | 0.99 | 0.99 | 0.28 |
| XGBClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 0.13 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.93 | 0.93 | 0.94 | 7.81 |
| GenericBooster(ExtraTreeRegressor) | 0.94 | 0.94 | 0.94 | 0.94 | 0.23 |
| RandomForestClassifier | 0.92 | 0.93 | 0.93 | 0.92 | 0.25 |
| GenericBooster(KNeighborsRegressor) | 0.87 | 0.89 | 0.89 | 0.87 | 0.42 |
| GenericBooster(DecisionTreeRegressor) | 0.87 | 0.88 | 0.88 | 0.87 | 0.97 |
| GenericBooster(MultiTaskElasticNet) | 0.87 | 0.79 | 0.79 | 0.86 | 0.11 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.86 | 0.79 | 0.79 | 0.85 | 1.20 |
| GenericBooster(MultiTaskLasso) | 0.85 | 0.76 | 0.76 | 0.84 | 0.06 |
| GenericBooster(ElasticNet) | 0.85 | 0.76 | 0.76 | 0.84 | 0.09 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.82 | 0.72 | 0.72 | 0.80 | 10.57 |
| GenericBooster(LassoLars) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(Lasso) | 0.82 | 0.71 | 0.71 | 0.79 | 0.09 |
| GenericBooster(MultiTask(LinearSVR)) | 0.81 | 0.69 | 0.69 | 0.78 | 14.20 |
| GenericBooster(DummyRegressor) | 0.68 | 0.50 | 0.50 | 0.56 | 0.01 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.50 | 0.46 | 0.46 | 0.51 | 1.33 |

2it [00:00, 6.46it/s]
100%|██████████| 38/38 [00:12<00:00, 3.11it/s]
2it [00:00, 10.38it/s]
100%|██████████| 38/38 [00:11<00:00, 3.18it/s]
Elapsed: 24.71835470199585 seconds
Iris, hist=True:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.18 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.14 |
| GenericBooster(LinearRegression) | 0.97 | 0.97 | None | 0.97 | 0.13 |
| GenericBooster(DecisionTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.18 |
| GenericBooster(TransformedTargetRegressor) | 0.97 | 0.97 | None | 0.97 | 0.23 |
| GenericBooster(ExtraTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.14 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.05 |
| RandomForestClassifier | 0.93 | 0.95 | None | 0.93 | 0.26 |
| GenericBooster(KNeighborsRegressor) | 0.93 | 0.95 | None | 0.93 | 0.27 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.90 | 0.92 | None | 0.90 | 0.75 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.90 | 0.92 | None | 0.90 | 1.61 |
| GenericBooster(MultiTask(LinearSVR)) | 0.80 | 0.85 | None | 0.80 | 2.15 |
| GenericBooster(MultiTaskElasticNet) | 0.80 | 0.85 | None | 0.80 | 0.07 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.63 | 0.72 | None | 0.57 | 2.42 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.57 | 0.67 | None | 0.45 | 1.05 |
| GenericBooster(Lars) | 0.50 | 0.46 | None | 0.48 | 0.59 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.43 | 0.33 | None | 0.26 | 2.19 |
| GenericBooster(LassoLars) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(MultiTaskLasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(Lasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(ElasticNet) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(DummyRegressor) | 0.27 | 0.33 | None | 0.11 | 0.01 |

Iris, hist=False:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.15 |
| GenericBooster(LinearRegression) | 0.97 | 0.97 | None | 0.97 | 0.13 |
| GenericBooster(DecisionTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.16 |
| GenericBooster(TransformedTargetRegressor) | 0.97 | 0.97 | None | 0.97 | 0.24 |
| GenericBooster(ExtraTreeRegressor) | 0.97 | 0.97 | None | 0.97 | 0.14 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.04 |
| GenericBooster(KNeighborsRegressor) | 0.93 | 0.95 | None | 0.93 | 0.28 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.90 | 0.92 | None | 0.90 | 0.78 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.90 | 0.92 | None | 0.90 | 1.35 |
| GenericBooster(MultiTask(LinearSVR)) | 0.80 | 0.85 | None | 0.80 | 2.15 |
| GenericBooster(MultiTaskElasticNet) | 0.80 | 0.85 | None | 0.80 | 0.07 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.63 | 0.72 | None | 0.57 | 1.81 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.57 | 0.67 | None | 0.45 | 1.21 |
| GenericBooster(Lars) | 0.50 | 0.46 | None | 0.48 | 0.58 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.43 | 0.33 | None | 0.26 | 2.63 |
| GenericBooster(LassoLars) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(MultiTaskLasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(Lasso) | 0.27 | 0.33 | None | 0.11 | 0.01 |
| GenericBooster(ElasticNet) | 0.27 | 0.33 | None | 0.11 | 0.02 |
| GenericBooster(DummyRegressor) | 0.27 | 0.33 | None | 0.11 | 0.01 |

2it [00:00, 5.45it/s]
100%|██████████| 38/38 [00:14<00:00, 2.63it/s]
2it [00:00, 9.26it/s]
100%|██████████| 38/38 [00:14<00:00, 2.58it/s]
Elapsed: 29.76035761833191 seconds
Wine, hist=True:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| RandomForestClassifier | 1.00 | 1.00 | None | 1.00 | 0.30 |
| GenericBooster(ExtraTreeRegressor) | 1.00 | 1.00 | None | 1.00 | 0.17 |
| GenericBooster(TransformedTargetRegressor) | 1.00 | 1.00 | None | 1.00 | 0.26 |
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.23 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| GenericBooster(LinearRegression) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| XGBClassifier | 0.97 | 0.96 | None | 0.97 | 0.06 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.10 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.18 |
| GenericBooster(MultiTask(LinearSVR)) | 0.97 | 0.98 | None | 0.97 | 3.71 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.97 | 0.98 | None | 0.97 | 1.86 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.39 |
| GenericBooster(Lars) | 0.94 | 0.94 | None | 0.95 | 0.93 |
| GenericBooster(KNeighborsRegressor) | 0.92 | 0.93 | None | 0.92 | 0.19 |
| GenericBooster(DecisionTreeRegressor) | 0.92 | 0.92 | None | 0.92 | 0.22 |
| GenericBooster(MultiTaskElasticNet) | 0.69 | 0.61 | None | 0.61 | 0.03 |
| GenericBooster(ElasticNet) | 0.61 | 0.53 | None | 0.53 | 0.05 |
| GenericBooster(MultiTaskLasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(LassoLars) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(Lasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(DummyRegressor) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.25 | 0.33 | None | 0.10 | 2.73 |

Wine, hist=False:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| RandomForestClassifier | 1.00 | 1.00 | None | 1.00 | 0.15 |
| GenericBooster(ExtraTreeRegressor) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(TransformedTargetRegressor) | 1.00 | 1.00 | None | 1.00 | 0.24 |
| GenericBooster(RidgeCV) | 1.00 | 1.00 | None | 1.00 | 0.22 |
| GenericBooster(Ridge) | 1.00 | 1.00 | None | 1.00 | 0.16 |
| GenericBooster(LinearRegression) | 1.00 | 1.00 | None | 1.00 | 0.15 |
| XGBClassifier | 0.97 | 0.96 | None | 0.97 | 0.06 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.97 | 0.98 | None | 0.97 | 0.84 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.18 |
| GenericBooster(MultiTask(LinearSVR)) | 0.97 | 0.98 | None | 0.97 | 3.41 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.97 | 0.98 | None | 0.97 | 2.15 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.97 | 0.98 | None | 0.97 | 1.91 |
| GenericBooster(Lars) | 0.94 | 0.94 | None | 0.95 | 0.93 |
| GenericBooster(KNeighborsRegressor) | 0.92 | 0.93 | None | 0.92 | 0.20 |
| GenericBooster(DecisionTreeRegressor) | 0.92 | 0.92 | None | 0.92 | 0.23 |
| GenericBooster(MultiTaskElasticNet) | 0.69 | 0.61 | None | 0.61 | 0.03 |
| GenericBooster(ElasticNet) | 0.61 | 0.53 | None | 0.53 | 0.04 |
| GenericBooster(MultiTaskLasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(LassoLars) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(Lasso) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(DummyRegressor) | 0.42 | 0.33 | None | 0.25 | 0.01 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.25 | 0.33 | None | 0.10 | 2.78 |

2it [00:01, 1.90it/s]
100%|██████████| 38/38 [09:30<00:00, 15.02s/it]
2it [00:01, 1.03it/s]
100%|██████████| 38/38 [09:27<00:00, 14.94s/it]
Elapsed: 1141.7054164409637 seconds
Digits, hist=True:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.56 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 0.50 |
| GenericBooster(ExtraTreeRegressor) | 0.96 | 0.96 | None | 0.96 | 1.75 |
| GenericBooster(KNeighborsRegressor) | 0.95 | 0.95 | None | 0.95 | 4.34 |
| GenericBooster(LinearRegression) | 0.94 | 0.94 | None | 0.94 | 4.47 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.94 | None | 0.94 | 51.97 |
| GenericBooster(TransformedTargetRegressor) | 0.94 | 0.94 | None | 0.94 | 2.54 |
| GenericBooster(RidgeCV) | 0.94 | 0.94 | None | 0.94 | 4.55 |
| GenericBooster(Ridge) | 0.94 | 0.94 | None | 0.94 | 0.63 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.93 | 0.93 | None | 0.93 | 13.86 |
| GenericBooster(DecisionTreeRegressor) | 0.88 | 0.88 | None | 0.88 | 6.14 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.79 | 0.79 | None | 0.80 | 13.46 |
| GenericBooster(MultiTask(LinearSVR)) | 0.37 | 0.39 | None | 0.26 | 297.07 |
| GenericBooster(Lars) | 0.20 | 0.20 | None | 0.21 | 19.23 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.12 | 0.10 | None | 0.03 | 140.91 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.10 | 0.10 | None | 0.06 | 9.46 |
| GenericBooster(LassoLars) | 0.07 | 0.10 | None | 0.01 | 0.05 |
| GenericBooster(Lasso) | 0.07 | 0.10 | None | 0.01 | 0.07 |
| GenericBooster(MultiTaskLasso) | 0.07 | 0.10 | None | 0.01 | 0.04 |
| GenericBooster(ElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(DummyRegressor) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(MultiTaskElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.05 |

Digits, hist=False:

| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken |
|---|---|---|---|---|---|
| RandomForestClassifier | 0.97 | 0.97 | None | 0.97 | 0.67 |
| XGBClassifier | 0.97 | 0.97 | None | 0.97 | 1.27 |
| GenericBooster(ExtraTreeRegressor) | 0.96 | 0.96 | None | 0.96 | 1.69 |
| GenericBooster(KNeighborsRegressor) | 0.95 | 0.95 | None | 0.95 | 4.76 |
| GenericBooster(LinearRegression) | 0.94 | 0.94 | None | 0.94 | 2.01 |
| GenericBooster(MultiTask(BayesianRidge)) | 0.94 | 0.94 | None | 0.94 | 46.87 |
| GenericBooster(TransformedTargetRegressor) | 0.94 | 0.94 | None | 0.94 | 5.40 |
| GenericBooster(RidgeCV) | 0.94 | 0.94 | None | 0.94 | 3.93 |
| GenericBooster(Ridge) | 0.94 | 0.94 | None | 0.94 | 0.60 |
| GenericBooster(MultiTask(TweedieRegressor)) | 0.93 | 0.93 | None | 0.93 | 14.96 |
| GenericBooster(DecisionTreeRegressor) | 0.88 | 0.88 | None | 0.88 | 4.12 |
| GenericBooster(MultiTask(PassiveAggressiveRegressor)) | 0.79 | 0.79 | None | 0.80 | 12.68 |
| GenericBooster(MultiTask(LinearSVR)) | 0.37 | 0.39 | None | 0.26 | 294.88 |
| GenericBooster(Lars) | 0.20 | 0.20 | None | 0.21 | 19.40 |
| GenericBooster(MultiTask(QuantileRegressor)) | 0.12 | 0.10 | None | 0.03 | 145.91 |
| GenericBooster(MultiTask(SGDRegressor)) | 0.10 | 0.10 | None | 0.06 | 10.30 |
| GenericBooster(LassoLars) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(Lasso) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(MultiTaskLasso) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(ElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |
| GenericBooster(DummyRegressor) | 0.07 | 0.10 | None | 0.01 | 0.02 |
| GenericBooster(MultiTaskElasticNet) | 0.07 | 0.10 | None | 0.01 | 0.03 |

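Since the two leaderboards for each data set are nearly identical, it can help to line them up side by side rather than compare them by eye. Below is a small sketch using pandas; the helper name `compare_leaderboards` is hypothetical, and it assumes that both results are DataFrames indexed by model name, as the displayed tables suggest. The toy frames reuse three accuracy values from the breast cancer results above.

```python
import pandas as pd


def compare_leaderboards(with_hist, without_hist, metric="Accuracy"):
    """Join two leaderboards on the model name and compute the metric gap.

    Hypothetical helper: assumes both frames are indexed by model name
    and contain a column called `metric`.
    """
    joined = with_hist[[metric]].join(
        without_hist[[metric]],
        lsuffix="_hist",
        rsuffix="_no_hist",
        how="inner",
    )
    # Positive values mean the histogram-engineered run scored higher
    joined["diff"] = joined[f"{metric}_hist"] - joined[f"{metric}_no_hist"]
    return joined.sort_values("diff", ascending=False)


# Toy frames with accuracies copied from the breast cancer tables
model_names = ["GenericBooster(Ridge)", "XGBClassifier", "RandomForestClassifier"]
hist_board = pd.DataFrame({"Accuracy": [0.99, 0.96, 0.96]}, index=model_names)
no_hist_board = pd.DataFrame({"Accuracy": [0.99, 0.96, 0.92]}, index=model_names)

comparison = compare_leaderboards(hist_board, no_hist_board)
print(comparison)
```

With the real outputs, the same call would take `models` and `models2` from the loop above, provided they have the structure assumed here.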