mlsauce (home to a model-agnostic gradient boosting algorithm) can now be installed from PyPI.
The Python package mlsauce (docs are here), which is home to a model-agnostic gradient boosting algorithm (described in https://www.researchgate.net/publication/386212136_Scalable_Gradient_Boosting_using_Randomized_Neural_Networks and https://www.researchgate.net/publication/346059361_LSBoost_gradient_boosted_penalized_nonlinear_least_squares), can now be installed from PyPI. This makes it easier to install and use the package without cloning the repository or managing dependencies manually.
The algorithm is designed to be model-agnostic, meaning it can work with various base learners, including decision trees, linear models, and neural networks. This flexibility allows users to choose the best base learner for their specific problem while still benefiting from the gradient boosting framework.
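For instance, the same boosting wrapper can be fitted with a tree-based or a linear base learner. The snippet below is a minimal sketch based on the GenericBoostingClassifier interface used later in this post; the iris data and the parameter values are purely illustrative.

import mlsauce as ms
from sklearn.datasets import load_iris
from sklearn.linear_model import Ridge
from sklearn.metrics import balanced_accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import ExtraTreeRegressor

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Same boosting wrapper, two different base learners (illustrative settings)
for base_learner in (ExtraTreeRegressor(), Ridge()):
    clf = ms.GenericBoostingClassifier(base_learner, n_estimators=10, learning_rate=0.1, verbose=0)
    clf.fit(X_train, y_train)
    preds = clf.predict(X_test)
    print(type(base_learner).__name__, balanced_accuracy_score(y_test, preds))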
To install mlsauce, you can use the following command:
!pip install mlsauce --verbose
This will download and install the latest version of mlsauce along with its dependencies. Once installed, you can use it in your Python projects to implement model-agnostic gradient boosting. The example below benchmarks the generic boosting classifier with a range of scikit-learn regressors as base learners, on four classification datasets:
import mlsauce as ms
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from time import time
from tqdm import tqdm
from sklearn.utils import all_estimators
from sklearn.metrics import balanced_accuracy_score, make_scorer
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_digits, load_breast_cancer, load_wine, load_iris
from sklearn.tree import ExtraTreeRegressor
from sklearn.linear_model import Ridge

# Collect all scikit-learn regressors to use as base learners
estimators = all_estimators(type_filter='regressor')

results = []
datasets = [load_digits(return_X_y=True), load_breast_cancer(return_X_y=True),
            load_wine(return_X_y=True), load_iris(return_X_y=True)]
names = ["digits", "breast_cancer", "wine", "iris"]
balanced_accuracy_scorer = make_scorer(balanced_accuracy_score)

for i, dataset in tqdm(enumerate(datasets)):
    X, y = dataset
    # Split dataset into training and testing sets
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2,
                                                        random_state=42, stratify=y)
    dataset_name = names[i]
    print(f"\n Dataset: {dataset_name} --------------------")
    for name, estimator in tqdm(estimators):
        try:
            # Boost the current scikit-learn regressor as a base learner
            model = ms.GenericBoostingClassifier(estimator(), n_estimators=10,
                                                 learning_rate=0.1, verbose=0)
            start = time()
            model.fit(X_train, y_train)
            y_pred_train = model.predict(X_train)
            y_pred = model.predict(X_test)
            elapsed = time() - start
            # Calculate balanced accuracy on training and testing sets
            balanced_acc_train = balanced_accuracy_score(y_train, y_pred_train)
            balanced_acc_test = balanced_accuracy_score(y_test, y_pred)
            res = [name, dataset_name, balanced_acc_train, balanced_acc_test, elapsed]
            print(f"\n Result for {name} base learner: {res}\n")
            results.append(res)
        except Exception as e:
            continue

results_df = pd.DataFrame(results, columns=["Base Learner", "Dataset", "Train Acc", "Test Acc", "Time"])
# Log-scaled relative gap between training and testing accuracy (overfitting indicator)
results_df['Pct Diff'] = np.log(1 + np.abs((results_df['Train Acc'] - results_df['Test Acc']) / results_df['Test Acc']))
results_df = results_df.sort_values(by=['Pct Diff'], ascending=True)

# Average training and testing accuracy for each base learner across datasets
average_accuracies = results_df.groupby('Base Learner', as_index=False)[['Train Acc', 'Test Acc']].mean()

# Order by increasing average testing accuracy
average_accuracies_ordered = average_accuracies.sort_values(by='Test Acc')

# Create the scatter plot for average training and testing accuracy, ordered by Test Acc
plt.figure(figsize=(12, 7))
# Plot average training accuracy
plt.scatter(average_accuracies_ordered['Base Learner'], average_accuracies_ordered['Train Acc'],
            label='Average Train Acc', marker='o', s=50)
# Plot average testing accuracy
plt.scatter(average_accuracies_ordered['Base Learner'], average_accuracies_ordered['Test Acc'],
            label='Average Test Acc', marker='o', s=50)
plt.title('Average Training and Testing Accuracy for Each Model (Ordered by Test Accuracy)')
plt.xlabel('Model (Ordered by Increasing Average Test Accuracy)')
plt.ylabel('Average Accuracy')
plt.xticks(rotation=90)  # Rotate x-axis labels for better readability
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
The average training and testing accuracy of each base learner can then be compared in a scatter plot, ordered by increasing average testing accuracy. This makes it easy to see how each model performs on both the training and testing sets and to spot potential overfitting.
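Beyond the plot, the same results table can be queried directly. The snippet below is a small sketch that assumes the results_df built above; it ranks base learners by their average 'Pct Diff' (the log-scaled relative train/test gap), where larger values suggest potential overfitting.

# Base learners with the largest average train/test gap (potential overfitting)
suspects = (results_df.groupby('Base Learner', as_index=False)['Pct Diff']
            .mean()
            .sort_values(by='Pct Diff', ascending=False))
print(suspects.head(10))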