
AutoML in nnetsauce (randomized and quasi-randomized nnetworks)


Contents:

  1. Installing nnetsauce for Python
  2. Classification
  3. Regression

Disclaimer: I have no affiliation with the lazypredict project.

A few days ago, I stumbled across a cool Python package called lazypredict. It is pretty well designed, it works, and it relies on scikit-learn's design.

With lazypredict, you can rapidly get an idea of which scikit-learn model performs best on a given data set (it also works with xgboost's and lightgbm's scikit-learn-like interfaces), with a little preprocessing and without hyperparameter tuning (this is important to note).
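For context, here is a minimal sketch of how lazypredict itself is typically used (assuming a recent version of the package; exact keyword arguments may differ between releases):

from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# load a toy dataset and split it
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123)

# fit a collection of scikit-learn classifiers, each with default hyperparameters
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
print(models)  # leaderboard, one row per model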

I thought something similar could, for now, be beneficial to nnetsauce's classes CustomClassifier, CustomRegressor (see the detailed examples below), and MTS.
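As a reminder, CustomClassifier (like CustomRegressor) augments any scikit-learn-compatible learner with a layer of randomized or quasi-randomized hidden features. Here is a minimal sketch of direct usage, based on the parameters that appear in the fitted pipeline shown further below (obj, n_hidden_features, col_sample):

import nnetsauce as ns
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123)

# wrap a base learner in a layer of 10 hidden features,
# sampling 90% of the columns
clf = ns.CustomClassifier(obj=LogisticRegression(),
                          n_hidden_features=10,
                          col_sample=0.9)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))  # accuracy on the test set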

So far, in nnetsauce (Python version), I have adapted the lazy prediction feature to regression (CustomRegressor) and classification (CustomClassifier), but not yet to univariate or multivariate time series forecasting (MTS). You can try it from a GitHub branch, as shown below.
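Only classification is detailed in this post, but the regression counterpart should look very similar. A hedged sketch, assuming ns.LazyRegressor mirrors LazyClassifier's interface on that branch:

import nnetsauce as ns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123)

# assumption: ns.LazyRegressor accepts the same keyword
# arguments as ns.LazyClassifier
reg = ns.LazyRegressor(verbose=0, ignore_warnings=True,
                       custom_metric=None,
                       n_hidden_features=10,
                       col_sample=0.9)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
print(models)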

1 – Installation

!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict

2 – Classification

2 – 1 Loading the Dataset

import nnetsauce as ns
from sklearn.datasets import load_breast_cancer

# load the breast cancer dataset as numpy arrays
data = load_breast_cancer()
X = data.data
y = data.target

2 – 2 Building the classification model using LazyPredict

from sklearn.model_selection import train_test_split

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123)

# build the lazy classifier; every candidate model is wrapped in a
# CustomClassifier with 10 hidden features and 90% column subsampling
clf = ns.LazyClassifier(verbose=0, ignore_warnings=True,
                        custom_metric=None,
                        n_hidden_features=10,
                        col_sample=0.9)

# fit all the candidate models
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

100%|██████████| 27/27 [00:09<00:00,  2.71it/s]

# print the models, ranked by performance
display(models)
                               Accuracy  Balanced Accuracy  ROC AUC  F1 Score  Time Taken
Model
LogisticRegression                 0.99               0.99     0.99      0.99        0.69
LinearSVC                          0.98               0.98     0.98      0.98        0.33
SGDClassifier                      0.98               0.98     0.98      0.98        0.19
Perceptron                         0.98               0.98     0.98      0.98        0.15
LabelPropagation                   0.98               0.98     0.98      0.98        0.33
LabelSpreading                     0.98               0.98     0.98      0.98        0.43
SVC                                0.98               0.98     0.98      0.98        0.16
RandomForestClassifier             0.98               0.98     0.98      0.98        0.66
ExtraTreesClassifier               0.98               0.98     0.98      0.98        0.40
KNeighborsClassifier               0.98               0.98     0.98      0.98        0.34
DecisionTreeClassifier             0.97               0.97     0.97      0.97        0.53
PassiveAggressiveClassifier        0.97               0.97     0.97      0.97        0.21
LinearDiscriminantAnalysis         0.97               0.96     0.96      0.97        0.19
CalibratedClassifierCV             0.97               0.96     0.96      0.97        0.24
AdaBoostClassifier                 0.96               0.96     0.96      0.96        1.31
BaggingClassifier                  0.95               0.95     0.95      0.95        0.63
RidgeClassifier                    0.96               0.94     0.94      0.96        0.27
RidgeClassifierCV                  0.96               0.94     0.94      0.96        0.18
QuadraticDiscriminantAnalysis      0.95               0.94     0.94      0.95        0.81
ExtraTreeClassifier                0.94               0.93     0.93      0.94        0.12
NuSVC                              0.94               0.91     0.91      0.94        0.29
GaussianNB                         0.93               0.91     0.91      0.93        0.17
BernoulliNB                        0.92               0.90     0.90      0.92        0.31
NearestCentroid                    0.92               0.89     0.89      0.92        0.24
DummyClassifier                    0.64               0.50     0.50      0.50        0.27

# retrieve the fitted pipelines, one per model, and inspect the best one
model_dictionary = clf.provide_models(X_train, X_test, y_train, y_test)
model_dictionary['LogisticRegression']
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numeric',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
           dtype='int64')),
                                                 ('categorical_low',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='c...
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  Int64Index([], dtype='int64')),
                                                 ('categorical_high',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OrdinalEncoder())]),
                                                  Int64Index([], dtype='int64'))])),
                ('classifier',
                 CustomClassifier(col_sample=0.9, n_hidden_features=10,
                                  obj=LogisticRegression(random_state=42)))])
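
Since each entry of model_dictionary is a fitted scikit-learn Pipeline, it can be reused directly on new data. A minimal sketch, assuming the standard scikit-learn interface:

# reuse the fitted pipeline on held-out data
best_model = model_dictionary['LogisticRegression']
preds = best_model.predict(X_test)         # class predictions
probas = best_model.predict_proba(X_test)  # class probabilities
print(preds[:5])
print(probas[:5])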
