Contents:
- Installing nnetsauce for Python
- Classification
- Regression
Disclaimer: I have no affiliation with the lazypredict project.
A few days ago, I stumbled across a cool Python package called lazypredict. It is well designed, it works, and it builds on scikit-learn's design.
With lazypredict, you can quickly get an idea of which scikit-learn model performs best on a given data set (it also works with xgboost's and lightgbm's scikit-learn-like interfaces), with a little preprocessing and without hyperparameter tuning (this is important to note).
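For reference, here is what a typical lazypredict session looks like. This is a minimal sketch based on the package's documented interface; everything besides LazyClassifier is standard scikit-learn.

from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target,
                                                    test_size=0.2,
                                                    random_state=123)
# fit all available scikit-learn classifiers with default hyperparameters
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
print(models)  # leaderboard of fitted models, ranked by test-set performance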
I thought something similar could be beneficial to nnetsauce's classes CustomClassifier, CustomRegressor (see detailed examples below), and MTS.
So far, in nnetsauce (Python version), I have adapted the lazy prediction feature to regression (CustomRegressor) and classification (CustomClassifier), but not yet to univariate and multivariate time series forecasting (MTS). You can try it from a GitHub branch.
1 – Installation
!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict
2 – Classification
2 – 1 Loading the Dataset
import nnetsauce as ns
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y = data.target
2 – 2 Building the classification model using LazyPredict
from sklearn.model_selection import train_test_split
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=123)
# build the lazy classifier
clf = ns.LazyClassifier(verbose=0, ignore_warnings=True,
                        custom_metric=None,
                        n_hidden_features=10,  # number of nodes in the hidden layer
                        col_sample=0.9)  # fraction of covariates sampled at random
# fit it
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
100%|██████████| 27/27 [00:09<00:00, 2.71it/s]
# print the best models
display(models)
| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| LogisticRegression | 0.99 | 0.99 | 0.99 | 0.99 | 0.69 |
| LinearSVC | 0.98 | 0.98 | 0.98 | 0.98 | 0.33 |
| SGDClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.19 |
| Perceptron | 0.98 | 0.98 | 0.98 | 0.98 | 0.15 |
| LabelPropagation | 0.98 | 0.98 | 0.98 | 0.98 | 0.33 |
| LabelSpreading | 0.98 | 0.98 | 0.98 | 0.98 | 0.43 |
| SVC | 0.98 | 0.98 | 0.98 | 0.98 | 0.16 |
| RandomForestClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.66 |
| ExtraTreesClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.40 |
| KNeighborsClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.34 |
| DecisionTreeClassifier | 0.97 | 0.97 | 0.97 | 0.97 | 0.53 |
| PassiveAggressiveClassifier | 0.97 | 0.97 | 0.97 | 0.97 | 0.21 |
| LinearDiscriminantAnalysis | 0.97 | 0.96 | 0.96 | 0.97 | 0.19 |
| CalibratedClassifierCV | 0.97 | 0.96 | 0.96 | 0.97 | 0.24 |
| AdaBoostClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 1.31 |
| BaggingClassifier | 0.95 | 0.95 | 0.95 | 0.95 | 0.63 |
| RidgeClassifier | 0.96 | 0.94 | 0.94 | 0.96 | 0.27 |
| RidgeClassifierCV | 0.96 | 0.94 | 0.94 | 0.96 | 0.18 |
| QuadraticDiscriminantAnalysis | 0.95 | 0.94 | 0.94 | 0.95 | 0.81 |
| ExtraTreeClassifier | 0.94 | 0.93 | 0.93 | 0.94 | 0.12 |
| NuSVC | 0.94 | 0.91 | 0.91 | 0.94 | 0.29 |
| GaussianNB | 0.93 | 0.91 | 0.91 | 0.93 | 0.17 |
| BernoulliNB | 0.92 | 0.90 | 0.90 | 0.92 | 0.31 |
| NearestCentroid | 0.92 | 0.89 | 0.89 | 0.92 | 0.24 |
| DummyClassifier | 0.64 | 0.50 | 0.50 | 0.50 | 0.27 |
model_dictionary = clf.provide_models(X_train, X_test, y_train, y_test)
model_dictionary['LogisticRegression']
Pipeline(steps=[('preprocessor',
ColumnTransformer(transformers=[('numeric',
Pipeline(steps=[('imputer',
SimpleImputer()),
('scaler',
StandardScaler())]),
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
dtype='int64')),
('categorical_low',
Pipeline(steps=[('imputer',
SimpleImputer(fill_value='missing',
strategy='c...
OneHotEncoder(handle_unknown='ignore',
sparse=False))]),
Int64Index([], dtype='int64')),
('categorical_high',
Pipeline(steps=[('imputer',
SimpleImputer(fill_value='missing',
strategy='constant')),
('encoding',
OrdinalEncoder())]),
Int64Index([], dtype='int64'))])),
('classifier',
CustomClassifier(col_sample=0.9, n_hidden_features=10,
obj=LogisticRegression(random_state=42)))])
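Each entry of model_dictionary is a scikit-learn Pipeline, so the retrieved model can be reused directly. A short sketch (assuming, as in lazypredict, that provide_models returns pipelines already fitted on the training set):

from sklearn.metrics import accuracy_score

best_model = model_dictionary['LogisticRegression']
preds = best_model.predict(X_test)  # the pipeline handles imputation and scaling internally
print(accuracy_score(y_test, preds))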
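3 – Regression

The same workflow applies to regression with CustomRegressor. Below is a minimal sketch, under the assumption that ns.LazyRegressor in the lazy-predict branch mirrors ns.LazyClassifier's interface (as lazypredict's LazyRegressor mirrors its LazyClassifier); the diabetes dataset is used purely for illustration.

import nnetsauce as ns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target,
                                                    test_size=0.2,
                                                    random_state=123)

# assumption: LazyRegressor accepts the same arguments as LazyClassifier
reg = ns.LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None,
                       n_hidden_features=10, col_sample=0.9)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
print(models)  # leaderboard of regression models, ranked by test-set error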