AutoML in nnetsauce (randomized and quasi-randomized nnetworks)

Posted on October 22, 2023 by T. Moudiki in Data science | 0 Comments

This article was first published on T. Moudiki's Webpage - Python , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Content:

Installing nnetsauce for Python
Classification
Regression

Disclaimer: I have no affiliation with the lazypredict project.

A few days ago, I stumbled accross a cool Python package called lazypredict. Pretty well-designed, working, and relying on scikit-learn’s design.

With lazypredict, you can rapidly have an idea of which scikit-learn model (can also work with xgboost’s and lightgbm’s scikit-learn-like interfaces) performs best on a given data set, with a little preprocessing, and without hyperparameters’ tuning (this is important to note).

I thought something similar could be beneficial to nnetsauce’s classes CustomClassifier, CustomRegressor (see detailed examples below, and interact with the graphs) and MTS. For now.

So far, in nnetsauce (Python version), I adapted the lazy prediction feature to regression (CustomRegressor) and classification (CustomClassifier). Not for univariate and multivariate time series forecasting (MTS) yet. You can try it from a GitHub branch.

2 – Installation

!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict

2 – Classification

2 – 1 Loading the Dataset

import nnetsauce as ns
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y= data.target

2 – 2 Building the classification model using LazyPredict

from sklearn.model_selection import train_test_split

# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y,
                                                    test_size=0.2,
                                                    random_state=123)

# build the lazyclassifier
clf = ns.LazyClassifier(verbose=0, ignore_warnings=True,
                        custom_metric=None,
                        n_hidden_features=10,
                        col_sample=0.9)

# fit it
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

100%|██████████| 27/27 [00:09<00:00,  2.71it/s]

# print the best models
display(models)

	Accuracy	Balanced Accuracy	ROC AUC	F1 Score	Time Taken
Model
LogisticRegression	0.99	0.99	0.99	0.99	0.69
LinearSVC	0.98	0.98	0.98	0.98	0.33
SGDClassifier	0.98	0.98	0.98	0.98	0.19
Perceptron	0.98	0.98	0.98	0.98	0.15
LabelPropagation	0.98	0.98	0.98	0.98	0.33
LabelSpreading	0.98	0.98	0.98	0.98	0.43
SVC	0.98	0.98	0.98	0.98	0.16
RandomForestClassifier	0.98	0.98	0.98	0.98	0.66
ExtraTreesClassifier	0.98	0.98	0.98	0.98	0.40
KNeighborsClassifier	0.98	0.98	0.98	0.98	0.34
DecisionTreeClassifier	0.97	0.97	0.97	0.97	0.53
PassiveAggressiveClassifier	0.97	0.97	0.97	0.97	0.21
LinearDiscriminantAnalysis	0.97	0.96	0.96	0.97	0.19
CalibratedClassifierCV	0.97	0.96	0.96	0.97	0.24
AdaBoostClassifier	0.96	0.96	0.96	0.96	1.31
BaggingClassifier	0.95	0.95	0.95	0.95	0.63
RidgeClassifier	0.96	0.94	0.94	0.96	0.27
RidgeClassifierCV	0.96	0.94	0.94	0.96	0.18
QuadraticDiscriminantAnalysis	0.95	0.94	0.94	0.95	0.81
ExtraTreeClassifier	0.94	0.93	0.93	0.94	0.12
NuSVC	0.94	0.91	0.91	0.94	0.29
GaussianNB	0.93	0.91	0.91	0.93	0.17
BernoulliNB	0.92	0.90	0.90	0.92	0.31
NearestCentroid	0.92	0.89	0.89	0.92	0.24
DummyClassifier	0.64	0.50	0.50	0.50	0.27

model_dictionary = clf.provide_models(X_train, X_test, y_train, y_test)

model_dictionary['LogisticRegression']

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numeric',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
           dtype='int64')),
                                                 ('categorical_low',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='c...
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  Int64Index([], dtype='int64')),
                                                 ('categorical_high',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OrdinalEncoder())]),
                                                  Int64Index([], dtype='int64'))])),
                ('classifier',
                 CustomClassifier(col_sample=0.9, n_hidden_features=10,
                                  obj=LogisticRegression(random_state=42)))])

In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook.
On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.

Pipeline

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numeric',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
           dtype='int64')),
                                                 ('categorical_low',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='c...
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  Int64Index([], dtype='int64')),
                                                 ('categorical_high',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OrdinalEncoder())]),
                                                  Int64Index([], dtype='int64'))])),
                ('classifier',
                 CustomClassifier(col_sample=0.9, n_hidden_features=10,
                                  obj=LogisticRegression(random_state=42)))])

preprocessor: ColumnTransformer

ColumnTransformer(transformers=[('numeric',
                                 Pipeline(steps=[('imputer', SimpleImputer()),
                                                 ('scaler', StandardScaler())]),
                                 Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
           dtype='int64')),
                                ('categorical_low',
                                 Pipeline(steps=[('imputer',
                                                  SimpleImputer(fill_value='missing',
                                                                strategy='constant')),
                                                 ('encoding',
                                                  OneHotEncoder(handle_unknown='ignore',
                                                                sparse=False))]),
                                 Int64Index([], dtype='int64')),
                                ('categorical_high',
                                 Pipeline(steps=[('imputer',
                                                  SimpleImputer(fill_value='missing',
                                                                strategy='constant')),
                                                 ('encoding',
                                                  OrdinalEncoder())]),
                                 Int64Index([], dtype='int64'))])

numeric

Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
            17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
           dtype='int64')

SimpleImputer

SimpleImputer()

StandardScaler

StandardScaler()

categorical_low

Int64Index([], dtype='int64')

SimpleImputer

SimpleImputer(fill_value='missing', strategy='constant')

OneHotEncoder

OneHotEncoder(handle_unknown='ignore', sparse=False)

categorical_high

Int64Index([], dtype='int64')

SimpleImputer

SimpleImputer(fill_value='missing', strategy='constant')

OrdinalEncoder

OrdinalEncoder()

classifier: CustomClassifier

CustomClassifier(col_sample=0.9, n_hidden_features=10,
                 obj=LogisticRegression(random_state=42))

obj: LogisticRegression

LogisticRegression(random_state=42)

LogisticRegression

LogisticRegression(random_state=42)

model_dictionary['LogisticRegression'].get_params()

{'memory': None,
 'steps': [('preprocessor',
   ColumnTransformer(transformers=[('numeric',
                                    Pipeline(steps=[('imputer', SimpleImputer()),
                                                    ('scaler', StandardScaler())]),
                                    Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
               17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
              dtype='int64')),
                                   ('categorical_low',
                                    Pipeline(steps=[('imputer',
                                                     SimpleImputer(fill_value='missing',
                                                                   strategy='constant')),
                                                    ('encoding',
                                                     OneHotEncoder(handle_unknown='ignore',
                                                                   sparse=False))]),
                                    Int64Index([], dtype='int64')),
                                   ('categorical_high',
                                    Pipeline(steps=[('imputer',
                                                     SimpleImputer(fill_value='missing',
                                                                   strategy='constant')),
                                                    ('encoding',
                                                     OrdinalEncoder())]),
                                    Int64Index([], dtype='int64'))])),
  ('classifier',
   CustomClassifier(col_sample=0.9, n_hidden_features=10,
                    obj=LogisticRegression(random_state=42)))],
 'verbose': False,
 'preprocessor': ColumnTransformer(transformers=[('numeric',
                                  Pipeline(steps=[('imputer', SimpleImputer()),
                                                  ('scaler', StandardScaler())]),
                                  Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
             17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
            dtype='int64')),
                                 ('categorical_low',
                                  Pipeline(steps=[('imputer',
                                                   SimpleImputer(fill_value='missing',
                                                                 strategy='constant')),
                                                  ('encoding',
                                                   OneHotEncoder(handle_unknown='ignore',
                                                                 sparse=False))]),
                                  Int64Index([], dtype='int64')),
                                 ('categorical_high',
                                  Pipeline(steps=[('imputer',
                                                   SimpleImputer(fill_value='missing',
                                                                 strategy='constant')),
                                                  ('encoding',
                                                   OrdinalEncoder())]),
                                  Int64Index([], dtype='int64'))]),
 'classifier': CustomClassifier(col_sample=0.9, n_hidden_features=10,
                  obj=LogisticRegression(random_state=42)),
 'preprocessor__n_jobs': None,
 'preprocessor__remainder': 'drop',
 'preprocessor__sparse_threshold': 0.3,
 'preprocessor__transformer_weights': None,
 'preprocessor__transformers': [('numeric',
   Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler())]),
   Int64Index([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16,
               17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
              dtype='int64')),
  ('categorical_low',
   Pipeline(steps=[('imputer',
                    SimpleImputer(fill_value='missing', strategy='constant')),
                   ('encoding',
                    OneHotEncoder(handle_unknown='ignore', sparse=False))]),
   Int64Index([], dtype='int64')),
  ('categorical_high',
   Pipeline(steps=[('imputer',
                    SimpleImputer(fill_value='missing', strategy='constant')),
                   ('encoding', OrdinalEncoder())]),
   Int64Index([], dtype='int64'))],
 'preprocessor__verbose': False,
 'preprocessor__verbose_feature_names_out': True,
 'preprocessor__numeric': Pipeline(steps=[('imputer', SimpleImputer()), ('scaler', StandardScaler())]),
 'preprocessor__categorical_low': Pipeline(steps=[('imputer',
                  SimpleImputer(fill_value='missing', strategy='constant')),
                 ('encoding',
                  OneHotEncoder(handle_unknown='ignore', sparse=False))]),
 'preprocessor__categorical_high': Pipeline(steps=[('imputer',
                  SimpleImputer(fill_value='missing', strategy='constant')),
                 ('encoding', OrdinalEncoder())]),
 'preprocessor__numeric__memory': None,
 'preprocessor__numeric__steps': [('imputer', SimpleImputer()),
  ('scaler', StandardScaler())],
 'preprocessor__numeric__verbose': False,
 'preprocessor__numeric__imputer': SimpleImputer(),
 'preprocessor__numeric__scaler': StandardScaler(),
 'preprocessor__numeric__imputer__add_indicator': False,
 'preprocessor__numeric__imputer__copy': True,
 'preprocessor__numeric__imputer__fill_value': None,
 'preprocessor__numeric__imputer__keep_empty_features': False,
 'preprocessor__numeric__imputer__missing_values': nan,
 'preprocessor__numeric__imputer__strategy': 'mean',
 'preprocessor__numeric__imputer__verbose': 'deprecated',
 'preprocessor__numeric__scaler__copy': True,
 'preprocessor__numeric__scaler__with_mean': True,
 'preprocessor__numeric__scaler__with_std': True,
 'preprocessor__categorical_low__memory': None,
 'preprocessor__categorical_low__steps': [('imputer',
   SimpleImputer(fill_value='missing', strategy='constant')),
  ('encoding', OneHotEncoder(handle_unknown='ignore', sparse=False))],
 'preprocessor__categorical_low__verbose': False,
 'preprocessor__categorical_low__imputer': SimpleImputer(fill_value='missing', strategy='constant'),
 'preprocessor__categorical_low__encoding': OneHotEncoder(handle_unknown='ignore', sparse=False),
 'preprocessor__categorical_low__imputer__add_indicator': False,
 'preprocessor__categorical_low__imputer__copy': True,
 'preprocessor__categorical_low__imputer__fill_value': 'missing',
 'preprocessor__categorical_low__imputer__keep_empty_features': False,
 'preprocessor__categorical_low__imputer__missing_values': nan,
 'preprocessor__categorical_low__imputer__strategy': 'constant',
 'preprocessor__categorical_low__imputer__verbose': 'deprecated',
 'preprocessor__categorical_low__encoding__categories': 'auto',
 'preprocessor__categorical_low__encoding__drop': None,
 'preprocessor__categorical_low__encoding__dtype': numpy.float64,
 'preprocessor__categorical_low__encoding__handle_unknown': 'ignore',
 'preprocessor__categorical_low__encoding__max_categories': None,
 'preprocessor__categorical_low__encoding__min_frequency': None,
 'preprocessor__categorical_low__encoding__sparse': False,
 'preprocessor__categorical_low__encoding__sparse_output': True,
 'preprocessor__categorical_high__memory': None,
 'preprocessor__categorical_high__steps': [('imputer',
   SimpleImputer(fill_value='missing', strategy='constant')),
  ('encoding', OrdinalEncoder())],
 'preprocessor__categorical_high__verbose': False,
 'preprocessor__categorical_high__imputer': SimpleImputer(fill_value='missing', strategy='constant'),
 'preprocessor__categorical_high__encoding': OrdinalEncoder(),
 'preprocessor__categorical_high__imputer__add_indicator': False,
 'preprocessor__categorical_high__imputer__copy': True,
 'preprocessor__categorical_high__imputer__fill_value': 'missing',
 'preprocessor__categorical_high__imputer__keep_empty_features': False,
 'preprocessor__categorical_high__imputer__missing_values': nan,
 'preprocessor__categorical_high__imputer__strategy': 'constant',
 'preprocessor__categorical_high__imputer__verbose': 'deprecated',
 'preprocessor__categorical_high__encoding__categories': 'auto',
 'preprocessor__categorical_high__encoding__dtype': numpy.float64,
 'preprocessor__categorical_high__encoding__encoded_missing_value': nan,
 'preprocessor__categorical_high__encoding__handle_unknown': 'error',
 'preprocessor__categorical_high__encoding__unknown_value': None,
 'classifier__a': 0.01,
 'classifier__activation_name': 'relu',
 'classifier__backend': 'cpu',
 'classifier__bias': True,
 'classifier__cluster_encode': True,
 'classifier__col_sample': 0.9,
 'classifier__direct_link': True,
 'classifier__dropout': 0,
 'classifier__n_clusters': 2,
 'classifier__n_hidden_features': 10,
 'classifier__nodes_sim': 'sobol',
 'classifier__obj__C': 1.0,
 'classifier__obj__class_weight': None,
 'classifier__obj__dual': False,
 'classifier__obj__fit_intercept': True,
 'classifier__obj__intercept_scaling': 1,
 'classifier__obj__l1_ratio': None,
 'classifier__obj__max_iter': 100,
 'classifier__obj__multi_class': 'auto',
 'classifier__obj__n_jobs': None,
 'classifier__obj__penalty': 'l2',
 'classifier__obj__random_state': 42,
 'classifier__obj__solver': 'lbfgs',
 'classifier__obj__tol': 0.0001,
 'classifier__obj__verbose': 0,
 'classifier__obj__warm_start': False,
 'classifier__obj': LogisticRegression(random_state=42),
 'classifier__row_sample': 1,
 'classifier__seed': 123,
 'classifier__type_clust': 'kmeans',
 'classifier__type_scaling': ('std', 'std', 'std')}

3 – Regression

from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

data = load_diabetes()
X = data.data
y= data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = .2, random_state = 123)

regr = ns.LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = regr.fit(X_train, X_test, y_train, y_test)
model_dictionary = regr.provide_models(X_train, X_test, y_train, y_test)

100%|██████████| 40/40 [00:03<00:00, 12.38it/s]

display(models)

	Adjusted R-Squared	R-Squared	RMSE	Time Taken
Model
LassoLarsIC	0.53	0.59	51.11	0.03
SGDRegressor	0.53	0.58	51.24	0.03
HuberRegressor	0.53	0.58	51.26	0.05
Ridge	0.53	0.58	51.37	0.03
KernelRidge	0.53	0.58	51.37	0.03
RidgeCV	0.53	0.58	51.37	0.03
Lasso	0.52	0.58	51.52	0.03
LassoLars	0.52	0.58	51.52	0.03
LassoCV	0.52	0.58	51.58	0.12
LassoLarsCV	0.52	0.58	51.58	0.05
TransformedTargetRegressor	0.52	0.58	51.62	0.03
LinearRegression	0.52	0.58	51.62	0.03
OrthogonalMatchingPursuitCV	0.52	0.58	51.69	0.05
BayesianRidge	0.52	0.57	51.77	0.03
LinearSVR	0.51	0.57	52.04	0.02
ElasticNetCV	0.51	0.56	52.49	0.08
LarsCV	0.50	0.56	52.79	0.05
PassiveAggressiveRegressor	0.49	0.55	53.39	0.03
GradientBoostingRegressor	0.48	0.54	54.00	0.26
ElasticNet	0.46	0.52	54.92	0.03
BaggingRegressor	0.46	0.52	54.92	0.07
RandomForestRegressor	0.46	0.52	55.07	0.37
HistGradientBoostingRegressor	0.45	0.51	55.42	0.20
ExtraTreesRegressor	0.44	0.51	55.71	0.24
AdaBoostRegressor	0.44	0.51	55.75	0.14
MLPRegressor	0.43	0.50	56.38	0.45
TweedieRegressor	0.42	0.48	57.03	0.03
RANSACRegressor	0.42	0.48	57.14	0.16
KNeighborsRegressor	0.31	0.39	62.10	0.05
OrthogonalMatchingPursuit	0.31	0.38	62.27	0.04
GaussianProcessRegressor	0.19	0.28	67.13	0.05
ExtraTreeRegressor	0.15	0.24	69.09	0.03
SVR	0.12	0.22	69.98	0.04
NuSVR	0.12	0.22	70.14	0.04
DummyRegressor	-0.13	-0.00	79.39	0.03
DecisionTreeRegressor	-0.26	-0.11	83.75	0.03
Lars	-1.95	-1.61	128.28	0.14

model_dictionary["LassoLarsIC"]

Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numeric',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  Int64Index([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype='int64')),
                                                 ('categorical_low',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  Int64Index([], dtype='int64')),
                                                 ('categorical_high',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OrdinalEncoder())]),
                                                  Int64Index([], dtype='int64'))])),
                ('regressor', CustomRegressor(obj=LassoLarsIC()))])

Python-bloggers

Data science news and tutorials - contributed by Python bloggers

AutoML in nnetsauce (randomized and quasi-randomized nnetworks)

2 – Installation

2 – Classification

2 – 1 Loading the Dataset

2 – 2 Building the classification model using LazyPredict

3 – Regression

Related