nnetsauce’s introduction as of 2024-02-11 (new version 0.17.0)

T. Moudiki

2 years ago

This article was first published on T. Moudiki's Webpage - Python , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

0 – What is `nnetsauce`?

nnetsauce is a general-purpose computational tool for
statistical/Machine Learning (ML). As of 2024-02-11, nnetsauce is
available for R and Python. It contains various implementations of –
shallow and deep – regression, classification, univariate and
multivariate probabilistic time series forecasting models.

The specificity of nnetsauce resides on the fact that the great
majority of these ML models are powered by Quasi-Randomized Networks
(QRNNs, see Moudiki, Planchet, and Cousin (2018) and Moudiki
(2019-2024)). Indeed, whereas traditional neural networks (NNs, see
Goodfellow, Bengio, and Courville (2016)) are trained by using
gradient-based optimization algorithms, QRNNs do actually engineer
features by creating new, randomized or deterministic ones.

The package lives in and relies on a well-supported and documented Python
ecosystem; the one introduced by Pedregosa et al. (2011):

ML models are trained by using a method called fit
predictions on unseen data are made by a method called predict
thanks to object-oriented programming’s inheritance, the models are
a 100% compatible with scikit-learn’s Pipelines,
GridSearchCVs, RandomizedSearchCVs, and cross_val_scores
(tools for model calibration).

Changes in version 0.17.0 (2024-02-11) include:

Attribute estimators (a list of Estimator’s as strings) for LazyClassifier,
LazyRegressor, LazyDeepClassifier, LazyDeepRegressor, LazyMTS,
and LazyDeepMTS
New documentation for the package, using pdoc
(highly customizable, based on the excellent Jinja2, HTML and CSS):

https://techtonique.github.io/nnetsauce/

Remove external regressors xreg at inference time for time series forecasting classes MTS and
DeepMTS
New class Downloader: querying the R universe API for
datasets (see
https://thierrymoudiki.github.io/blog/2023/12/25/python/r/misc/mlsauce/runiverse-api2
for similar example in mlsauce)
Add custom metrics to Lazy*
Rename Deep regressors and classifiers to Deep* in Lazy*
Add attribute sort_by to Lazy* – sort the data frame output by a given
metric
Add attribute classes_ to classifiers (ensure consistency
with scikit-learn)

The best way to illustrate the use of nnetsauce is to showcase its
integrated automated machine learning (AutoML) functionalities for
solving a real-world problem. This will be done by applying a
classification task on the Wisconsin breast cancer data set available in
scikit-learn (Pedregosa et al. (2011)). The same functionality is also
available for regression and time series forecasting tasks.

Contents

1 – Installing `nnetsauce` for Python

There are three ways to install nnetsauce for Python:

1st method: from PyPI by using pip at the command line (stable
version):

pip install nnetsauce

2nd method: using conda (Linux and macOS only for now)

conda install -c conda-forge nnetsauce

3rd method: from Github (development version)

pip install git+https://github.com/Techtonique/nnetsauce.git

git clone https://github.com/Techtonique/nnetsauce.git
cd nnetsauce
make install

One way to run all the examples available in the repository after
cloning it (as done in this 3rd installation option) is to navigate into
it and run:

cd nnetsauce
make install
make run-examples

There are also:

Several Jupyter notebooks available in: https://github.com/Techtonique/nnetsauce/tree/master/nnetsauce/demo
Blog posts available at: https://thierrymoudiki.github.io/blog#QuasiRandomizedNN

2 – Automated Machine Learning (AutoML) with `nnetsauce` classifiers on Wisconsin breast cancer data set

Import the packages required for the examples:

import nnetsauce as ns
import matplotlib.pyplot as plt
import numpy as np
from sklearn.datasets import  load_breast_cancer
from sklearn.ensemble import ExtraTreesRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from time import time

Run AutoML with nnetsauce classifiers on Wisconsin breast cancer
(identify and benign or malignant) toy data set:

# load the whole data set 
dataset = load_breast_cancer()
X = dataset.data
y = dataset.target

# splitting data in a training set (80%) and a test set (20%)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, 
                                                    random_state=123)

Build the AutoML classifier

There are more than 90 classifiers available in nnetsauce as of
2024-02-11. The LazyClassifier is a meta-estimator that fits a large
number of classifiers of different types, and provides the best model.
It is a good starting point for solving a classification problem.

# build the AutoML classifier
clf = ns.LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None, 
n_hidden_features=10, col_sample=0.9)

# adjust on training and evaluate on test data
models, predictions = clf.fit(X_train, X_test, y_train, y_test)

  0%|          | 0/92 [00:00<?, ?it/s]
  1%|1         | 1/92 [00:00<01:06,  1.38it/s]
  2%|2         | 2/92 [00:00<00:37,  2.41it/s]
  4%|4         | 4/92 [00:01<00:18,  4.72it/s]
  8%|7         | 7/92 [00:01<00:09,  8.75it/s]
 10%|9         | 9/92 [00:01<00:10,  7.74it/s]
 12%|#1        | 11/92 [00:01<00:08,  9.47it/s]
 15%|#5        | 14/92 [00:01<00:06, 11.41it/s]
 18%|#8        | 17/92 [00:02<00:07,  9.84it/s]
 21%|##        | 19/92 [00:02<00:12,  5.75it/s]
 24%|##3       | 22/92 [00:03<00:08,  7.90it/s]
 26%|##6       | 24/92 [00:03<00:10,  6.36it/s]
 28%|##8       | 26/92 [00:03<00:08,  7.62it/s]
 30%|###       | 28/92 [00:03<00:07,  9.11it/s]
 33%|###2      | 30/92 [00:04<00:14,  4.14it/s]
 35%|###4      | 32/92 [00:05<00:12,  4.91it/s]
 37%|###6      | 34/92 [00:05<00:09,  5.92it/s]
 39%|###9      | 36/92 [00:06<00:18,  3.06it/s]
 40%|####      | 37/92 [00:07<00:21,  2.55it/s]
 42%|####2     | 39/92 [00:07<00:16,  3.24it/s]
 45%|####4     | 41/92 [00:08<00:13,  3.65it/s]
 46%|####5     | 42/92 [00:08<00:14,  3.45it/s]
 47%|####6     | 43/92 [00:08<00:12,  3.93it/s]
 48%|####7     | 44/92 [00:10<00:24,  1.96it/s]
 49%|####8     | 45/92 [00:10<00:19,  2.39it/s]
 50%|#####     | 46/92 [00:10<00:18,  2.55it/s]
 51%|#####1    | 47/92 [00:10<00:14,  3.07it/s]
 52%|#####2    | 48/92 [00:10<00:11,  3.79it/s]
 53%|#####3    | 49/92 [00:11<00:12,  3.58it/s]
 54%|#####4    | 50/92 [00:12<00:22,  1.90it/s]
 57%|#####6    | 52/92 [00:12<00:12,  3.11it/s]
 58%|#####7    | 53/92 [00:12<00:11,  3.49it/s]
 59%|#####8    | 54/92 [00:12<00:11,  3.44it/s]
 60%|#####9    | 55/92 [00:14<00:21,  1.69it/s]
 62%|######1   | 57/92 [00:14<00:12,  2.77it/s]
 64%|######4   | 59/92 [00:14<00:08,  4.09it/s]
 66%|######6   | 61/92 [00:14<00:07,  4.06it/s]
 67%|######7   | 62/92 [00:15<00:07,  3.82it/s]
 70%|######9   | 64/92 [00:15<00:05,  5.32it/s]
 73%|#######2  | 67/92 [00:16<00:04,  5.08it/s]
 75%|#######5  | 69/92 [00:16<00:04,  4.62it/s]
 77%|#######7  | 71/92 [00:16<00:03,  5.78it/s]
 80%|########  | 74/92 [00:16<00:02,  7.62it/s]
 83%|########2 | 76/92 [00:17<00:03,  4.52it/s]
 85%|########4 | 78/92 [00:18<00:02,  5.09it/s]
 87%|########6 | 80/92 [00:18<00:01,  6.28it/s]
 89%|########9 | 82/92 [00:19<00:03,  3.12it/s]
 92%|#########2| 85/92 [00:19<00:01,  4.35it/s]
 93%|#########3| 86/92 [00:20<00:01,  4.13it/s]
 95%|#########4| 87/92 [00:21<00:02,  2.35it/s]
100%|##########| 92/92 [00:21<00:00,  4.27it/s]

Print the models’ leaderboard

print(models)

                                                Accuracy  ...  Time Taken
Model                                                     ...            
SimpleMultitaskClassifier(ExtraTreesRegressor)      0.99  ...        0.50
CustomClassifier(RandomForestClassifier)            0.99  ...        0.43
CustomClassifier(MLPClassifier)                     0.99  ...        0.73
CustomClassifier(Perceptron)                        0.99  ...        0.04
CustomClassifier(LogisticRegression)                0.99  ...        0.05
...                                                  ...  ...         ...
MultitaskClassifier(DummyRegressor)                 0.64  ...        0.08
SimpleMultitaskClassifier(LassoLars)                0.64  ...        0.05
SimpleMultitaskClassifier(Lasso)                    0.64  ...        0.03
SimpleMultitaskClassifier(Lars)                     0.41  ...        0.05
MultitaskClassifier(Lars)                           0.39  ...        0.31

[89 rows x 5 columns]

Provide the best model

model_dictionary = clf.provide_models(X_train, X_test, y_train, y_test)
fit_obj = model_dictionary['SimpleMultitaskClassifier(ExtraTreesRegressor)']
fit_obj

CustomClassifier(col_sample=0.9, n_hidden_features=10,
                 obj=SimpleMultitaskClassifier(obj=ExtraTreesRegressor()))

Classification report

# Classification report
start = time()
preds = fit_obj.predict(X_test)
print(f"Elapsed {time() - start}")

Elapsed 0.04414701461791992              precision    recall  f1-score   support

           0       0.98      1.00      0.99        40
           1       1.00      0.99      0.99        74

    accuracy                           0.99       114
   macro avg       0.99      0.99      0.99       114
weighted avg       0.99      0.99      0.99       114

Confusion matrix

metrics.ConfusionMatrixDisplay.from_estimator(fit_obj, X_test, y_test)

<sklearn.metrics._plot.confusion_matrix.ConfusionMatrixDisplay object at 0x12ccf3050>

dependencies

python3 -m pip freeze

anyio==4.2.0
appnope==0.1.3
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.3.0
asttokens==2.4.1
async-lru==2.0.4
attrs==23.2.0
Babel==2.14.0
beautifulsoup4==4.12.2
binaryornot==0.4.4
bleach==6.1.0
certifi==2023.11.17
cffi==1.16.0
chardet==5.2.0
charset-normalizer==3.3.2
click==8.1.7
comm==0.2.0
contourpy==1.2.0
cookiecutter==2.5.0
cycler==0.12.1
debugpy==1.8.0
decorator==5.1.1
defusedxml==0.7.1
distlib==0.3.8
docutils==0.20.1
executing==2.0.1
fastjsonschema==2.19.1
filelock==3.13.1
fonttools==4.47.2
fqdn==1.5.1
fsspec==2023.12.2
idna==3.6
importlib-metadata==7.0.1
ipykernel==6.28.0
ipython==8.19.0
ipywidgets==8.1.1
isoduration==20.11.0
jaraco.classes==3.3.0
jax==0.4.23
jaxlib==0.4.23
jedi==0.19.1
Jinja2==3.1.2
joblib==1.3.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.20.0
jsonschema-specifications==2023.12.1
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.9.0
jupyter-lsp==2.2.1
jupyter_client==8.6.0
jupyter_core==5.5.1
jupyter_server==2.12.1
jupyter_server_terminals==0.5.1
jupyterlab==4.0.10
jupyterlab-widgets==3.0.9
jupyterlab_pygments==0.3.0
jupyterlab_server==2.25.2
keyring==24.3.0
kiwisolver==1.4.5
liac-arff==2.5.0
markdown-it-py==3.0.0
MarkupSafe==2.1.3
matplotlib==3.8.2
matplotlib-inline==0.1.6
mdurl==0.1.2
minio==7.2.3
mistune==3.0.2
ml-dtypes==0.3.2
more-itertools==10.1.0
mpmath==1.3.0
nbclient==0.9.0
nbconvert==7.13.1
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.2.1
nh3==0.2.15
nnetsauce @ file:///Users/t/Documents/Python_Packages/nnetsauce
notebook==7.0.6
notebook_shim==0.2.3
numpy==1.26.3
openml==0.14.2
opt-einsum==3.3.0
overrides==7.4.0
packaging==23.2
pandas==2.2.0
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.6
pexpect==4.9.0
pillow==10.2.0
pkginfo==1.9.6
platformdirs==4.1.0
prometheus-client==0.19.0
prompt-toolkit==3.0.43
psutil==5.9.7
ptyprocess==0.7.0
pure-eval==0.2.2
pyarrow==15.0.0
pycparser==2.21
pycryptodome==3.20.0
Pygments==2.17.2
pyparsing==3.1.1
python-dateutil==2.8.2
python-json-logger==2.0.7
python-slugify==8.0.1
pytz==2023.4
PyYAML==6.0.1
pyzmq==25.1.2
qtconsole==5.5.1
QtPy==2.4.1
readme-renderer==42.0
referencing==0.32.0
requests==2.31.0
requests-toolbelt==1.0.0
rfc3339-validator==0.1.4
rfc3986==2.0.0
rfc3986-validator==0.1.1
rich==13.7.0
rpds-py==0.16.2
scikit-learn==1.4.0
scipy==1.12.0
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.0
soupsieve==2.5
stack-data==0.6.3
statsmodels==0.14.1
sympy==1.12
tabpfn==0.1.9
terminado==0.18.0
text-unidecode==1.3
threadpoolctl==3.2.0
tinycss2==1.2.1
torch==2.2.0
tornado==6.4
tqdm==4.66.1
traitlets==5.14.0
twine==4.0.2
types-python-dateutil==2.8.19.14
typing_extensions==4.9.0
tzdata==2023.4
uri-template==1.3.0
urllib3==2.1.0
virtualenv==20.25.0
wcwidth==0.2.12
webcolors==1.13
webencodings==0.5.1
websocket-client==1.7.0
widgetsnbextension==4.0.9
xmltodict==0.13.0
zipp==3.17.0

sessionInfo()

R version 4.3.2 (2023-10-31)
Platform: x86_64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Paris
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

loaded via a namespace (and not attached):
 [1] digest_0.6.34     fastmap_1.1.1     xfun_0.41         Matrix_1.6-1.1   
 [5] lattice_0.21-9    reticulate_1.34.0 knitr_1.45        htmltools_0.5.7  
 [9] png_0.1-8         rmarkdown_2.25    cli_3.6.2         grid_4.3.2       
[13] compiler_4.3.2    rprojroot_2.0.4   here_1.0.1        rstudioapi_0.15.0
[17] tools_4.3.2       evaluate_0.23     Rcpp_1.0.12       yaml_2.3.8       
[21] rlang_1.1.3       jsonlite_1.8.8

Goodfellow, Ian, Yoshua Bengio, and Aaron Courville. 2016. Deep
Learning. MIT press.

Moudiki, T. 2019-2024. “Nnetsauce, A Package for Statistical/Machine
Learning Using Randomized and Quasi-Randomized (Neural) Networks.”
https://github.com/thierrymoudiki/nnetsauce.

Moudiki, T., Frédéric Planchet, and Areski Cousin. 2018. “Multiple Time
Series Forecasting Using Quasi-Randomized Functional Link Neural
Networks.” Risks 6 (1): 22.

Pedregosa, Fabian, Gaël Varoquaux, Alexandre Gramfort, Vincent Michel,
Bertrand Thirion, Olivier Grisel, Mathieu Blondel, et al. 2011.
“Scikit-Learn: Machine Learning in Python.” The Journal of Machine
Learning Research 12: 2825–30.

To leave a comment for the author, please follow the link and comment on their blog: T. Moudiki's Webpage - Python .

Want to share your content on python-bloggers? click here.

0 – What is nnetsauce?

1 – Installing nnetsauce for Python

2 – Automated Machine Learning (AutoML) with nnetsauce classifiers on Wisconsin breast cancer data set

dependencies

0 – What is `nnetsauce`?

1 – Installing `nnetsauce` for Python

2 – Automated Machine Learning (AutoML) with `nnetsauce` classifiers on Wisconsin breast cancer data set