Contents:
- Installing nnetsauce for Python
- Classification
- Regression
Disclaimer: I have no affiliation with the lazypredict project.
A few days ago, I stumbled across a cool Python package called lazypredict. It is well designed, it works, and it builds on scikit-learn's design.
With lazypredict, you can quickly get an idea of which scikit-learn model performs best on a given data set (it also works with xgboost's and lightgbm's scikit-learn-like interfaces), with a little preprocessing and without hyperparameter tuning (this is important to note).
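For reference, here is what a typical lazypredict session looks like. This is a minimal sketch based on the package's documented interface; everything besides LazyClassifier is standard scikit-learn.

from lazypredict.Supervised import LazyClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target,
                                                    test_size=0.2,
                                                    random_state=123)
# fit all available scikit-learn classifiers with default hyperparameters
clf = LazyClassifier(verbose=0, ignore_warnings=True, custom_metric=None)
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
print(models)  # leaderboard of fitted models, ranked by test-set performance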
I thought something similar could be beneficial to nnetsauce's classes CustomClassifier, CustomRegressor (see detailed examples below), and MTS.
So far, in nnetsauce (Python version), I have adapted the lazy prediction feature to regression (CustomRegressor) and classification (CustomClassifier), but not yet to univariate and multivariate time series forecasting (MTS). You can try it from a GitHub branch.
1 – Installation
!pip install git+https://github.com/Techtonique/nnetsauce.git@lazy-predict
2 – Classification
2 – 1 Loading the Dataset
import nnetsauce as ns
from sklearn.datasets import load_breast_cancer

data = load_breast_cancer()
X = data.data
y = data.target
2 – 2 Building the classification model using LazyPredict
from sklearn.model_selection import train_test_split
# split the data
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.2,
random_state=123)
# build the lazy classifier
clf = ns.LazyClassifier(verbose=0, ignore_warnings=True,
                        custom_metric=None,
                        n_hidden_features=10,  # number of nodes in the hidden layer
                        col_sample=0.9)  # fraction of covariates sampled at random
# fit it
models, predictions = clf.fit(X_train, X_test, y_train, y_test)
100%|██████████| 27/27 [00:09<00:00, 2.71it/s]
# print the best models
display(models)
| Model | Accuracy | Balanced Accuracy | ROC AUC | F1 Score | Time Taken (s) |
|---|---|---|---|---|---|
| LogisticRegression | 0.99 | 0.99 | 0.99 | 0.99 | 0.69 |
| LinearSVC | 0.98 | 0.98 | 0.98 | 0.98 | 0.33 |
| SGDClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.19 |
| Perceptron | 0.98 | 0.98 | 0.98 | 0.98 | 0.15 |
| LabelPropagation | 0.98 | 0.98 | 0.98 | 0.98 | 0.33 |
| LabelSpreading | 0.98 | 0.98 | 0.98 | 0.98 | 0.43 |
| SVC | 0.98 | 0.98 | 0.98 | 0.98 | 0.16 |
| RandomForestClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.66 |
| ExtraTreesClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.40 |
| KNeighborsClassifier | 0.98 | 0.98 | 0.98 | 0.98 | 0.34 |
| DecisionTreeClassifier | 0.97 | 0.97 | 0.97 | 0.97 | 0.53 |
| PassiveAggressiveClassifier | 0.97 | 0.97 | 0.97 | 0.97 | 0.21 |
| LinearDiscriminantAnalysis | 0.97 | 0.96 | 0.96 | 0.97 | 0.19 |
| CalibratedClassifierCV | 0.97 | 0.96 | 0.96 | 0.97 | 0.24 |
| AdaBoostClassifier | 0.96 | 0.96 | 0.96 | 0.96 | 1.31 |
| BaggingClassifier | 0.95 | 0.95 | 0.95 | 0.95 | 0.63 |
| RidgeClassifier | 0.96 | 0.94 | 0.94 | 0.96 | 0.27 |
| RidgeClassifierCV | 0.96 | 0.94 | 0.94 | 0.96 | 0.18 |
| QuadraticDiscriminantAnalysis | 0.95 | 0.94 | 0.94 | 0.95 | 0.81 |
| ExtraTreeClassifier | 0.94 | 0.93 | 0.93 | 0.94 | 0.12 |
| NuSVC | 0.94 | 0.91 | 0.91 | 0.94 | 0.29 |
| GaussianNB | 0.93 | 0.91 | 0.91 | 0.93 | 0.17 |
| BernoulliNB | 0.92 | 0.90 | 0.90 | 0.92 | 0.31 |
| NearestCentroid | 0.92 | 0.89 | 0.89 | 0.92 | 0.24 |
| DummyClassifier | 0.64 | 0.50 | 0.50 | 0.50 | 0.27 |
model_dictionary = clf.provide_models(X_train, X_test, y_train, y_test)
model_dictionary['LogisticRegression']
Pipeline(steps=[('preprocessor',
ColumnTransformer(transformers=[('numeric',
Pipeline(steps=[('imputer',
SimpleImputer()),
('scaler',
StandardScaler())]),
Int64Index([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,
17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29],
dtype='int64')),
('categorical_low',
Pipeline(steps=[('imputer',
SimpleImputer(fill_value='missing',
strategy='c...
OneHotEncoder(handle_unknown='ignore',
sparse=False))]),
Int64Index([], dtype='int64')),
('categorical_high',
Pipeline(steps=[('imputer',
SimpleImputer(fill_value='missing',
strategy='constant')),
('encoding',
OrdinalEncoder())]),
Int64Index([], dtype='int64'))])),
('classifier',
CustomClassifier(col_sample=0.9, n_hidden_features=10,
obj=LogisticRegression(random_state=42)))])
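Each entry of model_dictionary is a scikit-learn Pipeline, so the retrieved model can be reused directly. A short sketch (assuming, as in lazypredict, that provide_models returns pipelines already fitted on the training set):

from sklearn.metrics import accuracy_score

best_model = model_dictionary['LogisticRegression']
preds = best_model.predict(X_test)  # the pipeline handles imputation and scaling internally
print(accuracy_score(y_test, preds))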
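3 – Regression

The same workflow applies to regression with CustomRegressor. Below is a minimal sketch, under the assumption that ns.LazyRegressor in the lazy-predict branch mirrors ns.LazyClassifier's interface (as lazypredict's LazyRegressor mirrors its LazyClassifier); the diabetes dataset is used purely for illustration.

import nnetsauce as ns
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

data = load_diabetes()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target,
                                                    test_size=0.2,
                                                    random_state=123)

# assumption: LazyRegressor accepts the same arguments as LazyClassifier
reg = ns.LazyRegressor(verbose=0, ignore_warnings=True, custom_metric=None,
                       n_hidden_features=10, col_sample=0.9)
models, predictions = reg.fit(X_train, X_test, y_train, y_test)
print(models)  # leaderboard of regression models, ranked by test-set error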