Automated Machine Learning Model Testing

[This article was first published on Python – Predictive Hacks, and kindly contributed to python-bloggers.]

We have all been in the situation where we did not know which model was best for an ML project, and most likely ended up trying and evaluating many models just to see how they behave on our data. This is not a simple task, and it takes time and effort.

Fortunately, we can do this with only a few lines of code using LazyPredict. It will run more than 20 different ML models and return their performance statistics.

Installation

pip install lazypredict

Example

Let’s see an example using the Titanic dataset from Kaggle.

import pandas as pd
import numpy as np
from lazypredict.Supervised import LazyClassifier, LazyRegressor
from sklearn.model_selection import train_test_split

data=pd.read_csv('train.csv')

data.head()
[Image: output of data.head()]

Here we will try to predict whether a passenger survived the Titanic, so this is a classification problem.

LazyPredict can also do basic data preprocessing, such as filling NA values and creating dummy variables. That means we can test the models immediately after reading the data, without getting any errors. However, we can also pass in data that we have preprocessed ourselves, which makes the comparison more representative because it is closer to what our final models will see.

For this example, we will not do any preprocessing and will let LazyPredict do all the work.
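If you would rather preprocess the data yourself before handing it to LazyPredict, a minimal sketch might look like the following. It uses only pandas; the small `raw` data frame and the `preprocess` helper are hypothetical and only mimic the shape of the Titanic features. The median and most-frequent fill rules are one reasonable choice, not necessarily what LazyPredict does internally.

```python
import numpy as np
import pandas as pd

# Hypothetical rows shaped like the Titanic features used in this article.
raw = pd.DataFrame({
    "Pclass": [3, 1, 3, 2],
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Fare": [7.25, 71.28, 7.92, np.nan],
    "Embarked": ["S", "C", None, "S"],
})


def preprocess(df):
    """Fill missing values and one-hot encode the categorical columns."""
    out = df.copy()
    # Numeric gaps: fill with the column median.
    for col in ["Age", "Fare"]:
        out[col] = out[col].fillna(out[col].median())
    # Categorical gaps: fill with the most frequent value.
    out["Embarked"] = out["Embarked"].fillna(out["Embarked"].mode()[0])
    # Create dummy variables for the categorical columns.
    return pd.get_dummies(out, columns=["Sex", "Embarked"])


clean = preprocess(raw)
print(clean)
```

On real data, the fill values would normally be computed on the training split only and then reused on the test split, so that no information leaks from the test set.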

# Select the following columns as features for our models
X = data[['Pclass', 'Sex', 'Age', 'SibSp',
          'Parch', 'Fare', 'Embarked']]

y = data['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=7)

# Fit LazyClassifier (the variable is named reg, but this is a classifier)
reg = LazyClassifier(ignore_warnings=True, random_state=7, verbose=False)

# Pass both the train and test sets so it can fit and evaluate each model
models, predictions = reg.fit(X_train, X_test, y_train, y_test)

models
[Image: data frame of model performance statistics returned by LazyClassifier]

As you can see, it returns a data frame that contains the models and their performance statistics. In this run, the tree-based models perform better than the others. Knowing this, we can focus on tree-based models in our approach.

You can get the complete pipeline and the model parameters used by LazyPredict as follows.
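A single train/test split can be noisy, so before settling on a model family it may be worth confirming the ranking with cross-validation. The sketch below uses plain scikit-learn rather than LazyPredict, and runs on synthetic data from `make_classification` so that it is self-contained. The names `X_demo`, `y_demo`, `candidates` and `cv_results` are illustrative assumptions; in practice you would substitute your own features, target and shortlisted models.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic stand-in data; swap in your own X and y.
X_demo, y_demo = make_classification(n_samples=300, n_features=8, random_state=7)

# A short list of candidates: two tree-based models and a linear baseline.
candidates = {
    "RandomForest": RandomForestClassifier(n_estimators=50, random_state=7),
    "GradientBoosting": GradientBoostingClassifier(random_state=7),
    "LogisticRegression": LogisticRegression(max_iter=1000),
}

rows = []
for name, model in candidates.items():
    # 5-fold cross-validated accuracy for each candidate.
    scores = cross_val_score(model, X_demo, y_demo, cv=5, scoring="accuracy")
    rows.append({"Model": name, "Mean accuracy": scores.mean(), "Std": scores.std()})

# Best model first, mirroring the layout of the LazyPredict results table.
cv_results = pd.DataFrame(rows).sort_values("Mean accuracy", ascending=False)
print(cv_results)
```

If the ordering changes noticeably between folds (a large `Std`), the gap between models in a single split is probably not meaningful.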

#we will get the pipeline of LGBMClassifier
reg.models['LGBMClassifier']
Pipeline(steps=[('preprocessor',
                 ColumnTransformer(transformers=[('numeric',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer()),
                                                                  ('scaler',
                                                                   StandardScaler())]),
                                                  Index(['Pclass', 'Age', 'SibSp', 'Parch', 'Fare'], dtype='object')),
                                                 ('categorical_low',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OneHotEncoder(handle_unknown='ignore',
                                                                                 sparse=False))]),
                                                  Index(['Sex', 'Embarked'], dtype='object')),
                                                 ('categorical_high',
                                                  Pipeline(steps=[('imputer',
                                                                   SimpleImputer(fill_value='missing',
                                                                                 strategy='constant')),
                                                                  ('encoding',
                                                                   OrdinalEncoder())]),
                                                  Index([], dtype='object'))])),
                ('classifier', LGBMClassifier(random_state=7))])

Also, you can use the complete model pipeline for prediction.

reg.models['LGBMClassifier'].predict(X_test)
array([0, 0, 1, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0,
       0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 1, 0, 1, 0,
       0, 1, 0, 1, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 0, 0, 0, 1, 0, 0, 0,
       0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 1, 0, 0,
       0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0,
       1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1,
       0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
       0, 0, 1], dtype=int64)
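To see why such a pipeline can predict directly on raw data, here is a rough sketch of a hand-built pipeline with a similar shape to the one printed above. It is not LazyPredict's own code: the data frame `demo` is randomly generated, the toy `target` is made up, the categorical imputer is simplified to the most frequent value, and `RandomForestClassifier` stands in for `LGBMClassifier` so that only scikit-learn is needed.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical Titanic-like data with some missing ages.
rng = np.random.default_rng(7)
n = 200
demo = pd.DataFrame({
    "Age": rng.uniform(1, 80, n),
    "Fare": rng.uniform(5, 100, n),
    "Sex": rng.choice(["male", "female"], n),
})
demo.loc[demo.index % 10 == 0, "Age"] = np.nan
# Toy label: mostly determined by Sex, with about 10% of labels flipped.
target = ((demo["Sex"] == "female") ^ (rng.random(n) < 0.1)).astype(int)

# Numeric columns: impute the mean, then standardise.
numeric = Pipeline([("imputer", SimpleImputer()), ("scaler", StandardScaler())])
# Categorical columns: impute the most frequent value, then one-hot encode.
categorical = Pipeline([
    ("imputer", SimpleImputer(strategy="most_frequent")),
    ("encoding", OneHotEncoder(handle_unknown="ignore")),
])
preprocessor = ColumnTransformer([
    ("numeric", numeric, ["Age", "Fare"]),
    ("categorical", categorical, ["Sex"]),
])
pipe = Pipeline([
    ("preprocessor", preprocessor),
    ("classifier", RandomForestClassifier(n_estimators=50, random_state=7)),
])

# Fit on the first 150 rows and predict on the untouched raw rows that remain.
pipe.fit(demo.iloc[:150], target.iloc[:150])
preds = pipe.predict(demo.iloc[150:])
acc = accuracy_score(target.iloc[150:], preds)
print(f"Held-out accuracy: {acc:.3f}")
```

Because the preprocessing lives inside the pipeline, the same imputation and encoding learned on the training rows are applied automatically at prediction time.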

In the same way as LazyClassifier, we can use LazyRegressor to test models for regression problems.

Summing it up

LazyPredict can give us a basic understanding of which models perform better on our data. It can be run with almost no data preprocessing, so we can test models immediately after reading the data.

It is worth noting that there are other ways to automate machine learning model testing, such as auto-sklearn, but it is very complex to install, especially on Windows.
