Combining Algorithms for Classification with Python

This article was first published on python – educational research techniques , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

Many approaches in machine learning involve making many models that combine their strength and weaknesses to make more accuracy classification. Generally, when this is done it is the same algorithm being used. For example, random forest is simply many decision trees being developed. Even when bagging or boosting is being used it is the same algorithm but with variances in sampling and the use of features.
Combining Algorithms for Classification with Python
In addition to this common form of ensemble learning there is also a way to combine different algorithms to make predictions. For one way of doing this is through a technique called stacking in which the predictions of several models are passed to a higher model that uses the individual model predictions to make a final prediction. In this post we will look at how to do this using Python.


This blog usually tries to explain as much  as possible about what is happening. However, due to the complexity of this topic there are several assumptions about the reader’s background.

  • Already familiar with python
  • Can use various algorithms to make predictions (logistic regression, linear discriminant analysis, decision trees, K nearest neighbors)
  • Familiar with cross-validation and hyperparameter tuning

We will be using the Mroz dataset in the pydataset module. Our goal is to use several of the independent variables to predict whether someone lives in the city or not.

The steps we will take in this post are as follows

  1. Data preparation
  2. Individual model development
  3. Ensemble model development
  4. Hyperparameter tuning of ensemble model
  5. Ensemble model testing

Below is all of the libraries we will be using in this post

import pandas as pd
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.preprocessing import LabelEncoder
from pydataset import data
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from mlxtend.classifier import EnsembleVoteClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis as LDA
from sklearn.metrics import accuracy_score
from sklearn.metrics import classification_report

Data Preparation

We need to perform the following steps for the data preparation

  1. Load the data
  2. Select the independent variables to be used in the analysis
  3. Scale the independent variables
  4. Convert the dependent variable from text to numbers
  5. Split the data in train and test sets

Not all of the variables in the Mroz dataset were used. Some were left out because they were highly correlated with others. This analysis is not in this post but you can explore this on your own. The data was also scaled because many algorithms are sensitive to this so it is best practice to always scale the data. We will use the StandardScaler function for this. Lastly, the dpeendent variable currently consist of values of “yes” and “no” these need to be convert to numbers 1 and 0. We will use the LabelEncoder function for this. The code for all of this is below.

X=pd.DataFrame(X_scale, index=X.index, columns=X.columns)
X_train, X_test,y_train, y_test=train_test_split(X,y,test_size=.3,random_state=5)

We can now proceed to individul model development

Individual Model Development

Below are the steps for this part of the analysis

  1. Instantiate an instance of each algorithm
  2. Check accuracy of each model
  3. Check roc curve of each model

We will create four different models, and they are logistic regression, decision tree, k nearest neighbor, and linear discriminant analysis. We will also set some initial values for the hyperparameters for each. Below is the code

logclf=LogisticRegression(penalty='l2',C=0.001, random_state=0)

We can now assess the accuracy and roc curve of each model. This will be done through using two separate for loops. The first will have the accuracy results and the second will have the roc curve results. The results will also use k-fold cross validation with the cross_val_score function. Below is the code with the results.

clf_labels=['Logistic Regression','Decision Tree','KNN','LDAclf']
for clf, label in zip ([logclf,treeclf,knnclf,LDAclf],clf_labels):
    print("accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(),scores.std(),label))

for clf, label in zip ([logclf,treeclf,knnclf,LDAclf],clf_labels):
    print("roc auc: %0.2f (+/- %0.2f) [%s]" % (scores.mean(),scores.std(),label))

accuracy: 0.69 (+/- 0.04) [Logistic Regression]
accuracy: 0.72 (+/- 0.06) [Decision Tree]
accuracy: 0.66 (+/- 0.06) [KNN]
accuracy: 0.70 (+/- 0.05) [LDAclf]
roc auc: 0.71 (+/- 0.08) [Logistic Regression]
roc auc: 0.70 (+/- 0.07) [Decision Tree]
roc auc: 0.62 (+/- 0.10) [KNN]
roc auc: 0.70 (+/- 0.08) [LDAclf]

The results can speak for themselves. We have a general accuracy of around 70% but our roc auc is poor. Despite this we will now move to the ensemble model development.

Ensemble Model Development

The ensemble model requires the use of the EnsembleVoteClassifier function. Inside this function are the four models we made earlier. Other than this the rest of the code is the same as the previous step. We will assess the accuracy and the roc auc. Below is the code and the results

 mv_clf= EnsembleVoteClassifier(clfs=[logclf,treeclf,knnclf,LDAclf],weights=[1.5,1,1,1])

for clf, label in zip ([logclf,treeclf,knnclf,LDAclf,mv_clf],labels):
    print("accuracy: %0.2f (+/- %0.2f) [%s]" % (scores.mean(),scores.std(),label))

for clf, label in zip ([logclf,treeclf,knnclf,LDAclf,mv_clf],labels):
    print("roc auc: %0.2f (+/- %0.2f) [%s]" % (scores.mean(),scores.std(),label))

accuracy: 0.69 (+/- 0.04) [LR]
accuracy: 0.72 (+/- 0.06) [tree]
accuracy: 0.66 (+/- 0.06) [knn]
accuracy: 0.70 (+/- 0.05) [LDA]
accuracy: 0.70 (+/- 0.04) [combine]
roc auc: 0.71 (+/- 0.08) [LR]
roc auc: 0.70 (+/- 0.07) [tree]
roc auc: 0.62 (+/- 0.10) [knn]
roc auc: 0.70 (+/- 0.08) [LDA]
roc auc: 0.72 (+/- 0.09) [combine]

You can see that the combine model as similar performance to the individual models. This means in this situation that the ensemble learning did not make much of a difference. However, we have not tuned are hyperparameters yet. This will be done in the next step.

Hyperparameter Tuning of Ensemble Model

We are going to tune the decision tree, logistic regression, and KNN model. There are many different hyperparameters we can tune. For demonstration purposes we are only tuning one hyperparameter per algorithm. Once we set the hyperparameters we will run the model and pull the best hyperparameters values based on the roc auc as the metric. Below is the code and the output.



{'decisiontreeclassifier__max_depth': 3,
 'kneighborsclassifier__n_neighbors': 9,
 'logisticregression__C': 10}

Out[35]: 0.7196051482279385

The best values are as follows

  • Decision tree max depth set to 3
  • KNN number of neighbors set to 9
  • logistic regression C set to 10

These values give us a roc auc of 0.72 which is still poor . We can now use these values when we test our final model.

Ensemble Model Testing

The following steps are performed in the analysis

  1. Created new instances of the algorithms with the adjusted hyperparameters
  2. Run the ensemble model
  3. Predict with the test data
  4. Check the results

Below is the first step

logclf=LogisticRegression(penalty='l2',C=10, random_state=0)

Below is step two

mv_clf= EnsembleVoteClassifier(clfs=[logclf,treeclf,knnclf,LDAclf],weights=[1.5,1,1,1]),y_train)

Below are steps 3 and four


col_0   0    1
0      29   58
1      12  127
             precision    recall  f1-score   support

          0       0.71      0.33      0.45        87
          1       0.69      0.91      0.78       139

avg / total       0.69      0.69      0.66       226

The accuracy is about 69%. One thing that is noticeable low is the  recall for people who do not live in the city. This probably one reason why the overall roc auc score is so low. The f1-score is also low for those who do not live in the city as well. The f1-score is just a combination of precision and recall. If we really want to improve performance we would probably start with improving the recall of the no’s.


This post provided an example of how you can  combine different algorithms to make predictions in Python. This is a powerful technique t to use. Off course, it is offset by the complexity of the analysis which makes it hard to explain exactly what the results mean if you were asked tot do so.

To leave a comment for the author, please follow the link and comment on their blog: python – educational research techniques .

Want to share your content on python-bloggers? click here.