Classification using linear regression

[This article was first published on T. Moudiki's Webpage - Python, and kindly contributed to python-bloggers]. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

In this post, I illustrate classification using linear regression, as implemented in Python/R package nnetsauce, and more precisely, in nnetsauce’s MultitaskClassifier. If you’re not interested in reading about the model description, you can jump directly to the 2nd section, “Two examples in Python”. In addition, the source code is relatively self-explanatory.

Model description

Chapter 4 of Elements of Statistical Learning (ESL), at section 4.2 Linear Regression of an Indicator Matrix, describes classification using linear regression pretty well. Let \(K \in \mathbb{N}\) be the number of classes and \(y \in \mathbb{N}^n\) with values in \(\lbrace 1, \ldots, K \rbrace\) be the variable to be explained. An indicator response matrix \(\textbf{Y} \in \mathbb{N}^{n \times K }\), containing only 0’s and 1’s, can be obtained from \(y\). Each row of \(\textbf{Y}\) shall contain a single 1 – in the column corresponding to the class where the example belongs, and 0’s elsewhere.

Now, let \(\textbf{X} \in \mathbb{R}^{n \times p }\) be the set of explanatory variables for \(y\) and \(\textbf{Y}\), with examples in rows, and characteristics in columns. ESL applies \(K\) least squares models to \(\textbf{X}\), for each column of \(\textbf{Y}\). The regression’s predicted values can be interpreted as raw estimates of probabilities, because the least squares’ solution is a conditional expectation. And for \(G\), a random variable describing the class, we have:

\[\mathbb{E} \left[ \mathbb{1}_{ G = k } \vert X = x \right] = \mathbb{P} \left[ G = k \vert X = x \right]\]

The difference between nnetsauce’s MultitaskClassifier and the model described in ESL is:

  • Any model possessing methods fit and predict can be used in lieu of a linear regression of \(\textbf{Y}\) on \(\textbf{X}\)
  • the set of covariates include the original covariates, \(\textbf{X}\), plus nonlinear transformations of \(\textbf{X}\), \(h(\textbf{X})\), as done in Quasi-Randomized Networks. Having \(h(\textbf{X})\) as additional explanatory variables enhances the models’ flexibility; the model is no longer linear.

  • If for each \(k \in \lbrace 1, \ldots, K \rbrace\), \(\hat{f}_k(x)\) is the regression’s predicted value for class \(k\) and an observation characterized by \(x\), nnetsauce’s MultitaskClassifier obtains probabilities that an observation characterized by \(x\) belongs to class \(k\) as:

\[\hat{p}_k(x) := \frac{expit \left( \hat{f}_k(x) \right)}{\sum_{i=1}^K expit \left( \hat{f}_k(x) \right)}\]

Where we have \(expit := \frac{1}{1 + exp(-x)}\). \(x \mapsto expit(x)\) is strictly increasing, hence it preserves the ordering of linear regression’s predictions. \(x \mapsto expit(x)\) is also bounded in \([0, 1]\), which helps in avoiding overflows. I divide \(expit \left( \hat{f}_k(x) \right)\) by \(\sum_{i=1}^K expit \left( \hat{f}_k(x) \right)\), so that the probabilities add up to 1. And to finish, the class predicted for an example characterized by \(x\) is:

\[argmax_{k \in \lbrace 1, \ldots, K \rbrace} \hat{p}_k(x)\]

Two examples in Python

Currently, installing nnetsauce from Pypi doesn’t work – and I’m working on fixing it. However, you can install nnetsauce from GitHub as follows:

pip install git+https://github.com/Techtonique/nnetsauce.git

Import the packages required for the 2 examples.

import nnetsauce as ns
import numpy as np
from sklearn.datasets import  load_wine, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from time import time

1. Classification of iris dataset:

dataset = load_iris()
Z = dataset.data
t = dataset.target

# training set (80%) and test set (20%)
X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2, 
                                                    random_state=143)

# Linear Regression is used here
regr3 = LinearRegression()

# `n_hidden_features` makes the model nonlinear
# `n_clusters` takes into account heterogeneity
fit_obj3 = ns.MultitaskClassifier(regr3, n_hidden_features=5, 
                                  n_clusters=2, type_clust="gmm")

# Adjust the model
start = time()
fit_obj3.fit(X_train, y_train)
print(f"Elapsed {time() - start}") 

# Classification report
start = time()
preds = fit_obj3.predict(X_test)
print(f"Elapsed {time() - start}") 
print(metrics.classification_report(preds, y_test))
Elapsed 0.021012067794799805
Elapsed 0.0010943412780761719
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00         5
           2       1.00      1.00      1.00        13

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30

2. Classification of wine dataset:

dataset = load_wine()
Z = dataset.data
t = dataset.target

# training set (80%) and test set (20%)
X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2, 
                                                    random_state=143)

# Linear Regression is used here 
regr4 = LinearRegression()

# `n_hidden_features` makes the model nonlinear
# `n_clusters` takes into account heterogeneity
fit_obj4 = ns.MultitaskClassifier(regr4, n_hidden_features=5, 
                                  n_clusters=2, type_clust="gmm")

# Adjust the model
start = time()
fit_obj4.fit(X_train, y_train)
print(f"Elapsed {time() - start}") 

# Classification report
start = time()
preds = fit_obj4.predict(X_test)
print(f"Elapsed {time() - start}") 
print(metrics.classification_report(preds, y_test))
Elapsed 0.019229650497436523
Elapsed 0.001451253890991211
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00         9

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36
To leave a comment for the author, please follow the link and comment on their blog: T. Moudiki's Webpage - Python.

Want to share your content on python-bloggers? click here.