Classification using linear regression
In this post, I illustrate classification using linear regression, as implemented in the Python/R package nnetsauce, and more precisely, in nnetsauce's MultitaskClassifier. If you're not interested in reading the model description, you can jump directly to the second section, "Two examples in Python". In addition, the source code is relatively self-explanatory.
Model description
Chapter 4 of Elements of Statistical Learning (ESL), in section 4.2, Linear Regression of an Indicator Matrix, describes classification using linear regression pretty well. Let \(K \in \mathbb{N}\) be the number of classes and \(y \in \mathbb{N}^n\), with values in \(\lbrace 1, \ldots, K \rbrace\), be the variable to be explained. An indicator response matrix \(\textbf{Y} \in \mathbb{N}^{n \times K }\), containing only 0's and 1's, can be obtained from \(y\). Each row of \(\textbf{Y}\) contains a single 1, in the column corresponding to the class to which the example belongs, and 0's elsewhere.
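As an illustration, here is a minimal sketch of how such an indicator response matrix could be built with NumPy; the data and variable names are mine (and, following Python conventions, classes are labeled 0 to K-1 rather than 1 to K):

import numpy as np

# hypothetical class labels for n = 6 examples and K = 3 classes
y = np.array([0, 2, 1, 1, 0, 2])
K = len(np.unique(y))

# indicator response matrix Y: one row per example, a single 1 per row
Y = np.zeros((len(y), K), dtype=int)
Y[np.arange(len(y)), y] = 1
print(Y)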
Now, let \(\textbf{X} \in \mathbb{R}^{n \times p }\) be the set of explanatory variables for \(y\) and \(\textbf{Y}\), with examples in rows and characteristics in columns. ESL fits \(K\) least squares models to \(\textbf{X}\), one for each column of \(\textbf{Y}\). The regressions' predicted values can be interpreted as raw estimates of probabilities, because the least squares solution is a conditional expectation. For \(G\), a random variable describing the class, we have:
\[\mathbb{E} \left[ \mathbb{1}_{ G = k } \vert X = x \right] = \mathbb{P} \left[ G = k \vert X = x \right]\]
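As a rough, illustrative sketch of ESL's procedure (not nnetsauce's implementation), one can regress each column of \(\textbf{Y}\) on \(\textbf{X}\) and read the fitted values as raw probability estimates. The toy \(\textbf{X}\) below is made up; Y and K come from the sketch above:

from sklearn.linear_model import LinearRegression

# hypothetical explanatory variables: n = 6 examples, p = 2 features
X = np.array([[0.1, 1.2], [2.3, 0.4], [1.1, 1.0],
              [0.9, 1.1], [0.2, 1.3], [2.1, 0.5]])

# one least squares regression per column of Y (i.e., per class)
raw_scores = np.column_stack([
    LinearRegression().fit(X, Y[:, k]).predict(X) for k in range(K)
])
# raw_scores[i, k] is a raw estimate of P(G = k | X = x_i); it is not
# guaranteed to lie in [0, 1], nor to sum to 1 across classes
print(raw_scores)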
The differences between nnetsauce's MultitaskClassifier and the model described in ESL are:
- Any model possessing fit and predict methods can be used in lieu of a linear regression of \(\textbf{Y}\) on \(\textbf{X}\). The set of covariates includes the original covariates, \(\textbf{X}\), plus nonlinear transformations of \(\textbf{X}\), \(h(\textbf{X})\), as done in Quasi-Randomized Networks. Having \(h(\textbf{X})\) as additional explanatory variables enhances the model's flexibility; the model is no longer linear (a short code sketch illustrating both points is given after this list).
- If, for each \(k \in \lbrace 1, \ldots, K \rbrace\), \(\hat{f}_k(x)\) is the regression's predicted value for class \(k\) and an observation characterized by \(x\), nnetsauce's MultitaskClassifier obtains the probability that an observation characterized by \(x\) belongs to class \(k\) as:
\[\hat{p}_k(x) := \frac{expit \left( \hat{f}_k(x) \right)}{\sum_{i=1}^K expit \left( \hat{f}_i(x) \right)}\]
Where \(expit(x) := \frac{1}{1 + exp(-x)}\). \(x \mapsto expit(x)\) is strictly increasing, hence it preserves the ordering of the linear regression's predictions. \(x \mapsto expit(x)\) is also bounded in \([0, 1]\), which helps in avoiding overflows. I divide \(expit \left( \hat{f}_k(x) \right)\) by \(\sum_{i=1}^K expit \left( \hat{f}_i(x) \right)\), so that the probabilities add up to 1. Finally, the class predicted for an example characterized by \(x\) is:
\[argmax_{k \in \lbrace 1, \ldots, K \rbrace} \hat{p}_k(x)\]
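The sketch below illustrates these two differences on the toy data from the previous sketches. The nonlinear transformation \(h\) (a simple cosine here) and the base regressor are arbitrary choices of mine for illustration, not nnetsauce's internals, which rely on quasi-randomized hidden features:

from scipy.special import expit  # expit(x) = 1 / (1 + exp(-x))

# augment the covariates with an arbitrary nonlinear transformation h(X)
X_aug = np.hstack([X, np.cos(X)])

# any regressor with fit/predict could be used; plain least squares here
raw = np.column_stack([
    LinearRegression().fit(X_aug, Y[:, k]).predict(X_aug) for k in range(K)
])

# turn raw scores into probabilities: apply expit, then normalize each row
probs = expit(raw)
probs /= probs.sum(axis=1, keepdims=True)

# predicted class: argmax of the probabilities
preds = probs.argmax(axis=1)
print(preds)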
Two examples in Python
Currently, installing nnetsauce from PyPI doesn't work; I'm working on fixing it. In the meantime, you can install nnetsauce from GitHub as follows:
pip install git+https://github.com/Techtonique/nnetsauce.git
Import the packages required for the 2 examples.
import nnetsauce as ns
import numpy as np
from sklearn.datasets import load_wine, load_iris
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn import metrics
from time import time
1. Classification of the iris dataset:
dataset = load_iris()
Z = dataset.data
t = dataset.target

# training set (80%) and test set (20%)
X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2,
                                                    random_state=143)

# Linear Regression is used here
regr3 = LinearRegression()

# `n_hidden_features` makes the model nonlinear
# `n_clusters` takes into account heterogeneity
fit_obj3 = ns.MultitaskClassifier(regr3, n_hidden_features=5,
                                  n_clusters=2, type_clust="gmm")

# Adjust the model
start = time()
fit_obj3.fit(X_train, y_train)
print(f"Elapsed {time() - start}")

# Classification report
start = time()
preds = fit_obj3.predict(X_test)
print(f"Elapsed {time() - start}")
print(metrics.classification_report(preds, y_test))
Elapsed 0.021012067794799805
Elapsed 0.0010943412780761719
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        12
           1       1.00      1.00      1.00         5
           2       1.00      1.00      1.00        13

    accuracy                           1.00        30
   macro avg       1.00      1.00      1.00        30
weighted avg       1.00      1.00      1.00        30
2. Classification of the wine dataset:
dataset = load_wine()
Z = dataset.data
t = dataset.target

# training set (80%) and test set (20%)
X_train, X_test, y_train, y_test = train_test_split(Z, t, test_size=0.2,
                                                    random_state=143)

# Linear Regression is used here
regr4 = LinearRegression()

# `n_hidden_features` makes the model nonlinear
# `n_clusters` takes into account heterogeneity
fit_obj4 = ns.MultitaskClassifier(regr4, n_hidden_features=5,
                                  n_clusters=2, type_clust="gmm")

# Adjust the model
start = time()
fit_obj4.fit(X_train, y_train)
print(f"Elapsed {time() - start}")

# Classification report
start = time()
preds = fit_obj4.predict(X_test)
print(f"Elapsed {time() - start}")
print(metrics.classification_report(preds, y_test))
Elapsed 0.019229650497436523
Elapsed 0.001451253890991211
              precision    recall  f1-score   support

           0       1.00      1.00      1.00        16
           1       1.00      1.00      1.00        11
           2       1.00      1.00      1.00         9

    accuracy                           1.00        36
   macro avg       1.00      1.00      1.00        36
weighted avg       1.00      1.00      1.00        36