Hyperparameter Search and Classifier Threshold Selection

This article was first published on The Pleasure of Finding Things Out: A blog by James Triveri , and kindly contributed to python-bloggers. (You can report issue about the content on this page here)
Want to share your content on python-bloggers? click here.

The following notebook demonstrates how to use GridSearchCV to identify optimal hyperparameters for a given model and metric, and alternatives for selecting a classifier threshold in scikit-learn.

First we load the breast cancer dataset. We will forgo any pre-processing, but create separate train and validation sets:

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

np.set_printoptions(suppress=True, precision=8, linewidth=1000)
pd.options.mode.chained_assignment = None
pd.set_option('display.max_columns', None)
pd.set_option('display.width', None)

data = load_breast_cancer()
X = data["data"]
y = data["target"]


# Create train, validation and test splits. 
Xtrain, Xvalid, ytrain, yvalid = train_test_split(X, y, test_size=.20, random_state=516)

print(f"Xtrain.shape: {Xtrain.shape}")
print(f"Xvalid.shape: {Xvalid.shape}")
Xtrain.shape: (455, 30)
Xvalid.shape: (114, 30)
To leave a comment for the author, please follow the link and comment on their blog: The Pleasure of Finding Things Out: A blog by James Triveri .

Want to share your content on python-bloggers? click here.