# Bayesian Optimization with GPopt

**T. Moudiki's Webpage - Python**, and kindly contributed to python-bloggers]. (You can report issue about the content on this page here)

Want to share your content on python-bloggers? click here.

Due to the way it mixes several – relatively – simple concepts, Bayesian optimization (BO) is one of the **most elegant** mathematical tool I’ve encountered so far. In this post, I introduce `GPopt`

, a tool for BO that I implemented in Python (no technical docs yet, but coming soon). The examples of `GPopt`

’s use showcased here are based on Gaussian Processes (GP) and Expected Improvement (EI): **what does that mean?**

GPs are Bayesian statistical/machine learning (ML) models which create a distribution on functions, and especially on black-box functions. If we let \(f\) be the black-box – and expensive-to-evaluate – function whose minimum is searched, a GP is firstly adjusted (in a supervised learning way) to a small set of points at which \(f\) is evaluated. This small set of points is the *initial design*. Then the GP, thanks to its probabilistic nature, will **exploit** its knowledge of previous points at which \(f\) has already been evaluated, to **explore** new points: potential minimas maximizing an EI criterion. It’s a reinforcement learning question!

For more details on Bayesian optimization applied to hyperparameters calibration in ML, you can read Chapter 6 of this document. In this post, a Branin (2D) and a Hartmann (3D) functions will be used as examples of objective functions \(f\), and Matérn 5/2 is the GP’s covariance.

**Installing and importing the packages**:

!pip install GPopt

!pip install matplotlib==3.1.3

import GPopt as gp import time import sys import numpy as np import matplotlib.pyplot as plt from os import chdir from scipy.optimize import minimize

The objective function to be minimized:

# branin function def branin(x): x1 = x[0] x2 = x[1] term1 = (x2 - (5.1*x1**2)/(4*np.pi**2) + (5*x1)/np.pi - 6)**2 term2 = 10*(1-1/(8*np.pi))*np.cos(x1) return (term1 + term2 + 10)

# A local minimum for verification print("\n") res = minimize(branin, x0=[0, 0], method='Nelder-Mead', tol=1e-6) print(res.x) print(branin(res.x))

[3.14159271 2.27500017] 0.397887357729795

Firstly, I **demonstrate the convergence of the optimizer** towards a minimum (and show that the optimizer can pick up the procedure where it left):

# n_init is the number of points in the initial design # n_iter is the number of iterations of the optimizer gp_opt = gp.GPOpt(objective_func=branin, lower_bound = np.array([-5, 0]), upper_bound = np.array([10, 15]), n_init=10, n_iter=10) gp_opt.optimize(verbose=1)

Plotting the **changes in expected improvement as a function of the number of iterations**.

print(gp_opt.x_min) # current minimum print(gp_opt.y_min) # current minimum plt.plot(np.diff(gp_opt.max_ei))

[9.31152344 1.68457031] 0.9445903336427559

Adding **more iterations** to the optimizer:

gp_opt.optimize(verbose=1, n_more_iter=10)

...Done. Optimization loop... 10/10 [██████████████████████████████] - 2s 186ms/step (array([3.22692871, 2.63122559]), 0.6107733232129569)

Plotting the **changes in expected improvement as a function of the number of iterations** (again).

print(gp_opt.x_min) # current minimum print(gp_opt.y_min) # current minimum plt.plot(np.diff(gp_opt.max_ei))

[3.22692871 2.63122559] 0.6107733232129569

Adding more iterations to the optimizer (again):

gp_opt.optimize(verbose=1, n_more_iter=80)

Plotting the **changes in expected improvement as a function of the number of iterations** (again).

print(gp_opt.x_min) # current minimum print(gp_opt.y_min) # current minimum plt.plot(np.diff(gp_opt.max_ei))

[9.44061279 2.48199463] 0.3991320518189241

The 3 previous graphs suggest the possibility of stopping the optimizer *earlier*, when the **algorithm is not improving on previous points’ results anymore**:

# # early stopping # abs_tol is the parameter that controls early stopping gp_opt2 = gp.GPOpt(objective_func=branin, lower_bound = np.array([-5, 0]), upper_bound = np.array([10, 15]), n_init=10, n_iter=190) gp_opt2.optimize(verbose=2, abs_tol=1e-4) print("\n")

We can observe that only 58 iterations are necessary when `abs_tol = 1e-4`

print(gp_opt2.n_iter) print(gp_opt2.x_min) print(gp_opt2.y_min)

58 [9.44061279 2.48199463] 0.3991320518189241

Illustrating convergence:

plt.plot(gp_opt2.max_ei)

We notice that in this example, GPopt falls into a local minimum but is very close to the previous minimum found with gradient-free optimizer (Nelder-Mead). The **opposite situation can occur too**:

# [0, 1]^3 def hartmann3(xx): alpha = np.array([1.0, 1.2, 3.0, 3.2]) A = np.array([3.0, 10, 30, 0.1, 10, 35, 3.0, 10, 30, 0.1, 10, 35]).reshape(4, 3) P = 1e-4 * np.array([3689, 1170, 2673, 4699, 4387, 7470, 1091, 8732, 5547, 381, 5743, 8828]).reshape(4, 3) xxmat = np.tile(xx,4).reshape(4, 3) inner = np.sum(A*(xxmat-P)**2, axis = 1) outer = np.sum(alpha * np.exp(-inner)) return(-outer)

# Fails, but may work with multiple restarts print("\n") res = minimize(hartmann3, x0=[0, 0, 0], method='Nelder-Mead', tol=1e-6) print(res.x) print(hartmann3(res.x))

[0.36872308 0.11756145 0.26757363] -1.00081686355956

# hartmann 3D gp_opt4 = gp.GPOpt(objective_func=hartmann3, lower_bound = np.repeat(0, 3), upper_bound = np.repeat(1, 3), n_init=20, n_iter=280) gp_opt4.optimize(verbose=2, abs_tol=1e-4)

print(gp_opt4.n_iter) print(gp_opt4.x_min) print(gp_opt4.y_min) print("\n")

51 [0.07220459 0.55792236 0.85662842] -3.8600590626769904

The question is, **in the case of BO applied to ML cross-validation, do we really want to find the global minimum of the objective** function?

**leave a comment**for the author, please follow the link and comment on their blog:

**T. Moudiki's Webpage - Python**.

Want to share your content on python-bloggers? click here.