Want to share your content on python-bloggers? click here.
Markov Chain Monte Carlo (MCMC) methods can be used to estimate the parameters of linear regression models. In this post, the distributional forms and algorithms associated with two Gibbs sampling approaches are outlined: the full conditionals approach and the composition method.
The dataset used in what follows is available here and represents refrigerator price as a function of a number of features. The objective is to estimate PRICE as a function of ECOST, FSIZE, SHELVES and FEATURES, where:
- PRICE: Response, cost of refrigerator.
- FSIZE: Size of freezer compartment.
- ECOST: Annual energy cost to operate refrigerator.
- SHELVES: Number of shelves.
- FEATURES: Number of features.
I. The Full Conditionals Method
First determine the joint kernel, which is the product of the likelihood and all prior distributions. The joint kernel will be proportional to the posterior distribution:
The likelihood,
For the parameter vector
Since the prior distribution for the parameter vector
Note that posterior distribution is asymptotically equivalent to the expression for the model likelihood.
Gibbs sampling requires identifying the full conditional distribution for each parameter, holding all other parameters constant. To find the full conditional distribution for
After distributing the transpose, removing multiplicative constants and performing a bit of reorganization, we arrive at:
Upon completing the square, we find the distribution for
For
where
Full Conditionals Approach for Sampling from the Posterior Distribution:
-
Establish starting values for
and . We use and -
Sample
from multivariate normal distribution with fixed. -
Sample
from inverse gamma distribution with fixed.
In the code below, we run the sampler for 10,000 iterations, discarding the first 1,000 samples.
""" The full conditionals Gibbs sampling method. PRICE ~ "ECOST" + "FSIZE" + "SHELVES" + "FEATURES" """ from math import sqrt import matplotlib as mpl import matplotlib.pyplot as plt import numpy as np import pandas as pd from scipy import stats from scipy.linalg import cholesky pd.options.mode.chained_assignment = None pd.set_option('display.max_columns', 1000) pd.set_option('display.width', 10000) np.set_printoptions(suppress=True) data_url = "https://gist.github.com/jtrive84/96757393423b2599c7d5da361fdf024b/raw/82000835b7c3b70dcf21a9ab43c59db700b78136/Refrigerator.csv" df = pd.read_csv(data_url) variates = ["ECOST", "FSIZE", "SHELVES", "FEATURES"] X = df[variates].values y = df["PRICE"].values.reshape(-1, 1) X = np.concatenate([np.ones(y.size).reshape(-1, 1), X], axis=1) n, p = X.shape M = 10000 burnin = 1000 prng = np.random.RandomState(516) # Initialize arrays to hold posterior samples. betas, sigma2 = np.zeros([M, p]), np.ones(M) # Initialize parameter arrays and compute covariance matrix. b_ols = np.linalg.inv(X.T @ X) @ X.T @ y V = np.linalg.inv(X.T @ X) sigma2[0] = ((y - X @ b_ols).T @ (y - X @ b_ols) / (n - p)).item() betas[0, :] = b_ols.T # Gibbs sampling from full conditionals. At each iteration, p independent # standard normal random variates are sampled, which are transformed into # a draw from a multivariate normal density with mean betas[i, :] and # covariance V. for ii in range(1, M): # Sample from full conditional distribution for betas. betas[ii,:] = b_ols.T + prng.randn(p).reshape(1, -1) @ cholesky(sigma2[ii - 1] * V) # Sample from full conditional distribution for variance. sigma2[ii] = stats.invgamma.rvs( a=.50 * n, scale=.50 * ((y - X @ betas[ii,:].reshape(-1, 1)).T @ (y - X @ betas[ii,:].reshape(-1, 1))).item(), random_state=prng ) # Remove burnin samples. betas = betas[burnin:,:] sigma2 = sigma2[burnin:]
Traceplots can be produced to visualize sampling variability across iterations:
# Combine beta and sigma2 arrays to create traceplots. varnames = ["intercept"] + variates + ["sigma2"] Xall1 = np.hstack([betas, sigma2.reshape(-1, 1)]) fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(10, 5), tight_layout=True) indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)] for indx, (varname, (ii, jj)) in enumerate(zip(varnames, indices)): ax[ii, jj].set_title(varname, color="#000000", loc="center", fontsize=8) ax[ii, jj].plot(Xall1[:,indx], color="#000000", linewidth=.25, linestyle="-") ax[ii, jj].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}")) ax[ii, jj].set_xlabel("") ax[ii, jj].set_ylabel("") ax[ii, jj].tick_params(axis="x", which="major", direction='in', labelsize=7) ax[ii, jj].tick_params(axis="y", which="major", direction='in', labelsize=7) ax[ii, jj].xaxis.set_ticks_position("none") ax[ii, jj].yaxis.set_ticks_position("none") ax[ii, jj].grid(True) ax[ii, jj].set_axisbelow(True) plt.suptitle("Traceplots: Full Conditionals Approach", fontsize=10) plt.show();
Histograms for each parameter can be generated from the posterior samples:
# Create histograms from posterior samples with overlaid ols estimates. hist_color = "#ff7595" fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(10, 6), tight_layout=True) plt_ols = np.append(b_ols.ravel(), sigma2[0]) indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)] for indx, (varname, (ii, jj)) in enumerate(zip(varnames, indices)): ax[ii, jj].set_title(varname, color="#000000", loc="center", fontsize=8) ax[ii, jj].hist( Xall1[:,indx], 25, density=True, alpha=1, color=hist_color, edgecolor="#FFFFFF", linewidth=1.0 ) label0 = r"$\hat \sigma_{0}^{2}$" if indx == 5 else r"$\hat \beta_{OLS}$" label1 = r"$\hat \sigma_{MCMC}^{2}$" if indx == 5 else r"$\hat \beta_{MCMC}$" ax[ii,jj].axvline(plt_ols[indx], color="grey", linewidth=1.25, linestyle="--", label=label0) ax[ii,jj].axvline(Xall1[:,indx].mean(), color="blue", linewidth=1.25, linestyle="--", label=label1) ax[ii, jj].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}")) ax[ii,jj].set_yticklabels([]) ax[ii, jj].set_xlabel("") ax[ii, jj].set_ylabel("") ax[ii, jj].tick_params(axis="x", which="major", direction='in', labelsize=6) ax[ii, jj].tick_params(axis="y", which="major", direction='in', labelsize=6) ax[ii, jj].xaxis.set_ticks_position("none") ax[ii, jj].yaxis.set_ticks_position("none") ax[ii, jj].grid(True) ax[ii, jj].set_axisbelow(True) ax[ii,jj].legend(loc="upper right", fancybox=True, framealpha=1, fontsize="x-small") plt.suptitle("Histograms: Full Conditionals Approach", fontsize=10) plt.show();
II. Composition Method
The composition method can also be used to generate posterior samples for
As before, the conditional distribution for
The composition method is faster than the full conditionals method since we can draw all samples from
Composition Method Approach for Sampling from the Posterior Distribution:
-
Compute
via OLS and . -
Compute
. -
Draw
samples from . -
For
: Draw from : .
""" Drawing posterior samples using the composition method. PRICE ~ "ECOST" + "FSIZE" + "SHELVES" + "FEATURES" """ n, p = X.shape M = 10000 burnin = 1000 prng = np.random.RandomState(516) # Initialize betas to hold posterior samples. betas = np.zeros([M, p]) # Compute OLS estimate, V and SSR. b_ols = np.linalg.inv(X.T @ X) @ X.T @ y V = np.linalg.inv(X.T @ X) SSR = (y - X @ b_ols).T @ (y - X @ b_ols) / (n - p) # Draw M samples from marginal distribution for sigma2. sigma2 = stats.invgamma.rvs(a=.50 * (n - p), scale=.50 * (n - p) * SSR, size=M, random_state=prng) # Simulate sequence of beta draws for each value in sigma2. for ii in range(M): betas[ii,:] = b_ols.T + prng.randn(p).reshape(1, -1) @ cholesky(sigma2[ii] * V)
As before, we produce traceplots for each variable and
varnames = ["intercept"] + variates + ["sigma2"] Xall2 = np.hstack([betas, sigma2.reshape(-1, 1)]) fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(10, 5), tight_layout=True) indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)] for indx, (varname, (ii, jj)) in enumerate(zip(varnames, indices)): ax[ii, jj].set_title(varname, color="#000000", loc="center", fontsize=8) ax[ii, jj].plot(Xall2[:,indx], color="#000000", linewidth=.25, linestyle="-") ax[ii, jj].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}")) ax[ii, jj].set_xlabel("") ax[ii, jj].set_ylabel("") ax[ii, jj].tick_params(axis="x", which="major", direction='in', labelsize=7) ax[ii, jj].tick_params(axis="x", which="minor", direction='in', labelsize=7) ax[ii, jj].tick_params(axis="y", which="major", direction='in', labelsize=7) ax[ii, jj].tick_params(axis="y", which="minor", direction='in', labelsize=7) ax[ii, jj].xaxis.set_ticks_position("none") ax[ii, jj].yaxis.set_ticks_position("none") ax[ii, jj].grid(True) ax[ii, jj].set_axisbelow(True) plt.suptitle("Traceplots: Composition Method", fontsize=10) plt.show();
As well as histograms:
hist_color = "#ff7595" fig, ax = plt.subplots(nrows=2, ncols=3, figsize=(10, 6), tight_layout=True) plt_ols = np.append(b_ols.ravel(), sigma2[0]) indices = [(0, 0), (0, 1), (0, 2), (1, 0), (1, 1), (1, 2)] for indx, (varname, (ii, jj)) in enumerate(zip(varnames, indices)): ax[ii, jj].set_title(varname, color="#000000", loc="center", fontsize=8) ax[ii, jj].hist( Xall2[:,indx], 25, density=True, alpha=1, color=hist_color, edgecolor="#FFFFFF", linewidth=1.0 ) label0 = r"$\hat \sigma_{0}^{2}$" if indx == 5 else r"$\hat \beta_{OLS}$" label1 = r"$\hat \sigma_{MCMC}^{2}$" if indx == 5 else r"$\hat \beta_{MCMC}$" ax[ii,jj].axvline(plt_ols[indx], color="grey", linewidth=1.25, linestyle="--", label=label0) ax[ii,jj].axvline(Xall2[:,indx].mean(), color="blue", linewidth=1.25, linestyle="--", label=label1) ax[ii, jj].xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}")) ax[ii,jj].set_yticklabels([]) ax[ii, jj].set_xlabel("") ax[ii, jj].set_ylabel("") ax[ii, jj].tick_params(axis="x", which="major", direction='in', labelsize=6) ax[ii, jj].tick_params(axis="y", which="major", direction='in', labelsize=6) ax[ii, jj].xaxis.set_ticks_position("none") ax[ii, jj].yaxis.set_ticks_position("none") ax[ii, jj].grid(True) ax[ii, jj].set_axisbelow(True) ax[ii,jj].legend(loc="upper right", fancybox=True, framealpha=1, fontsize="x-small") plt.suptitle("Histograms: Composition Method", fontsize=10) plt.show();
Posterior percentiles from the full conditionals approach and composition method can be compared:
""" Comparing posterior percentiles for full conditionals and composition approaches. """ peval = [1, 5, 25, 50, 75, 95, 99] pcols = [f"{ii:.0f}%" for ii in peval] for indx, varname in enumerate(varnames): # Determine percentiles generated via full-conditionals. df1 = pd.DataFrame( np.round(np.percentile(Xall1[:, indx], q=peval), 2).reshape(1,-1), columns=pcols,index=[f"{varname}: conditionals"] ) # Determine percentiles generated via composition method. df2 = pd.DataFrame( np.round(np.percentile(Xall2[:, indx], q=peval), 2).reshape(1,-1), columns=pcols, index=[f"{varname}: composition"] ) df = pd.concat([df1, df2]) display(df)
1% | 5% | 25% | 50% | 75% | 95% | 99% | |
---|---|---|---|---|---|---|---|
intercept: conditionals | -25.01 | 51.28 | 150.96 | 221.28 | 289.90 | 386.74 | 464.17 |
intercept: composition | -21.44 | 50.76 | 152.43 | 221.71 | 288.77 | 387.09 | 461.41 |
1% | 5% | 25% | 50% | 75% | 95% | 99% | |
---|---|---|---|---|---|---|---|
ECOST: conditionals | -10.34 | -8.41 | -5.77 | -4.0 | -2.20 | 0.39 | 2.62 |
ECOST: composition | -10.20 | -8.34 | -5.76 | -4.0 | -2.21 | 0.38 | 2.28 |
1% | 5% | 25% | 50% | 75% | 95% | 99% | |
---|---|---|---|---|---|---|---|
FSIZE: conditionals | 38.83 | 58.12 | 85.95 | 103.97 | 121.74 | 148.78 | 169.22 |
FSIZE: composition | 39.05 | 57.93 | 85.41 | 103.99 | 122.53 | 148.96 | 168.51 |
1% | 5% | 25% | 50% | 75% | 95% | 99% | |
---|---|---|---|---|---|---|---|
SHELVES: conditionals | -2.56 | 5.70 | 17.33 | 25.19 | 32.86 | 44.09 | 52.25 |
SHELVES: composition | -2.71 | 5.68 | 17.42 | 24.81 | 32.72 | 44.32 | 52.66 |
1% | 5% | 25% | 50% | 75% | 95% | 99% | |
---|---|---|---|---|---|---|---|
FEATURES: conditionals | 11.87 | 15.60 | 21.25 | 24.87 | 28.68 | 34.11 | 38.15 |
FEATURES: composition | 11.79 | 15.68 | 21.13 | 24.86 | 28.46 | 33.87 | 37.89 |
1% | 5% | 25% | 50% | 75% | 95% | 99% | |
---|---|---|---|---|---|---|---|
sigma2: conditionals | 3209.36 | 3706.92 | 4620.93 | 5465.36 | 6527.58 | 8570.83 | 10609.79 |
sigma2: composition | 3194.00 | 3691.71 | 4649.74 | 5467.54 | 6490.48 | 8482.73 | 10295.27 |
Posterior Predictive Samples
For a new set of predictors
- For
: Draw from posterior samples. - Draw
.
The code that follows generates 5000 posterior predictive samples for a new set of predictors.
# Generating posterior predictive samples for a new set of predictors. n = 5000 Xnew = [1, 82, 5.1, 2, 8] # Determine which 5000 parameter sets to use for sampling. indices = prng.choice(range(Xall2.shape[0]), size=n, replace=False) params = Xall2[indices,:] samples = [] for ii in range(n): beta, v = params[ii, :-1], params[ii, -1] mu = np.dot(beta, Xnew) y_ii = prng.normal(loc=mu, scale=sqrt(v)) samples.append(y_ii) samples = np.asarray(samples)
Viewing the histogram of posterior predictive samples:
hist_color = "#ff7595" fig, ax = plt.subplots(nrows=1, ncols=1, figsize=(8, 5), tight_layout=True) ax.set_title( "Posterior predictive samples for new predictors", color="#000000", loc="center", fontsize=10 ) ax.hist( samples, 30, density=True, alpha=1, color=hist_color, edgecolor="#FFFFFF", linewidth=1.0 ) label0, label1 = r"$\hat{y}_{OLS}$", r"$\hat y_{MCMC}$" ax.axvline(np.dot(b_ols.T, Xnew), color="grey", linewidth=1.25, linestyle="--", label=label0) ax.axvline(samples.mean(), color="blue", linewidth=1.25, linestyle="--", label=label1) ax.xaxis.set_major_formatter(mpl.ticker.StrMethodFormatter("{x:,.0f}")) ax.set_yticklabels([]) ax.set_xlabel("") ax.set_ylabel("") ax.tick_params(axis="x", which="major", direction='in', labelsize=7) ax.tick_params(axis="y", which="major", direction='in', labelsize=7) ax.xaxis.set_ticks_position("none") ax.yaxis.set_ticks_position("none") ax.grid(True) ax.set_axisbelow(True) ax.legend(loc="upper right", fancybox=True, framealpha=1, fontsize="small") plt.show();
Finally, percentiles of the posterior predictive samples:
peval = [1, 5, 25, 50, 75, 95, 99] pcols = [f"{ii:.0f}%" for ii in peval] df = pd.DataFrame( np.round(np.percentile(samples, q=peval), 2).reshape(1,-1), columns=pcols,index=["posterior predictive samples"] ) df.head(1)
1% | 5% | 25% | 50% | 75% | 95% | 99% | |
---|---|---|---|---|---|---|---|
posterior predictive samples | 464.64 | 530.08 | 613.37 | 673.32 | 731.45 | 813.15 | 883.41 |
Want to share your content on python-bloggers? click here.