MCMC with GPs with Normal likelihoods
[2]:
import os
import numpy as np
import pandas as pd
import scipy.optimize
import scipy.stats as st
import cmdstanpy
import arviz as az
import bebi103
import bokeh.io
bokeh.io.output_notebook()
In the previous lesson, we found MAP estimates for the hyperparameters and the parameter \(\sigma\) of a model with a GP prior and a Normal likelihood. Here, we estimate those parameters with Markov chain Monte Carlo.
As a reminder, the general model we are considering is
\begin{align} &\theta_k \sim \text{some hyperprior}\\[1em] &\boldsymbol{\sigma} \sim \text{some prior}\\[1em] &\mathbf{y} \mid \mathbf{X}, \boldsymbol{\sigma}, \theta_k \sim \mathrm{MultiNorm}(\mathbf{0}, \mathsf{K}_\mathbf{y}), \end{align}
where we have marginalized out the latent variables \(f(\mathbf{X})\). (See the previous lesson for the definition of \(\mathsf{K}_\mathbf{y}\).)
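For the model we use below, with a single scalar \(\sigma\), this covariance matrix is simply the kernel matrix evaluated at the measured points with \(\sigma^2\) added to its diagonal,
\begin{align} \mathsf{K}_\mathbf{y} = \mathsf{K}(\mathbf{X}, \mathbf{X}) + \sigma^2\,\mathsf{I}, \end{align}
which is exactly the matrix the Stan code below constructs before taking its Cholesky decomposition.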
We are specifically modeling the data set from Wolfenden and Snider, which can be downloaded here. The model we will consider uses a squared exponential (SE) kernel.
\begin{align} &\alpha \sim \text{HalfNorm}(2)\\[1em] &\rho \sim \text{InvGamma}(0.5, 2)\\[1em] &\sigma \sim \text{HalfNorm}(1)\\[1em] &\mathbf{y} \mid \mathbf{X}, \sigma, \alpha, \rho \sim \mathrm{MultiNorm}(\mathbf{0}, \mathsf{K}_\mathbf{y}), \end{align}
where \(\mathbf{X} = \mathbf{T}\), a set of temperatures and \(\mathbf{y} = \mathbf{k}\), a set of chemical rate constants.
Let’s load in the data set and scale and center it before we proceed to estimating the parameters of the model.
[3]:
# Load data (data_path is assumed to point to the directory containing the
# downloaded wolfenden_arrhenius.csv file)
df = pd.read_csv(os.path.join(data_path, 'wolfenden_arrhenius.csv'))
df['T (K)'] = 1000 / df['1000/T (1/K)']
df['k (1/s)'] = np.exp(df['ln k (1/s)'])

T = df["T (K)"].values
k = df["k (1/s)"].values

# Center and scale
k_scaled = (k - k.mean()) / k.std()
T_scaled = (T - T.mean()) / T.std()

# Number of points at which to evaluate the nonparametric function
Nstar = 250

# Set up xstar, extending slightly beyond the measured range
T_range = T_scaled.max() - T_scaled.min()
xstar = np.linspace(
    T_scaled.min() - 0.05 * T_range, T_scaled.max() + 0.05 * T_range, Nstar
)
Hyperparameter estimation using MCMC
Estimation of the hyperparameters is as simple as coding up the model in Stan. Below is the Stan code.
data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;
}


parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
}


model {
  alpha ~ normal(0.0, 2.0);
  rho ~ inv_gamma(0.5, 2.0);
  sigma ~ normal(0.0, 1.0);

  matrix[N, N] Ky = gp_exp_quad_cov(x, alpha, rho)
                    + diag_matrix(rep_vector(square(sigma), N));
  matrix[N, N] Ly = cholesky_decompose(Ky);

  y ~ multi_normal_cholesky(rep_vector(0, N), Ly);
}
After adding the hyperparameters to the model, we compute the covariance matrix using gp_exp_quad_cov(), being sure to add \(\sigma^2\) to the diagonal. We then compute its Cholesky decomposition, since sampling out of a multi-Normal distribution using the Cholesky decomposition is more numerically stable.
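As an aside, we can write the same marginal likelihood directly in Python to see exactly what the model block computes. Below is a minimal sketch for illustration only; the kernel function and the parameter values are made up here, not estimates, and this is not part of the Stan workflow.
# A minimal sketch of the marginal likelihood encoded in the Stan model block,
# evaluated with NumPy/SciPy for illustrative (made-up) hyperparameter values.
def se_kernel(x1, x2, alpha, rho):
    """Squared exponential kernel matrix between points x1 and x2."""
    return alpha**2 * np.exp(-((x1[:, None] - x2[None, :]) ** 2) / (2 * rho**2))

# Illustrative parameter values (not estimates)
alpha_test, rho_test, sigma_test = 1.0, 0.5, 0.1

# Ky = K(X, X) + sigma^2 I, as in the Stan model block
Ky = (
    se_kernel(T_scaled, T_scaled, alpha_test, rho_test)
    + sigma_test**2 * np.eye(len(T_scaled))
)

# Log marginal likelihood of the scaled data for these parameter values
log_like = st.multivariate_normal.logpdf(
    k_scaled, mean=np.zeros(len(T_scaled)), cov=Ky
)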
Let’s compile the model and get our samples!
[4]:
data = dict(N=len(T_scaled), x=T_scaled, y=k_scaled)
# Compile and sample
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file="gp_kinetics_no_ppc.stan")
    samples = sm.sample(data=data)
# Convert to ArviZ
samples = az.from_cmdstanpy(samples)
# Check diagnostics
bebi103.stan.check_all_diagnostics(samples)
Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
[4]:
0
Everything looks good! Let’s check out the parameter values with a corner plot.
[5]:
bokeh.io.show(
    bebi103.viz.corner(samples, parameters=["alpha", "rho", "sigma"])
)
Posterior predictive checks with GPs
We now have samples of \(\alpha\), \(\rho\), and \(\sigma\), which means we can compute samples of the nonparametric function \(f\). We could do this in Python, and could also perform posterior predictive checks by drawing data out of the likelihood, but as we have seen, this is quite convenient to do using Stan.
Though this post-processing is convenient in Stan, Stan does not have built-in functions to compute \(\mathbf{m}_*\) and \(\mathsf{\Sigma}_*\) for a given data set and set of parameters/hyperparameters. We have to code this up in our Stan model.
Fortunately, the linear algebra manipulations involved in the calculation, as laid out in Lesson 24, can be performed efficiently using built-in Stan functions. In the Stan code below, I consider a one-dimensional \(\mathbf{X}\) (in the present example this is the array of temperatures). The functions gp_posterior_mstar() and gp_posterior_sigmastar_cholesky() respectively compute the posterior \(\mathbf{m}_*\) values and the Cholesky decomposition of \(\mathsf{\Sigma}_*\) given the kernel matrices \(\mathsf{K}_\mathbf{y}\), \(\mathsf{K}_*\), and \(\mathsf{K}_{**}\) (the functions actually take the Cholesky decomposition of \(\mathsf{K}_\mathbf{y}\) as input). I then use those functions in the generated quantities block to sample the nonparametric function values \(\mathbf{f}_*\) and posterior predictive data.
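For reference, the quantities these functions compute are the standard GP posterior expressions laid out in Lesson 24,
\begin{align} \mathbf{m}_* = \mathsf{K}_*^\mathsf{T}\,\mathsf{K}_\mathbf{y}^{-1}\,\mathbf{y}, \qquad \mathsf{\Sigma}_* = \mathsf{K}_{**} - \mathsf{K}_*^\mathsf{T}\,\mathsf{K}_\mathbf{y}^{-1}\,\mathsf{K}_*, \end{align}
with the caveat that the inverse of \(\mathsf{K}_\mathbf{y}\) is never formed explicitly; the functions instead perform triangular solves against the Cholesky factor \(\mathsf{L}_\mathbf{y}\), where \(\mathsf{K}_\mathbf{y} = \mathsf{L}_\mathbf{y}\mathsf{L}_\mathbf{y}^\mathsf{T}\).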
functions {
  vector gp_posterior_mstar(vector y, matrix Ly, matrix Kstar) {
    /*
     * Obtain posterior mstar for a model with a Normal likelihood and GP prior
     * for a given Cholesky decomposition, Ly, of the matrix Ky, and K*.
     */
    // Get sizes
    int N = size(y);
    int Nstar = cols(Kstar);

    // Compute xi = inv(Ky) . y, which is the solution xi to Ky . xi = y.
    vector[N] z = mdivide_left_tri_low(Ly, y);
    vector[N] xi = mdivide_right_tri_low(z', Ly)';

    // Compute mean vector mstar
    vector[Nstar] mstar = Kstar' * xi;

    return mstar;
  }

  matrix gp_posterior_sigmastar_cholesky(
      vector y,
      matrix Ly,
      matrix Kstar,
      matrix Kstarstar,
      real delta) {
    /*
     * Obtain the Cholesky decomposition of the posterior Sigma* for a model
     * with a Normal likelihood and GP prior.
     */
    // Get sizes
    int N = size(y);
    int Nstar = cols(Kstar);

    // Compute Xi = inv(Ky) . Kstar, which is the solution Xi to Ky . Xi = Kstar.
    matrix[N, Nstar] Z = mdivide_left_tri_low(Ly, Kstar);
    matrix[N, Nstar] Xi = mdivide_right_tri_low(Z', Ly)';

    // Compute Sigma_star (plus a small number on the diagonal to ensure pos. def.)
    matrix[Nstar, Nstar] Sigmastar = Kstarstar - Kstar' * Xi
                                     + diag_matrix(rep_vector(delta, Nstar));

    // Compute and return Cholesky decomposition
    matrix[Nstar, Nstar] Lstar = cholesky_decompose(Sigmastar);

    return Lstar;
  }
}
data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;

  int<lower=1> Nstar;
  array[Nstar] real xstar;
}


transformed data {
  real delta = 1e-8;
}


parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
}


model {
  alpha ~ normal(0.0, 2.0);
  rho ~ inv_gamma(0.5, 2.0);
  sigma ~ normal(0.0, 1.0);

  matrix[N, N] Ky = gp_exp_quad_cov(x, alpha, rho)
                    + diag_matrix(rep_vector(square(sigma), N));
  matrix[N, N] Ly = cholesky_decompose(Ky);

  y ~ multi_normal_cholesky(rep_vector(0, N), Ly);
}


generated quantities {
  vector[Nstar] fstar;
  array[Nstar] real y_ppc;

  {
    // Build covariance matrices
    matrix[N, N] Ky = gp_exp_quad_cov(x, alpha, rho)
                      + diag_matrix(rep_vector(square(sigma), N));
    matrix[N, N] Ly = cholesky_decompose(Ky);
    matrix[N, Nstar] Kstar = gp_exp_quad_cov(x, xstar, alpha, rho);
    matrix[Nstar, Nstar] Kstarstar = gp_exp_quad_cov(xstar, xstar, alpha, rho);

    // Obtain m* and Sigma*
    vector[Nstar] mstar = gp_posterior_mstar(y, Ly, Kstar);
    matrix[Nstar, Nstar] Lstar = gp_posterior_sigmastar_cholesky(
        y, Ly, Kstar, Kstarstar, delta);

    // Sample nonparametric function, f*
    fstar = multi_normal_cholesky_rng(mstar, Lstar);

    // Posterior predictive check
    y_ppc = normal_rng(fstar, sigma);
  }
}
The functions gp_posterior_mstar() and gp_posterior_sigmastar_cholesky() are generic for any one-dimensional \(x\) and \(y\) inputs, so you can copy and paste them into your own Stan code for GP models.
Let’s go ahead and compile and sample!
[6]:
# Add N* and x* to data dictionary
data = dict(**data, **dict(Nstar=Nstar, xstar=xstar))
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file="gp_kinetics.stan")
    samples = sm.sample(data=data)
# Convert to ArviZ
samples = az.from_cmdstanpy(samples, posterior_predictive=["fstar", "y_ppc"])
# Check diagnostics
bebi103.stan.check_all_diagnostics(samples)
Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
[6]:
0
We are again all good on the sampling. Let's look at a plot of our samples of the nonparametric function \(f\). The distribution for each point in \(\mathbf{x}_*\) is no longer Normal, as it was for a specific set of parameters \(\rho\), \(\alpha\), and \(\sigma\), so we compute the credible intervals for \(f\) from the samples. We can use the bebi103.viz.predictive_regression() function for this, even though this is not a posterior predictive plot, but rather a plot of samples of the nonparametric function.
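If you want the pointwise credible intervals themselves rather than a plot, they can be computed directly from the samples; bebi103.viz.predictive_regression() effectively does this for us when building the plot. Below is a minimal sketch of that calculation (the variable names are just for illustration).
# A minimal sketch of computing pointwise credible intervals for f* by hand
fstar_samps = (
    samples.posterior_predictive["fstar"]
    .stack({"sample": ("chain", "draw")})
    .transpose("sample", "fstar_dim_0")
    .values
)

# Pointwise median and middle 95% credible interval (still centered and scaled)
fstar_median = np.median(fstar_samps, axis=0)
fstar_low, fstar_high = np.percentile(fstar_samps, [2.5, 97.5], axis=0)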
[7]:
fstar_scaled = (
    samples.posterior_predictive["fstar"]
    .stack({"sample": ("chain", "draw")})
    .transpose("sample", "fstar_dim_0")
)

# Uncenter and unscale
fstar = k.std() * fstar_scaled + k.mean()
Tstar = T.std() * xstar + T.mean()

bokeh.io.show(
    bebi103.viz.predictive_regression(
        fstar,
        Tstar,
        data=np.stack((T, k)).transpose(),
        color="orange",
        data_kwargs=dict(line_color="#1f78b4", fill_color="#1f78b4"),
    )
)
We also have posterior predictive samples, so we can compare those to the data.
[8]:
k_ppc_scaled = (
    samples.posterior_predictive["y_ppc"]
    .stack({"sample": ("chain", "draw")})
    .transpose("sample", "y_ppc_dim_0")
)

# Uncenter and unscale
k_ppc = k.std() * k_ppc_scaled + k.mean()

bokeh.io.show(
    bebi103.viz.predictive_regression(
        k_ppc, Tstar, data=np.stack((T, k)).transpose()
    )
)
Very nice!
[9]:
bebi103.stan.clean_cmdstan()
Computing environment
[10]:
%load_ext watermark
%watermark -v -p numpy,scipy,pandas,cmdstanpy,arviz,bokeh,bebi103,jupyterlab
print("cmdstan :", bebi103.stan.cmdstan_version())
Python implementation: CPython
Python version : 3.11.5
IPython version : 8.15.0
numpy : 1.26.2
scipy : 1.11.4
pandas : 2.1.4
cmdstanpy : 1.2.0
arviz : 0.17.0
bokeh : 3.3.0
bebi103 : 0.1.20
jupyterlab: 4.0.10
cmdstan : 2.34.0