MCMC with GPs with Normal likelihoods

Data set download


[2]:
import os

import numpy as np
import polars as pl
import scipy.optimize
import scipy.stats as st

import cmdstanpy
import arviz as az

import bebi103

import bokeh.io
bokeh.io.output_notebook()

In the previous lesson, we found MAP estimates for the hyperparameters and the parameter \(\sigma\) of a model with a GP prior and a Normal likelihood. Here, we estimate those parameters with Markov chain Monte Carlo.

As a reminder, the general model we are considering is

\begin{align} &\theta_k \sim \text{some hyperprior}\\[1em] &\boldsymbol{\sigma} \sim \text{some prior}\\[1em] &\mathbf{y} \mid \mathbf{X}, \boldsymbol{\sigma}, \theta_k \sim \mathrm{MultiNorm}(\mathbf{0}, \mathsf{K}_\mathbf{y}), \end{align}

where we have marginalized out the latent variables \(f(\mathbf{X})\). (See the previous lesson for the definition of \(\mathsf{K}_\mathbf{y}\).)
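
In brief, marginalizing over \(f\) with a Normal likelihood gives a covariance matrix that is the kernel matrix with the noise variance added to its diagonal,

\begin{align} \mathsf{K}_\mathbf{y} = \mathsf{K}(\mathbf{X}, \mathbf{X}) + \mathrm{diag}(\boldsymbol{\sigma}^2), \end{align}

which is exactly what the Stan code below constructs.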

We are specifically modeling the data set from Wolfenden and Snider, which can be downloaded here. The model we will consider uses an SE kernel.

\begin{align} &\alpha \sim \text{HalfNorm}(2)\\[1em] &\rho \sim \text{InvGamma}(0.5, 2)\\[1em] &\sigma \sim \text{HalfNorm}(0.1)\\[1em] &\mathbf{y} \mid \mathbf{X}, \sigma, \alpha, \rho \sim \mathrm{MultiNorm}(\mathbf{0}, \mathsf{K}_\mathbf{y}), \end{align}

where \(\mathbf{X} = \mathbf{T}\), a set of temperatures, and \(\mathbf{y} = \mathbf{k}\), a set of chemical rate constants.
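
For reference, the SE (squared exponential) kernel, implemented in Stan as gp_exp_quad_cov() and used in the code below, has matrix elements

\begin{align} k(x_i, x_j) = \alpha^2 \exp\left(-\frac{(x_i - x_j)^2}{2\rho^2}\right), \end{align}

so \(\alpha\) sets the marginal standard deviation of the GP and \(\rho\) sets its length scale.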

Let’s load in the data set and scale and center it before we proceed to estimating the parameters of the model.

[3]:
# Load data (data_path is the directory containing the downloaded CSV)
df = pl.read_csv(os.path.join(data_path, 'wolfenden_arrhenius.csv'))
df = df.with_columns(
    (1000 / pl.col('1000/T (1/K)')).alias('T (K)'),
    (pl.col('ln k (1/s)').exp()).alias('k (1/s)')
)

T = df["T (K)"].to_numpy()
k = df["k (1/s)"].to_numpy()

# Center and scale
k_scaled = (k - k.mean()) / k.std()
T_scaled = (T - T.mean()) / T.std()

# Sample f* at 250 points
Nstar = 250

# Set up xstar
T_range = T_scaled.max() - T_scaled.min()
xstar = np.linspace(
    T_scaled.min() - 0.05 * T_range, T_scaled.max() + 0.05 * T_range, Nstar
)

Hyperparameter estimation using MCMC

Estimation of the hyperparameters is as simple as coding up the model in Stan. Below is the Stan code.

data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;
}


parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
}


model {
  alpha ~ normal(0.0, 2.0);
  rho ~ inv_gamma(0.5, 2.0);
  sigma ~ normal(0.0, 1.0);

  matrix[N, N] Ky = gp_exp_quad_cov(x, alpha, rho)
                   + diag_matrix(rep_vector(square(sigma), N));
  matrix[N, N] Ly = cholesky_decompose(Ky);

  y ~ multi_normal_cholesky(rep_vector(0, N), Ly);
}

After adding the hyperparameters to the model, we compute the covariance matrix using gp_exp_quad_cov(), being sure to add \(\sigma^2\) to the diagonal. We then compute its Cholesky decomposition, since parametrizing the multi-Normal distribution by its Cholesky decomposition (via multi_normal_cholesky) is more numerically stable.
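
To make the connection to the linear algebra explicit, here is a minimal NumPy sketch of the same construction of \(\mathsf{K}_\mathbf{y}\) and its Cholesky decomposition, using the scaled temperatures from above and made-up parameter values (these are for illustration only, not estimates).

# Hypothetical parameter values for illustration only
alpha, rho, sigma = 1.0, 1.0, 0.1

# SE kernel matrix for the scaled temperatures loaded above
dx = T_scaled[:, None] - T_scaled[None, :]
Ky = alpha**2 * np.exp(-(dx**2) / (2 * rho**2))

# Add sigma^2 to the diagonal to obtain K_y
Ky[np.diag_indices_from(Ky)] += sigma**2

# Cholesky decomposition, analogous to cholesky_decompose() in Stan
Ly = np.linalg.cholesky(Ky)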

Let’s compile the model and get our samples!

[4]:
data = dict(N=len(T_scaled), x=T_scaled, y=k_scaled)

# Compile and sample
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file="gp_kinetics_no_ppc.stan")
    samples = sm.sample(data=data)

# Convert to ArviZ
samples = az.from_cmdstanpy(samples)

# Check diagnostics
bebi103.stan.check_all_diagnostics(samples)

Effective sample size looks reasonable for all parameters.

Rhat looks reasonable for all parameters.

0 of 4000 (0.0%) iterations ended with a divergence.

0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.

E-BFMI indicated no pathological behavior.
[4]:
0

Everything looks good! Let’s check out the parameter values with a corner plot.

[5]:
bokeh.io.show(
    bebi103.viz.corner(samples, parameters=["alpha", "rho", "sigma"])
)

Posterior predictive checks with GPs

We now have samples of \(\alpha\), \(\rho\), and \(\sigma\), which means we can compute samples of the nonparametric function \(f\). We could do this with Python, and indeed also compute the posterior predictive samples by drawing data out of the likelihood, but as we have seen, this is quite convenient to do using Stan.

Despite this convenience in post-processing, Stan does not have built-in functions to compute \(\mathbf{m}_*\) and \(\mathsf{\Sigma}_*\) for a given data set and set of parameters/hyperparameters. We have to code this up ourselves in our Stan model.

Fortunately, the linear algebra manipulations involved in the calculation, as laid out in Lesson 24, can be performed efficiently using built-in Stan functions. In the Stan code below, I consider a one-dimensional \(\mathbf{X}\) (in the present example this is the array of temperatures). The functions gp_posterior_mstar() and gp_posterior_sigmastar_cholesky() respectively compute the posterior \(\mathbf{m}_*\) values and the Cholesky decomposition of \(\mathsf{\Sigma}_*\) given the kernel matrices \(\mathsf{K}_\mathbf{y}\), \(\mathsf{K}_*\), and \(\mathsf{K}_{**}\) (the functions actually use the Cholesky decomposition of \(\mathsf{K}_\mathbf{y}\)). I then use those functions in the generated quantities block to sample the nonparametric function values \(\mathbf{f}_*\) and posterior predictive data.
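
For reference, the quantities these functions compute are

\begin{align} &\mathbf{m}_* = \mathsf{K}_*^\mathsf{T}\,\mathsf{K}_\mathbf{y}^{-1}\,\mathbf{y},\\[1em] &\mathsf{\Sigma}_* = \mathsf{K}_{**} - \mathsf{K}_*^\mathsf{T}\,\mathsf{K}_\mathbf{y}^{-1}\,\mathsf{K}_*, \end{align}

though the inverse of \(\mathsf{K}_\mathbf{y}\) is never formed explicitly; the functions instead use triangular solves with its Cholesky decomposition.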

functions {
  /**
   * Return posterior m* for a model with a Normal likelihood and GP
   * prior for a given Cholesky decomposition, Ly, of the matrix Ky,
   * and K*.
   *
   * @param y y-values of data
   * @param Ly Cholesky decomposition of the matrix Ky
   * @param Kstar Matrix K*
   * @return Posterior m*
   */
  vector gp_posterior_mstar(vector y, matrix Ly, matrix Kstar) {
    // Get sizes
    int N = size(y);
    int Nstar = cols(Kstar);

    // Compute xi = inv(Ky) . y, which is solution xi to Ky . xi = y.
    vector[N] z = mdivide_left_tri_low(Ly, y);
    vector[N] xi = mdivide_right_tri_low(z', Ly)';

    // Compute mean vector mstar
    vector[Nstar] mstar = Kstar' * xi;

    return mstar;
  }


  /**
   * Return Cholesky decomposition of posterior Σ* for a model with a
   * Normal likelihood and GP prior.
   *
   * @param y y-values of data
   * @param Ly Cholesky decomposition of the matrix Ky
   * @param Kstar Matrix K*
   * @param Kstarstar Matrix K**
   * @param delta Small value to add to the diagonal of Σ* to ensure
   *              numerical positive definiteness
   * @return Cholesky decomposition of posterior Σ*
   */
  matrix gp_posterior_sigmastar_cholesky(
      vector y,
      matrix Ly,
      matrix Kstar,
      matrix Kstarstar,
      real delta) {

    // Get sizes
    int N = size(y);
    int Nstar = cols(Kstar);

    // Compute Xi = inv(Ky) . Kstar, which is the solution Xi to Ky . Xi = Kstar.
    matrix[N, Nstar] Z = mdivide_left_tri_low(Ly, Kstar);
    matrix[N, Nstar] Xi = mdivide_right_tri_low(Z', Ly)';

    // Compute Sigma_star (plus a small number on the diagonal to ensure pos. def.)
    matrix[Nstar, Nstar] Sigmastar = Kstarstar - Kstar' * Xi
                                   + diag_matrix(rep_vector(delta, Nstar));

    // Compute and return Cholesky decomposition
    matrix[Nstar, Nstar] Lstar = cholesky_decompose(Sigmastar);

    return Lstar;
  }
}

data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;

  int<lower=1> Nstar;
  array[Nstar] real xstar;
}


transformed data {
  real delta = 1e-8;
}


parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
}


model {
  alpha ~ normal(0.0, 2.0);
  rho ~ inv_gamma(0.5, 2.0);
  sigma ~ normal(0.0, 1.0);

  matrix[N, N] Ky = gp_exp_quad_cov(x, alpha, rho)
                   + diag_matrix(rep_vector(square(sigma), N));
  matrix[N, N] Ly = cholesky_decompose(Ky);

  y ~ multi_normal_cholesky(rep_vector(0, N), Ly);
}


generated quantities {
  vector[Nstar] fstar;
  array[Nstar] real y_ppc;

  {
    // Build covariance matrices
    matrix[N, N] Ky = gp_exp_quad_cov(x, alpha, rho)
                     + diag_matrix(rep_vector(square(sigma), N));
    matrix[N, N] Ly = cholesky_decompose(Ky);
    matrix[N, Nstar] Kstar = gp_exp_quad_cov(x, xstar, alpha, rho);

    matrix[Nstar, Nstar] Kstarstar = gp_exp_quad_cov(xstar, xstar, alpha, rho);

    // Obtain m* and Sigma*
    vector[Nstar] mstar = gp_posterior_mstar(y, Ly, Kstar);
    matrix[Nstar, Nstar] Lstar = gp_posterior_sigmastar_cholesky(
      y, Ly, Kstar, Kstarstar, delta);

    // Sample nonparametric function, f*
    fstar = multi_normal_cholesky_rng(mstar, Lstar);

    // Posterior predictive check
    y_ppc = normal_rng(fstar, sigma);
  }

}

Aside: Stan include files

Note that in the above Stan code, the functions gp_posterior_mstar() and gp_posterior_sigmastar_cholesky() are generic for any one-dimensional \(x\) and \(y\) inputs. They might be useful in other GPs you are modeling, so it would be nice not to have to copy and paste them. To facilitate reuse of such functions, you can use Stan's include functionality. Say we have a file, gp_one_dimensional.stan, that has the following contents.

/*
  These functions are for GPs with univariate x.
 */

  /**
   * Return posterior m* for a model with a Normal likelihood and GP
   * prior for a given Cholesky decomposition, Ly, of the matrix Ky,
   * and K*.
   *
   * @param y y-values of data
   * @param Ly Cholesky decomposition of the matrix Ky
   * @param Kstar Matrix K*
   * @return Posterior m*
   */
  vector gp_posterior_mstar(vector y, matrix Ly, matrix Kstar) {
    // Get sizes
    int N = size(y);
    int Nstar = cols(Kstar);

    // Compute xi = inv(Ky) . y, which is solution xi to Ky . xi = y.
    vector[N] z = mdivide_left_tri_low(Ly, y);
    vector[N] xi = mdivide_right_tri_low(z', Ly)';

    // Compute mean vector mstar
    vector[Nstar] mstar = Kstar' * xi;

    return mstar;
  }


  /**
   * Return Cholesky decomposition of posterior Σ* for a model with a
   * Normal likelihood and GP prior.
   *
   * @param y y-values of data
   * @param Ly Cholesky decomposition of the matrix Ky
   * @param Kstar Matrix K*
   * @param Kstarstar Matrix K**
   * @param delta Small value to add to the diagonal of Σ* to ensure
   *              numerical positive definiteness
   * @return Cholesky decomposition of posterior Σ*
   */
  matrix gp_posterior_sigmastar_cholesky(
      vector y,
      matrix Ly,
      matrix Kstar,
      matrix Kstarstar,
      real delta) {

    // Get sizes
    int N = size(y);
    int Nstar = cols(Kstar);

    // Compute Xi = inv(Ky) . Kstar, which is the solution Xi to Ky . Xi = Kstar.
    matrix[N, Nstar] Z = mdivide_left_tri_low(Ly, Kstar);
    matrix[N, Nstar] Xi = mdivide_right_tri_low(Z', Ly)';

    // Compute Sigma_star (plus a small number on the diagonal to ensure pos. def.)
    matrix[Nstar, Nstar] Sigmastar = Kstarstar - Kstar' * Xi
                                   + diag_matrix(rep_vector(delta, Nstar));

    // Compute and return Cholesky decomposition
    matrix[Nstar, Nstar] Lstar = cholesky_decompose(Sigmastar);

    return Lstar;
  }

We can include the contents of this file in the Stan file containing our model using the

#include gp_one_dimensional.stan

syntax. Wherever the #include statement is, that line is replaced with the contents of the file. Note that I do not encase the contents of the file in functions { }, as is commonly done for files included in Stan programs (as in the documentation on includes). I do this so that multiple files can be included, since a Stan program strictly allows only one functions block, one model block, etc. Note also that because there is no functions or other block, the gp_one_dimensional.stan file does not contain complete, valid Stan code, but we nonetheless use the .stan suffix, as is conventional.

When compiling the model, you need to tell CmdStanPy where to find the included file using the stanc_options keyword argument. This kwarg tells the Stan compiler which paths to search for files to include. For example, if you had a directory called stan_include in your home directory that you wanted the Stan compiler to search for includes, and you also wanted it to search the working directory, you could use the following.

stanc_options = {'include-paths': '~/stan_include,./'}

Note that the paths to the respective directories are comma separated.
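
For example, a compile call that searches both of those directories might look like the following sketch (the file name my_model.stan is hypothetical).

with bebi103.stan.disable_logging():
    # Search both ~/stan_include and the working directory for #include files
    sm = cmdstanpy.CmdStanModel(
        stan_file="my_model.stan",
        stanc_options={"include-paths": "~/stan_include,./"},
    )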

Let’s demonstrate. The file gp_kinetics.stan in the working directory has the following contents.

functions {
  #include gp_one_dimensional.stan
}

data {
  int<lower=1> N;
  array[N] real x;
  vector[N] y;

  int<lower=1> Nstar;
  array[Nstar] real xstar;
}


transformed data {
  real delta = 1e-8;
}


parameters {
  real<lower=0> alpha;
  real<lower=0> rho;
  real<lower=0> sigma;
}


model {
  alpha ~ normal(0.0, 2.0);
  rho ~ inv_gamma(0.5, 2.0);
  sigma ~ normal(0.0, 1.0);

  matrix[N, N] Ky = gp_exp_quad_cov(x, alpha, rho)
                   + diag_matrix(rep_vector(square(sigma), N));
  matrix[N, N] Ly = cholesky_decompose(Ky);

  y ~ multi_normal_cholesky(rep_vector(0, N), Ly);
}


generated quantities {
  vector[Nstar] fstar;
  array[Nstar] real y_ppc;

  {
    // Build covariance matrices
    matrix[N, N] Ky = gp_exp_quad_cov(x, alpha, rho)
                     + diag_matrix(rep_vector(square(sigma), N));
    matrix[N, N] Ly = cholesky_decompose(Ky);
    matrix[N, Nstar] Kstar = gp_exp_quad_cov(x, xstar, alpha, rho);

    matrix[Nstar, Nstar] Kstarstar = gp_exp_quad_cov(xstar, xstar, alpha, rho);

    // Obtain m* and Sigma*
    vector[Nstar] mstar = gp_posterior_mstar(y, Ly, Kstar);
    matrix[Nstar, Nstar] Lstar = gp_posterior_sigmastar_cholesky(
      y, Ly, Kstar, Kstarstar, delta);

    // Sample nonparametric function, f*
    fstar = multi_normal_cholesky_rng(mstar, Lstar);

    // Posterior predictive check
    y_ppc = normal_rng(fstar, sigma);
  }

}

Let’s go ahead and compile.

[6]:
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(
        stan_file="gp_kinetics.stan",
        stanc_options = {'include-paths': './'}
    )

It compiles just fine!

The bebi103 package has Stan functions available for Gaussian processes, including the above two functions as well as functions for differentiating squared exponential kernels that we will use in a subsequent lesson. To include them, you need to fetch the path to the Stan files, which you can do using bebi103.stan.include_path(). We will use these functions going forward. Let's compile and sample!

[7]:
# Add N* and x* to data dictionary
data = dict(**data, Nstar=Nstar, xstar=xstar)

with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(
        stan_file="gp_kinetics.stan",
        stanc_options = {'include-paths': bebi103.stan.include_path()},
        force_compile=True,
    )
    samples = sm.sample(data=data)

# Convert to ArviZ
samples = az.from_cmdstanpy(samples, posterior_predictive=["fstar", "y_ppc"])

# Check diagnostics
bebi103.stan.check_all_diagnostics(samples)

Effective sample size looks reasonable for all parameters.

Rhat looks reasonable for all parameters.

0 of 4000 (0.0%) iterations ended with a divergence.

0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.

E-BFMI indicated no pathological behavior.
[7]:
0

We are again all good on the sampling. Let’s look at a plot of our samples of the nonparametric function \(f\). The distribution for each point in \(\mathbf{x}_*\) is no longer Normal, as it was for a specific set of parameters \(\rho\), \(\alpha\), and \(\sigma\), so we compute the credible intervals for \(f\) using the samples. We can use the bebi103.viz.predictive_regression() function for this, even though this is not a posterior predictive plot, but rather a plot of samples of the nonparametric function.

[8]:
fstar_scaled = (
    samples.posterior_predictive["fstar"]
    .stack({"sample": ("chain", "draw")})
    .transpose("sample", "fstar_dim_0")
)

# Uncenter and unscale
fstar = k.std() * fstar_scaled + k.mean()
Tstar = T.std() * xstar + T.mean()

bokeh.io.show(
    bebi103.viz.predictive_regression(
        fstar,
        Tstar,
        data=np.stack((T, k)).transpose(),
        color="orange",
        data_kwargs=dict(line_color="#1f78b4", fill_color="#1f78b4"),
    )
)
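
If you also want numerical summaries rather than only the plot, you can compute pointwise credible intervals directly from the fstar samples computed above. Here is a minimal sketch; the 95% bounds are my choice and are not necessarily the same percentiles the plotting function displays.

# Pointwise median and 95% credible interval of f* (in units of 1/s)
f_low, f_median, f_high = np.percentile(fstar.values, [2.5, 50, 97.5], axis=0)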

We also have posterior predictive samples, so we can compare those to the data.

[9]:
k_ppc_scaled = (
    samples.posterior_predictive["y_ppc"]
    .stack({"sample": ("chain", "draw")})
    .transpose("sample", "y_ppc_dim_0")
)

# Uncenter and unscale
k_ppc = k.std() * k_ppc_scaled + k.mean()

bokeh.io.show(
    bebi103.viz.predictive_regression(k_ppc, Tstar, data=np.stack((T, k)).transpose())
)

Very nice!

[10]:
bebi103.stan.clean_cmdstan()

Computing environment

[11]:
%load_ext watermark
%watermark -v -p numpy,scipy,polars,cmdstanpy,arviz,bokeh,bebi103,jupyterlab
print("cmdstan   :", bebi103.stan.cmdstan_version())
Python implementation: CPython
Python version       : 3.12.8
IPython version      : 8.27.0

numpy     : 1.26.4
scipy     : 1.14.1
polars    : 1.17.1
cmdstanpy : 1.2.5
arviz     : 0.20.0
bokeh     : 3.6.0
bebi103   : 0.1.25
jupyterlab: 4.2.6

cmdstan   : 2.36.0