Implementation of a hierarchical model
[2]:
import numpy as np
import cmdstanpy
import arviz as az
import bebi103
import iqplot
import bokeh
colors = bokeh.palettes.d3['Category10'][10]
bokeh.io.output_notebook()
We will do MCMC on the hierarchical posterior for worm reversal data. As a reminder, the model is
\begin{align} &\phi \sim \text{Beta}(1.1, 1.1), \\[1em] &\kappa \sim \text{HalfNorm}(0, 1000), \\[1em] &\alpha = \phi \kappa, \\[1em] &\beta = (1-\phi)\kappa,\\[1em] &\theta_i \sim \text{Beta}(\alpha, \beta) \;\;\forall i,\\[1em] &n_i \sim \text{Binom}(N_i, \theta_i)\;\;\forall i. \end{align}
To demonstrate how the hierarchical model works, we will consider the data sets from 2015, 2016, and 2017, and an additional three synthetic experiments that have 14/40, 5/34, and 110/660 reversals, respectively. These three experiments were not actually performed; I am using them here to demonstrate some effects we see in hierarchical models. In particular, the last experiment has many more trials and would dominate pooled data if we were not using a hierarchical model.
We will not perform prior predictive checks or other diagnostics, but simply demonstrate how a hierarchical model is built using Stan and investigate some of the resulting inferences.
[3]:
# Data
n = np.array([9, 12, 18, 14, 5, 110])
N = np.array([35, 35, 54, 40, 34, 660])
Next, as usual, we define our model using Stan.
data {
// Number of separate experiments
int K;
array[K] int N;
array[K] int n;
}
parameters {
// Hyperparameters
real<lower=0, upper=1> phi;
real<lower=0> kappa;
// Parameters
array[K] real<lower=0, upper=1> theta;
}
transformed parameters {
// Transformed hyperparameters
real<lower=0> alpha = phi * kappa;
real<lower=0> beta_ = (1 - phi) * kappa;
}
model {
// Hyperpriors
phi ~ beta(1.1, 1.1);
kappa ~ normal(0, 1000.0);
// Prior
theta ~ beta(alpha, beta_);
// Likelihood
n ~ binomial(N, theta);
}
We can compile it and draw samples.
[4]:
data = dict(K=len(n), n=n, N=N)
with bebi103.stan.disable_logging():
sm = cmdstanpy.CmdStanModel(stan_file='worm_hier.stan')
samples = sm.sample(data=data, adapt_delta=0.99, iter_sampling=2000, thin=2)
samples = az.from_cmdstanpy(posterior=samples)
bebi103.stan.check_all_diagnostics(samples)
Effective sample size looks reasonable for all parameters.
Rhat looks reasonable for all parameters.
0 of 4000 (0.0%) iterations ended with a divergence.
0 of 4000 (0.0%) iterations saturated the maximum tree depth of 10.
E-BFMI indicated no pathological behavior.
[4]:
0
Now, let’s look a corner plot to see how we did with the sampling.
[5]:
g = bebi103.viz.corner(
samples,
parameters=["phi", "kappa", "theta[0]", "theta[1]", "theta[5]"],
xtick_label_orientation=np.pi / 4,
)
bokeh.io.show(g)
We can see the results more clearly if we plot the marginalized distributions on top of each other. This also allows us to see how the respective experiments serve to determine \(\phi\), the global reversal probability. I will also plot the most probable value of \(\theta_i\) in the independent experiment model (i.e., all experiments are independent) as a diamond. We also plot the most probable reversal probability of the pooled model (all experiments pooled together) as a black diamond, again with a Uniform prior.
[6]:
# Plot ECDF of phi samples
p = iqplot.ecdf(
samples.posterior.phi.values.flatten(),
frame_width=650,
style="staircase",
line_kwargs=dict(line_color="black"),
y_axis_label="marginal posterior CDF",
x_axis_label="parameter value",
legend_label="ϕ",
)
# Plot the ECDFs for each theta_i
for i in range(len(n)):
p = iqplot.ecdf(
samples.posterior.theta.sel(theta_dim_0=i).values.flatten(),
style="staircase",
line_kwargs=dict(line_color=colors[i]),
p=p,
legend_label=f"θ{i}"
)
# MAP of individuals (given by (n + α - 1) / (N + α + β - 2))
for i in range(len(n)):
theta_map = (n[i] + 0.1) / (N[i] + 0.2)
p.diamond(theta_map, 0.5, color=colors[i], size=20)
# Pooled MAP
theta_map = (n.sum() + 0.1) / (N.sum() + 0.2)
p.diamond(theta_map, 0.5, color="black", size=20)
p.legend.visible = True
p.legend.location = "bottom_right"
bokeh.io.show(p)
We see that the individual parameter values tend to “shrink” toward the hyperparameter value. The hyperparameter value, in turn, is different than if we pooled all the data together. Notably, the black diamond, which represents the MAP estimate of the reversal probability if we pooled all of the data together, is significantly different than under the hierarchical model. This is because the data set with many more measurements overwhelms the other data sets if we pool the results.
We are probably most interested in the hyperparameter \(\phi\), so let’s compute its median and 80% credible interval.
[7]:
print(
"""
ϕ = [{0:.3f}, {1:.3f}, {2:.3f}]
""".format(
*np.percentile(samples.posterior.phi.values.flatten(), [10, 50, 90])
)
)
ϕ = [0.201, 0.240, 0.292]
[8]:
bebi103.stan.clean_cmdstan()
Computing environment
[9]:
%load_ext watermark
%watermark -v -p numpy,cmdstanpy,arviz,iqplot,bebi103,bokeh,jupyterlab
Python implementation: CPython
Python version : 3.11.5
IPython version : 8.15.0
numpy : 1.26.2
cmdstanpy : 1.2.0
arviz : 0.17.0
iqplot : 0.3.5
bebi103 : 0.1.20
bokeh : 3.3.0
jupyterlab: 4.0.10