Reporting summaries of the posterior


[2]:
import numpy as np
import pandas as pd

import arviz as az

import iqplot

import bokeh.io
import bokeh.layouts
import bokeh.palettes

bokeh.io.output_notebook()

Reporting summaries of MCMC samples

Performing MCMC calculations gives you samples from the full posterior, which you can summarize in beautiful and informative corner plots. But very often, we wish to summarize the posterior in a few simple numbers. In particular, we wish to report a credible interval for a single parameter, computed from the marginalized posterior. A credible interval is a region in parameter space within which we might expect a parameter value to lie. This credible interval is often reported and plotted as an error bar.

We will consider four commonly used ways of plotting a value plus error bar. We will use a 95% credible interval for this demonstration.

  1. mean ± standard deviation: The most commonly used credible interval is \(\mu \pm k\sigma\), where \(k\) is chosen to give the appropriate credible interval, approximating the marginalized posterior as Normal. We’ll do 95%, which means \(k = 1.96\) (see the sketch after this list for where this value comes from).

  2. median with quantile: The posterior need not be Normal. If it is not, we would like a more robust way to summarize it. A simple method is to report the median and then give lower and upper bounds of the error bar based on quantiles. In our case, we would report the 2.5th and 97.5th percentiles.

  3. mode with HPD: This method uses the highest posterior density interval, or HPD (also known as the HDI for highest density interval). If we’re considering a 95% credible interval, the HPD interval is the shortest interval that contains 95% of the probability mass of the posterior. So, we report the mode (the most probable parameter value) and then the bounds on the HPD interval.

  4. median with HPD: We report the HPD, but report the median parameter value from our samples instead of the mode.
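As a quick check on the value of \(k\) in the first scheme: it is the Normal quantile that leaves 2.5% of the probability mass in each tail. A minimal sketch, assuming SciPy is available (it is not imported elsewhere in this notebook):

import scipy.stats

# k for a central 95% interval: 2.5% of the mass in each tail
k = scipy.stats.norm.ppf(0.975)
print(k)  # approximately 1.96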

To illustrate the relative merits of these reporting schemes, we’ll draw samples out of some artificial distributions.

Some distributions to sample

We will generate some distributions to sample. We will consider an Exponential distribution, a Normal, a bimodal mixture of two Normals, and a distribution with a heavy tail. We choose these to illustrate how the various credible interval constructions behave for differently shaped posteriors.

Let’s define the models and draw samples. MCMC is not necessary for this illustrative exercise, so I will just draw the samples using NumPy.

[3]:
# Parametrize models
mu = 1.0
sigma = 0.25
mu_2 = np.array([mu, 3.0])
sigma_2 = np.array([sigma, 0.5])

# Instantiate RNG
rng = np.random.default_rng(seed=3252)

# Draw out of the distributions, start with exponential
x_expon = rng.exponential(mu, size=15000)

# Normal
x_norm = rng.normal(mu, sigma, size=15000)

# Bimodal mixture of Normals
which_norm = rng.choice([0, 1], 15000)
x_2norm = rng.normal(mu_2[which_norm], sigma_2[which_norm])

# Heavy tailed Pareto
x_heavytail = rng.pareto(1.2, size=15000)

# Store samples in a list for easy access
samples = [x_expon, x_norm, x_2norm, x_heavytail]

Let’s look at what we got by plotting the histograms.

[4]:
names = ["Exponential", "Normal", "Two Normals", "Heavy tail"]
plots = [
    iqplot.histogram(
        x,
        density=True,
        rug=False,
        frame_width=250,
        frame_height=200,
        x_axis_label="x",
        y_axis_label="pdf",
        title=t,
        line_kwargs={"line_width": 2, "line_color": "gray"},
    )
    for x, t in zip(samples[:-1], names[:-1])
]

# Make plot for heavy tail (needs different bins because of heavy tail)
plots.append(
    iqplot.histogram(
        samples[-1],
        bins=np.arange(21),
        density=True,
        rug=False,
        frame_width=250,
        frame_height=200,
        x_axis_label="x",
        y_axis_label="pdf",
        title="Heavy tail",
        line_kwargs={"line_width": 2, "line_color": "gray"},
    )
)

# Show in a grid
bokeh.io.show(bokeh.layouts.gridplot(plots, ncols=2))

Summarizing the “MCMC” results with error bars

We’ll start by computing the mean, standard deviation, median, and quantiles, since these are easy to compute with NumPy.

[5]:
# DataFrame to store summary stats
df_summary = pd.DataFrame(
    index=["mean", "std", "2.5", "median", "97.5", "mode", "hpd_low", "hpd_high"],
    columns=names,
)

for x, name in zip(samples, names):
    df_summary.loc["mean", name] = np.mean(x)
    df_summary.loc["std", name] = np.std(x)
    df_summary.loc[["2.5", "median", "97.5"], name] = np.percentile(x, [2.5, 50, 97.5])

Computation of the HPD interval is a little trickier. The idea is that we rank-order the MCMC samples. We know that the number of samples included in the HPD interval is 0.95 times the total number of MCMC samples. We then consider all contiguous intervals that contain that many samples and find the shortest one. This is accomplished using the ArviZ function az.hdi().

[6]:
for x, name in zip(samples, names):
    df_summary.loc[['hpd_low', 'hpd_high'], name] = az.hdi(x, hdi_prob=0.95)
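To make the rank-ordering idea concrete, here is a minimal hand-rolled version (a sketch; in practice, use az.hdi()):

def hpd_manual(x, mass_frac=0.95):
    """Shortest interval containing `mass_frac` of the samples in `x`."""
    x_sorted = np.sort(x)
    n_samples = len(x)

    # Number of samples that must lie within the interval
    n_in = int(np.floor(mass_frac * n_samples))

    # Widths of all contiguous intervals containing n_in samples
    widths = x_sorted[n_in:] - x_sorted[: n_samples - n_in]

    # The HPD interval is the shortest such interval
    i = np.argmin(widths)

    return np.array([x_sorted[i], x_sorted[i + n_in]])

For example, hpd_manual(x_norm) gives essentially the same interval as az.hdi(x_norm, hdi_prob=0.95), up to conventions at the interval endpoints.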

Finally, we want to add the mode of our samples to our summary statistics. We could compute the mode from MCMC samples using the log posterior values, finding the sample for which the log posterior is maximal. But since we quickly generated our samples using NumPy, we do not have log posterior values, so I will hand-code the known modes.

[7]:
df_summary.loc['mode', :] = [0, mu, mu, 1]
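For reference, if we did have log posterior values from an MCMC run, extracting a mode estimate would be a one-liner. The arrays below are stand-ins for illustration, using the Normal model from above:

# Stand-in arrays; in a real workflow these come from the sampler
param_samples = rng.normal(mu, sigma, size=1000)
log_post = -((param_samples - mu) ** 2) / (2 * sigma**2)  # log posterior, up to an additive constant

# Mode estimate: the sample with maximal log posterior
mode_estimate = param_samples[np.argmax(log_post)]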

Now, we can take a look at our various summary statistics. (Of course, this data frame is not tidy, but remember, organizing data for display and for analysis are two different things. This data frame is organized for display.)

[8]:
df_summary
[8]:
          Exponential    Normal  Two Normals  Heavy tail
mean         1.007843  1.001447     2.004005    4.134071
std          1.005351   0.24914      1.07385   44.580247
2.5          0.024875  0.511552     0.590438     0.02307
median       0.701092  1.000698     1.767975     0.79814
97.5         3.649541   1.49205     3.804977   20.613608
mode                0       1.0          1.0           1
hpd_low      0.000088  0.531863     0.561362    0.000057
hpd_high     2.996549  1.507314     3.755653   11.296449

It is easier to visualize these summaries as error bars on plots. There’s a bit of code below to generate the plots, but bear with me.

[9]:
# y_values for bars on plots
y_vals = [
    [0.8, 0.6, 0.4, 0.2],  # Exponential
    [1.33, 1, 0.66, 0.33],  # Normal
    [0.6, 0.45, 0.3, 0.15],  # Two Normals
    [0.5, 0.4, 0.3, 0.2],  # Heavy tail
]

# Color scheme
colors = bokeh.palettes.Category10[10]


def plot_interval(x, y, barx, color, legend_label, p):
    p.circle([x], [y], size=10, color=color, legend_label=legend_label)
    p.line(barx, [y, y], line_width=4, color=color)


legends = ["mean ± std", "quantile", "mode/HPD", "median/HPD"]
for p, name, y in zip(plots, names, y_vals):
    # Mean ± std
    x = df_summary.loc["mean", name]
    barx = x + np.array([1, -1]) * 1.96 * df_summary.loc["std", name]
    plot_interval(x, y[0], barx, colors[0], legends[0], p)

    # Median with central 95% interval
    x = df_summary.loc["median", name]
    barx = df_summary.loc[["2.5", "97.5"], name]
    plot_interval(x, y[1], barx, colors[1], legends[1], p)

    # Mode with HPD
    x = df_summary.loc["mode", name]
    barx = df_summary.loc[["hpd_low", "hpd_high"], name]
    plot_interval(x, y[2], barx, colors[2], legends[2], p)

    # Median with HPD
    x = df_summary.loc["median", name]
    barx = df_summary.loc[["hpd_low", "hpd_high"], name]
    plot_interval(x, y[3], barx, colors[3], legends[3], p)

plots[0].legend.visible = True
for i in [1, 2, 3]:
    plots[i].legend.visible = False

bokeh.io.show(
    bokeh.layouts.column(bokeh.layouts.row(*plots[:2]), bokeh.layouts.row(*plots[2:]))
)

Relative merits of each method

  • The mean/std approach does not respect bounds on the posterior, nor any asymmetry. Unless we are going for speed and using a MAP finder/Normal approximation, there is no need for this method of summarizing the posterior.

  • The primary advantage of the quantile approach is that it is very easy to interpret, especially for researchers uninitiated in Bayesian statistics. It does not suffer from the problems of the mean/std method, nor does it rely on any approximations.

  • The mode/HPD method gives just that: where the parameter value is most likely to fall, which is not necessarily the interquantile region with the median at its center. It may also be nice to know the most probable parameter value (the MAP), though this is seldom as useful as expectations. The drawback is the possible difficulty of interpretability for the uninitiated. Furthermore, I think it places too much emphasis on the MAP. This is not always the most relevant thing to know.

  • The median/HPD method is my personal favorite. It gives the HPD (with the advantages I just stated) and also the median, which to me is a more useful statistic than the mode.

In any case, attempting to describe a multi-modal posterior with an error bar is misleading and futile. A distribution with a long tail can also be deceiving. Even if you report a 95% credible interval, there is still a 5% chance the parameter value would be reeeeally big.

How to display the summary in text

One issue that may be worrying you is how to report the asymmetric error bars in text. This is best seen by example. For the Exponential samples, we may report the median with HPD as \(0.70^{+2.30}_{-0.70}\), where the superscript is the distance from the median to the upper HPD bound and the subscript is the distance from the median to the lower bound.
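If you want to generate such a string programmatically, here is a minimal sketch pulling the numbers out of df_summary (the two-decimal formatting is a choice, not a rule):

# Summary values for the Exponential samples
med = df_summary.loc["median", "Exponential"]
low = df_summary.loc["hpd_low", "Exponential"]
high = df_summary.loc["hpd_high", "Exponential"]

# LaTeX string: median with distances to the HPD bounds as super-/subscripts
print(f"{med:.2f}^{{+{high - med:.2f}}}_{{-{med - low:.2f}}}")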

Computing environment

[10]:
%load_ext watermark
%watermark -v -p numpy,pandas,arviz,bokeh,iqplot,jupyterlab
Python implementation: CPython
Python version       : 3.11.5
IPython version      : 8.15.0

numpy     : 1.26.2
pandas    : 2.1.4
arviz     : 0.17.0
bokeh     : 3.3.0
iqplot    : 0.3.5
jupyterlab: 4.0.10