Modeling repeated experiments
In this lesson, we will investigate hierarchical models, in which some model parameters are dependent on others in specific ways. As is often the case, this is perhaps best learned by example.
In Homework 4.2, we studied reversals under exposure to blue light in C. elegans with Channelrhodopsin in two different neurons. Let’s consider one of the strains which contains a Channelrhodopsin in the ASH sensory neuron. We considered data done in three different years by the students of Bi 1x. In 2015, we found that 9 out of 35 worms reversed under exposure to blue light. In 2016, 12 out of 35 reversed. In 2017, 18 out of 54 reversed.
A model for reversals
As we have worked out, a Binomial likelihood makes sense for this experiment; the number \(n\) out of \(N\) Bernoulli trials that are successful is Binomially distributed. The parameter we are trying to estimate, the probability of reversal \(\theta\), needs a prior. A Beta distribution makes sense for this, and we will choose a Beta distribution that weakly disfavors a zero or one probability of reversal.
\begin{align} &\theta \sim \text{Beta}(1.1, 1.1),\\[1em] &n \sim \text{Binomial}(N, \theta). \end{align}
The problem is that this is a generative model for a single experiment. We did the experiment in 2015, getting \(n/N = 9/35\), and again in 2016, getting \(n/N = 12/35\), and in 2017 with \(n/N = 18/54\). Actually, we could imagine doing the experiment over and over again, say \(k\) times, each time getting a value of \(n\) and \(N\). Conditions may change from experiment to experiment. For example, we may have different lighting set-ups, slight differences in the strain of worms we’re using, etc. We are left with some choices on how to model the data.
Pooled data: identical parameters
We could pool all of the data together. In other words, let’s say we measure \(n_1\) out of \(N_1\) reversals in the first set of experiments, \(n_2\) out of \(N_2\) reversals in the second set, etc., up to \(k\) total experiments. We could pool all of the data together to get
\begin{align} &n = \sum_{i=1}^k n_i \text{ out of } N = \sum_{i=1}^k N_i \text{ reversals}. \end{align}
We then compute our posterior as in the model above. Here, the modeling assumption is that the result in each experiment are governed by identical parameters. That is to say that we assume \(\theta_1 = \theta_2 = \cdots = \theta_k = \theta\). This is the approach we took in Homework 4.2.
Independent parameters
As an alternative model, we could instead say that the parameters in each experiment are totally independent of each other. In this case, we assume that \(\theta_1\), \(\theta_2\), \(\ldots\), \(\theta_k\) are all independent of each other. In this case, we have \(k\) separate models to fit, each looking like
\begin{align} &\theta_i \sim \text{Beta}(1.1, 1.1),\\[1em] &n_i \sim \text{Binomial}(N_i, \theta_i). \end{align}
When we do the modeling in this way, we often report a value of \(\theta\) that is given by the mean of the \(\theta_i\)’s with some error bar.
The best of both worlds: A hierarchical model
Each of these extremes have their advantages. We are often trying to estimate a parameter that is more universal than our experiments, e.g., something that describes worms with Channelrhodopsin in the ASH neuron generally. So, pooling the experiments makes sense. On the other hand, we have reason to assume that there is going to be a different value of \(\theta\) in different experiments, as biological systems are highly variable, not to mention measurement variations. So, how can we capture both of these effects?
We can consider a model in which there is a “global” reversal probability, which we will call \(\phi\), and the values of \(\theta_i\) may vary from this \(\phi\) according to some probability distribution, \(g(\theta_i\mid \phi)\). So now we have parameters \(\theta_1, \theta_2, \ldots, \theta_k\) and \(\phi\). So, the posterior can be written using Bayes’s theorem, defining \(\theta = (\theta_1, \theta_2, \ldots)\), \(N = (N_1, N_2, \ldots)\), and \(n = (n_1, n_2, \ldots)\),
\begin{align} g(\phi,\theta\mid n, N) = \frac{f(n,N\mid \phi, \theta)\, g(\phi, \theta)}{f(n, N)}. \end{align}
Note, though, that the observed values of \(n\) do not depend directly on \(\phi\), only on \(\theta\). In other words, the observations are only indirectly dependent on \(\phi\). So, we can write \(f(n,N\mid \phi, \theta) = f(n,N\mid \theta)\). Thus, we have
\begin{align} g(\phi,\theta\mid n, N) = \frac{f(n,N\mid \theta)\,g(\phi, \theta)}{f(n, N)}. \end{align}
Next, we can rewrite the prior using the definition of conditional probability.
\begin{align} g(\phi,\theta) = g(\theta\mid \phi)\, g(\phi). \end{align}
Substituting this back into our expression for the posterior, we have
\begin{align} g(\phi,\theta\mid n, N) = \frac{f(n,N\mid \theta)\, g(\theta\mid \phi)\, g(\phi)}{f(n, N)}. \end{align}
Now, if we read off the numerator of this equation, we see a chain of dependencies (conditioning). The experimental results \(n\) are conditioned on parameters \(\theta\). Parameters \(\theta\) are conditioned on hyperparameter \(\phi\). Hyperparameter \(\phi\) then has some hyperprior distribution. Any model that can be written as a chain of conditioning like this is called a hierarchical model, and the parameters that do not directly condition the data are called hyperparameters.
So, the hierarchical model captures both the experiment-to-experiment variability, as well as the “global” regulator of outcomes, the hyperparameter. Note that the product \(g(\theta\mid \phi)\, g(\phi)\) comprises the prior, as it is independent of the observed data.