Homework 10.1: Caulobacter growth: linear or exponential? (100 pts)

You can do either this problem or Homework 10.2.

In this problem, you will use hierarchical Bayesian modeling to estimate the growth rates of individual mother Caulobacter cells and to determine whether growth at the single-cell level is linear or exponential. (See discussion below; you did this with a frequentist approach in Homework 9.2 of last term.) You can download the data set here.

We know that under ideal conditions, bacterial cells experience exponential growth in bulk. That is, the number of cells grows exponentially. This is possible regardless of how individual cells grow; the repeated divisions lead to exponential growth. In their paper, the authors argue that the growth of each individual cell is also exponential. I.e.,

\begin{align} a(t) = a_0 \mathrm{e}^{k t}, \end{align}

where \(a(t)\) is the area of the cell in the image as a function of time and \(a_0\) is the area of the cell right after a division has been completed, which we mark as \(t = 0\).

As an alternative model, the authors consider a linear growth model, in which

\begin{align} a(t) = a_0 + b t. \end{align}

An exponential curve is approximately linear (with \(b = a_0k\)) for short time scales, that is, when \(kt \ll 1\). So, it is often difficult to distinguish between linear and exponential growth. Your goal is to perform parameter estimates and do an effective comparison between these two models for growth. You should use hierarchical models, and be sure to take a principled approach in your model construction and evaluation.
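To get a feel for how close the two curves are, here is a minimal sketch that compares the exponential model with its linear approximation, \(b = a_0 k\). The values of `a0` and `k` are hypothetical, chosen only for illustration; they are not taken from the data set, and the function names are my own.

```python
import math


def exponential_area(t, a0, k):
    """Exponential growth model: a(t) = a0 * exp(k * t)."""
    return a0 * math.exp(k * t)


def linear_area(t, a0, b):
    """Linear growth model: a(t) = a0 + b * t."""
    return a0 + b * t


def relative_gap(t, a0, k):
    """Fractional difference between the exponential curve and its
    linear approximation with slope b = a0 * k."""
    a_exp = exponential_area(t, a0, k)
    a_lin = linear_area(t, a0, a0 * k)
    return (a_exp - a_lin) / a_exp


# Hypothetical values for illustration only (not from the data set).
a0 = 1.0   # initial area, arbitrary units
k = 0.01   # growth rate, inverse time units

# Compare at t = 0, at k*t = 0.1, and at the doubling time k*t = ln 2.
for t in (0.0, 10.0, math.log(2) / k):
    print(f"k*t = {k * t:.3f}: relative gap = {relative_gap(t, a0, k):.4f}")
```

With these numbers, the two curves differ by less than half a percent at \(kt = 0.1\) and by roughly 15% by the time the exponential curve has doubled (\(kt = \ln 2\)). Measurement noise of comparable size can easily mask such a difference, which is why the model comparison needs care.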

Since you are using a hierarchical model, here are a few tips for building and implementing the models. You do not need to take this advice if you do not want to, but I have found that these strategies help.

  1. Think carefully about your hyperpriors. If you choose an uninformative hyperprior for a level of the model that is data poor, you end up underpooling. For example, in this problem, there are only two mother cells. So, there are only two Caulobacter samples in your data set. If you put a broad prior on the growth rate of Caulobacter cells, these two cells can be effectively decoupled.

  2. The hierarchical structure can make things difficult to code up and make it harder to hunt down bugs. As I’m building my hierarchical model, I often approach it with “baby steps.” I like to start off with a non-hierarchical model, often with a subset of the data. I perform sampling on this simpler model, taking a small number of samples so I do not have to wait too long. After making sure everything is ok with this simpler structure, I then add a level to the hierarchy. I again do the sampling with a subset of the data, make sure everything works ok, and then add the next level of hierarchy, and so on. I find this helps me find bugs and little details along the way.

  3. You will probably encounter the funnel of hell, so you should strongly consider using noncentered parametrizations.

  4. When you sample out of the full hierarchical model, the sampler may be slower than you are used to seeing. It will likely also be much slower than sampling out of the non-hierarchical model, even though there are only a few more parameters. Stan may also be particularly slow during the warmup phase. You may see it take a few seconds per iteration. This is expected.

  5. Finally, just to give you a sense of what kind of computation time you might expect, for my hierarchical model, it took many hours to do the sampling on a c5.xlarge instance on AWS.
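As a reminder of what the noncentered parametrization in tip 3 amounts to, here is a minimal sketch in plain Python. Instead of sampling a per-cell growth rate directly from Normal(mu, sigma), you sample a standardized variable from Normal(0, 1) and then shift and scale it. The names `mu`, `sigma`, and `k_tilde` and the numerical values are hypothetical, chosen only for illustration; in your actual model this transformation would live in your Stan code.

```python
import random
import statistics


def to_centered(mu, sigma, k_tilde):
    """Map standardized (noncentered) values k_tilde to growth rates
    via k = mu + sigma * k_tilde."""
    return [mu + sigma * z for z in k_tilde]


def noncentered_draws(mu, sigma, n, rng):
    """Draw n growth rates by first sampling k_tilde ~ Normal(0, 1)
    and then transforming, rather than sampling Normal(mu, sigma)."""
    k_tilde = [rng.gauss(0.0, 1.0) for _ in range(n)]
    return to_centered(mu, sigma, k_tilde)


# Hypothetical hyperparameter values, for illustration only.
mu, sigma = 2.0, 0.5
rng = random.Random(3252)  # fixed seed so the sketch is reproducible

draws = noncentered_draws(mu, sigma, 20000, rng)

# The transformed draws should have mean near mu and spread near sigma.
print(f"mean = {statistics.mean(draws):.3f}")
print(f"std  = {statistics.stdev(draws):.3f}")
```

The two parametrizations describe the same distribution for the growth rates. The difference is that the sampler explores `k_tilde`, whose scale does not depend on `sigma`, and that tends to remove the funnel-shaped geometry.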