Homework 7.1: Normals with the same mean? (30 pts)

Data set download


In the last lesson of last term, we investigated how statistics from a frequentist analog of model comparison (null hypothesis significance tests) might vary from experiment to experiment. In that example, we used zebrafish embryo sleep data from the Prober lab. A description of their work on the genetic regulation of sleep can be found on the research page of the lab website. In particular, the movie below comes from their experiments watching moving/sleeping larvae over time.

The data we used are processed from raw data published in Gandhi et al., 2015. In their experiment, they studied the effect of a deletion in the gene coding for arylalkylamine N-acetyltransferase (aanat), a key enzyme in the rhythmic production of melatonin. Melatonin is a hormone that regulates circadian rhythms and is often taken as a drug to treat sleep disorders. The goal of the study was to investigate the effects of aanat deletion on sleep patterns in 5+ day old zebrafish larvae.

Among other sleep properties, they measured the mean rest bout length on the sixth night, comparing wild type larvae to the homozygous mutant. A rest bout is defined as a period of time in which the fish does not move. The length of a rest bout is just the amount of time the fish is still. We are primarily interested in the difference in the mean bout lengths between the two genotypes. The processed data are found here.
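
To make the definition of a rest bout concrete, here is a minimal sketch of how rest bout lengths could be extracted from an activity trace. The trace format (one activity value per fixed-length time bin), the function name `rest_bout_lengths`, the one-minute bin width, and the example values are all assumptions for illustration; they do not describe how the processed data set was actually generated.

```python
from itertools import groupby


def rest_bout_lengths(activity, bin_minutes=1.0):
    """Return the length (in minutes) of each rest bout in an activity trace.

    `activity` holds one value per time bin; a bin counts as rest when its
    value is zero. The trace format and bin width are assumptions.
    """
    lengths = []
    # groupby splits the trace into consecutive runs of rest / non-rest bins
    for is_rest, run in groupby(activity, key=lambda a: a == 0):
        if is_rest:
            lengths.append(sum(1 for _ in run) * bin_minutes)
    return lengths


# Hypothetical trace: seconds of movement detected in each one-minute bin
trace = [0.6, 0.0, 0.0, 1.2, 0.0, 0.0, 0.0, 0.4, 0.0]
bouts = rest_bout_lengths(trace)
mean_bout = sum(bouts) / len(bouts)
```

The mean over the bouts of a single fish on a single night gives one value of the kind compared between genotypes in this problem.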

We will take a Bayesian approach to model comparison in this problem.

a) Write down two models for the mean rest bout length for mutant and wild type fish. In the first model, the rest bout lengths are Normally distributed with the same location parameter \(\mu\), but different scale parameters \(\sigma\). In the second model, the rest bout lengths are Normally distributed, but with different location and scale parameters. Choose appropriate priors.
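
As a starting point, the likelihoods of the two models might be written as below. The notation is introduced here for illustration: \(y_{\mathrm{wt},i}\) and \(y_{\mathrm{mut},i}\) denote the mean rest bout length of the \(i\)th wild type and mutant fish, respectively. Priors are deliberately left out, since choosing them is part of the problem.

Model 1 (shared location parameter):

\begin{align}
y_{\mathrm{wt},i} &\sim \mathrm{Norm}(\mu, \sigma_\mathrm{wt}) \;\;\forall i, \\
y_{\mathrm{mut},i} &\sim \mathrm{Norm}(\mu, \sigma_\mathrm{mut}) \;\;\forall i.
\end{align}

Model 2 (separate location parameters):

\begin{align}
y_{\mathrm{wt},i} &\sim \mathrm{Norm}(\mu_\mathrm{wt}, \sigma_\mathrm{wt}) \;\;\forall i, \\
y_{\mathrm{mut},i} &\sim \mathrm{Norm}(\mu_\mathrm{mut}, \sigma_\mathrm{mut}) \;\;\forall i.
\end{align}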

b) Perform parameter estimation for each model, and then compare the models using either LOO or WAIC. (In doing the calculation, you may get some warnings about the stability of the calculation.)
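
For reference, the quantity behind WAIC can be sketched in a few lines of plain Python. This is only an illustration of the arithmetic, assuming you already have a table of pointwise log likelihoods `log_lik[s][i]` (posterior sample `s`, datum `i`); the function names, the synthetic data, and the stand-in "posterior" draws are all made up for the example. In practice, a library such as ArviZ would typically handle this calculation and report the stability diagnostics.

```python
import math
import random
import statistics


def normal_logpdf(y, mu, sigma):
    """Log density of a Normal distribution with location mu and scale sigma."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((y - mu) / sigma) ** 2


def log_mean_exp(values):
    """Numerically stable log of the mean of exp(values)."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values) / len(values))


def elpd_waic(log_lik):
    """Estimate the expected log pointwise predictive density via WAIC.

    log_lik[s][i] is the log likelihood of datum i under posterior sample s.
    """
    lppd = 0.0    # log pointwise predictive density
    p_waic = 0.0  # effective number of parameters
    for i in range(len(log_lik[0])):
        column = [row[i] for row in log_lik]
        lppd += log_mean_exp(column)
        p_waic += statistics.variance(column)  # sample variance over draws
    return lppd - p_waic


# Synthetic data and stand-in draws; NOT real posterior samples
rng = random.Random(0)
y = [rng.gauss(2.0, 0.5) for _ in range(20)]
samples = [(rng.gauss(2.0, 0.1), 0.5) for _ in range(200)]  # (mu, sigma) pairs
log_lik = [[normal_logpdf(yi, mu, sigma) for yi in y] for mu, sigma in samples]
waic_value = elpd_waic(log_lik)
```

On the deviance scale, WAIC is \(-2\) times the value returned by `elpd_waic`. The model with the higher estimated elpd (lower WAIC) is predicted to perform better out of sample.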

c) I generally find that model comparison should be avoided. This is because we seldom actually want to compare two models; this is seldom the research question we want to answer. In this case, I do not think the question is whether the two data sets come from Normal distributions with different location parameters. Of course they come from different distributions; they are different genotypes. The more pertinent question is how different are they? Address this question instead.
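
One possible way to approach "how different" is to summarize the posterior distribution of the difference in location parameters from the second model. The sketch below assumes you have paired posterior draws of the two location parameters; the function names and the stand-in draws are hypothetical and carry no information about the real data.

```python
import random


def percentile(sorted_vals, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_vals) - 1) * q / 100
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def summarize_difference(mu_wt, mu_mut):
    """Return the (2.5th, 50th, 97.5th) percentiles of mu_mut - mu_wt.

    The two lists must hold draws from the same posterior samples, in the
    same order, so that each difference comes from one joint draw.
    """
    diff = sorted(m - w for w, m in zip(mu_wt, mu_mut))
    return percentile(diff, 2.5), percentile(diff, 50), percentile(diff, 97.5)


# Stand-in draws for illustration only; replace with your posterior samples
rng = random.Random(1)
mu_wt_draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
mu_mut_draws = [rng.gauss(0.0, 1.0) for _ in range(1000)]
low, median, high = summarize_difference(mu_wt_draws, mu_mut_draws)
```

Reporting the median together with a 95% credible interval for the difference conveys both the size of the effect and the uncertainty in it, which a single model comparison score does not.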