Configuring your machine


The instructions below discuss how to set up your machine. After setting it up, make sure you are always operating in the bebi103 environment.
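
As a quick sanity check from Python (for example, in a Jupyter notebook), you can inspect the `CONDA_DEFAULT_ENV` environment variable, which `conda activate` sets to the name of the active environment. This is a minimal sketch. The helper names are hypothetical, and the check only reports the environment of the process running it (e.g., the Jupyter kernel), not that of every terminal you have open.

```python
import os


def active_conda_env():
    """Return the name of the active conda environment, or None if none is active."""
    # `conda activate` sets CONDA_DEFAULT_ENV to the active environment's name
    return os.environ.get("CONDA_DEFAULT_ENV")


def in_env(expected="bebi103"):
    """True if the active conda environment is named `expected`."""
    return active_conda_env() == expected


if __name__ == "__main__":
    env = active_conda_env()
    if in_env():
        print("OK: running in the bebi103 environment")
    else:
        print(f"Warning: active conda environment is {env!r}, not 'bebi103'")
```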

Students who took BE/Bi 103 a last term

If you took BE/Bi 103 a last term (and only last term, Fall 2023), your computer is mostly configured. You need only run the following on the command line after activating the bebi103 environment with conda.

pip install cmdstanpy==1.2.0 arviz==0.17.0 bebi103==0.1.18
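
If you want to confirm the upgrade took effect, the sketch below compares the installed versions against the pins in the `pip` command using the standard library's `importlib.metadata`. The helper `check_pins` is hypothetical; run it inside the bebi103 environment.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the pip command above
PINNED = {"cmdstanpy": "1.2.0", "arviz": "0.17.0", "bebi103": "0.1.18"}


def check_pins(pins):
    """Map each package whose installed version differs from its pin to (installed, expected).

    A package that is not installed at all is reported with installed=None.
    """
    mismatches = {}
    for pkg, expected in pins.items():
        try:
            installed = version(pkg)
        except PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[pkg] = (installed, expected)
    return mismatches


if __name__ == "__main__":
    problems = check_pins(PINNED)
    if not problems:
        print("All pinned packages are installed at the expected versions.")
    for pkg, (installed, expected) in problems.items():
        print(f"{pkg}: installed {installed}, expected {expected}")
```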

After applying the above updates, you can skip to the Stan installation section and continue.

Students who did not take BE/Bi 103 a

If you did not take BE/Bi 103 a last term, complete Lesson 0 from BE/Bi 103 a, starting with the Installation on your own machine section.

Stan installation

We will be using Stan for much of our statistical modeling. Stan provides its own probabilistic programming language. Programs written in this language, called Stan programs, are translated into C++ by the Stan compiler (stanc), and the resulting C++ code is then compiled. As you will see throughout the class, there are many advantages to this approach.

There are many interfaces for Stan, including the two most widely used, RStan and PyStan, which are the R and Python interfaces, respectively. We will use a newer interface, CmdStanPy, which has several advantages that will become apparent when you start using it.

Whichever interface you use, Stan must be installed and functional, which means you need an installed C++ toolchain. Installation and compilation can be tricky and vary from operating system to operating system. The instructions below are not guaranteed to work; you may have to do some troubleshooting on your own. Note that you can also use Google Colab or AWS for computing, so you do not need to worry if you have trouble installing Stan locally.

Configuring a C++ toolchain for MacOS

On MacOS, you can install the Xcode command line tools by running the following on the command line.

xcode-select --install
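
Once the installer finishes, `xcode-select -p` prints the path of the active developer directory. The Python wrapper below is a hypothetical convenience, assuming that a missing or failing command means the command line tools are not installed yet.

```python
import subprocess


def xcode_clt_path():
    """Return the active developer directory reported by `xcode-select -p`, or None.

    None means the command could not be run (e.g., not on macOS) or exited with
    an error, which usually means the command line tools are not installed yet.
    """
    try:
        result = subprocess.run(
            ["xcode-select", "-p"], capture_output=True, text=True, check=True
        )
    except (OSError, subprocess.CalledProcessError):
        return None
    return result.stdout.strip() or None


if __name__ == "__main__":
    path = xcode_clt_path()
    print(f"Command line tools found at {path}" if path else "Command line tools not found")
```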

Configuring a C++ toolchain for Windows

According to the CmdStanPy documentation, you can skip this step, though I did previously verify that the below worked on a Windows machine.

You need to install a C++ toolchain for Windows. One possibility is to install a MinGW toolchain, and one way to do that is using conda.

conda install libpython m2w64-toolchain -c msys2

When you do this, make sure you are in the bebi103 environment.

Configuring a C++ toolchain for Linux

If you are using Linux, we assume you already have the C++ utilities installed.
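
Whichever platform you are on, a rough way to check that a toolchain is reachable is to look for a C++ compiler and a `make` program on your `PATH` with `shutil.which`. The candidate names below (`g++`, `clang++`, `make`, and `mingw32-make`, which MinGW toolchains on Windows typically provide) are assumptions for illustration, and the helper names are hypothetical. Finding them does not guarantee that CmdStan will build.

```python
import shutil

# Candidate executable names; which ones exist depends on your platform and toolchain.
# These lists are assumptions, not an exhaustive list of what CmdStan accepts.
COMPILERS = ["g++", "clang++"]
MAKES = ["make", "mingw32-make"]


def first_on_path(names):
    """Return the full path of the first name found on PATH, or None."""
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


def toolchain_report():
    """Report where (if anywhere) a C++ compiler and a make program were found."""
    return {"compiler": first_on_path(COMPILERS), "make": first_on_path(MAKES)}


if __name__ == "__main__":
    for tool, path in toolchain_report().items():
        print(f"{tool:8s}: {path or 'NOT FOUND'}")
```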

Installing Stan with CmdStanPy

If you have a functioning C++ toolchain, you can use CmdStanPy to install Stan/CmdStan. With the bebi103 environment active, run the following at a Python prompt (plain Python, IPython, or a Jupyter notebook).

import cmdstanpy; cmdstanpy.install_cmdstan()
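
CmdStanPy can tell you which installation it will use via `cmdstanpy.cmdstan_path()`. If you want to see what is on disk without importing CmdStanPy, the sketch below lists versioned install directories. It assumes the default install location `~/.cmdstan` and directory names of the form `cmdstan-X.Y.Z`; if you installed elsewhere (e.g., via the `dir` argument of `install_cmdstan()`), pass that parent directory instead. The function name is hypothetical.

```python
import re
from pathlib import Path

# Assumption: CmdStanPy's default install location is ~/.cmdstan, with one
# subdirectory per version named like "cmdstan-2.33.1".
DEFAULT_DIR = Path.home() / ".cmdstan"
_VERSION_DIR = re.compile(r"^cmdstan-(\d+)\.(\d+)\.(\d+)$")


def installed_cmdstan_versions(root=DEFAULT_DIR):
    """Return installed CmdStan versions under `root` as sorted (major, minor, patch) tuples."""
    root = Path(root)
    if not root.is_dir():
        return []
    versions = []
    for child in root.iterdir():
        match = _VERSION_DIR.match(child.name)
        # Only count directories whose names look like versioned CmdStan installs
        if child.is_dir() and match:
            versions.append(tuple(int(g) for g in match.groups()))
    return sorted(versions)


if __name__ == "__main__":
    found = installed_cmdstan_versions()
    print(found if found else f"No CmdStan installs found in {DEFAULT_DIR}")
```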

This may take several minutes to run. (I did it on my Raspberry Pi, and it took hours.)

If you are using Windows and you skipped configuration of the C++ toolchain, instead run:

import cmdstanpy; cmdstanpy.install_cmdstan(compiler=True)

Checking your Stan installation

To check your Stan installation, you can run the following code. It will take several seconds for the model to compile and then sample. In the end, you should see a scatter plot of samples. You might not appreciate it yet, but this is a nifty demonstration of Stan’s power to sample hierarchical models, which is no trivial feat. You will see some warning text, and that is expected.

import numpy as np

import bebi103
import cmdstanpy
import arviz as az

import bokeh.plotting
import bokeh.io
bokeh.io.output_notebook()

schools_data = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}

schools_code = """
data {
  int<lower=0> J; // number of schools
  vector[J] y; // estimated treatment effects
  vector<lower=0>[J] sigma; // s.e. of effect estimates
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}

transformed parameters {
  vector[J] theta = mu + tau * eta;
}

model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
"""

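# Write the Stan program to a .stan file
with open("schools_code.stan", "w") as f:
    f.write(schools_code)

# Compile the model (Stan -> C++ -> executable), then sample, writing CSV output to the current directory
sm = cmdstanpy.CmdStanModel(stan_file="schools_code.stan")
samples = sm.sample(data=schools_data, output_dir="./", show_progress=False)

# Convert the CmdStanPy output to an ArviZ InferenceData object
samples = az.from_cmdstanpy(samples)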
bebi103.stan.clean_cmdstan()

# Make a plot of samples
p = bokeh.plotting.figure(
    frame_height=250, frame_width=250, x_axis_label="μ", y_axis_label="τ"
)
p.scatter(
    np.ravel(samples.posterior["mu"]),
    np.ravel(samples.posterior["tau"]),
    alpha=0.1
)

bokeh.io.show(p)
21:19:57 - cmdstanpy - INFO - compiling stan file /Users/bois/Dropbox/git/bebi103_course/2024/b/content/lessons/00/schools_code.stan to exe file /Users/bois/Dropbox/git/bebi103_course/2024/b/content/lessons/00/schools_code
21:20:10 - cmdstanpy - INFO - compiled model executable: /Users/bois/Dropbox/git/bebi103_course/2024/b/content/lessons/00/schools_code
21:20:10 - cmdstanpy - INFO - CmdStan start processing
21:20:10 - cmdstanpy - INFO - Chain [1] start processing
21:20:10 - cmdstanpy - INFO - Chain [2] start processing
21:20:10 - cmdstanpy - INFO - Chain [3] start processing
21:20:10 - cmdstanpy - INFO - Chain [4] start processing
21:20:10 - cmdstanpy - INFO - Chain [1] done processing
21:20:10 - cmdstanpy - INFO - Chain [2] done processing
21:20:11 - cmdstanpy - INFO - Chain [3] done processing
21:20:11 - cmdstanpy - INFO - Chain [4] done processing
21:20:11 - cmdstanpy - WARNING - Some chains may have failed to converge.
        Chain 3 had 1 divergent transitions (0.1%)
        Chain 4 had 1 divergent transitions (0.1%)
        Use the "diagnose()" method on the CmdStanMCMC object to see further information.
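
The `transformed parameters` block uses what is commonly called a non-centered parameterization. Rather than sampling each school's effect θ directly from Normal(μ, τ), the model samples a standardized η and sets θ = μ + τη. The NumPy sketch below, with arbitrary μ and τ chosen only for illustration, checks that the two constructions give the same distribution. In Stan, the benefit is that the sampler works with η, whose geometry does not depend on τ, which tends to reduce (though, as the warnings above show, not always eliminate) divergent transitions.

```python
import numpy as np

# Illustrative hyperparameter values (arbitrary, not from the course)
mu, tau = 4.0, 3.0

rng = np.random.default_rng(seed=3252)
n = 200_000

# Centered: draw theta directly from Normal(mu, tau)
theta_centered = rng.normal(mu, tau, size=n)

# Non-centered, as in the Stan program: eta ~ Normal(0, 1), theta = mu + tau * eta
eta = rng.normal(0.0, 1.0, size=n)
theta_noncentered = mu + tau * eta

for label, theta in [("centered", theta_centered), ("non-centered", theta_noncentered)]:
    print(f"{label:13s} mean = {theta.mean():.3f}, std = {theta.std():.3f}")
```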

Computing environment

%load_ext watermark
%watermark -v -p numpy,bokeh,cmdstanpy,arviz,jupyterlab
print("CmdStan : {0:d}.{1:d}".format(*cmdstanpy.cmdstan_version()))
Python implementation: CPython
Python version       : 3.11.5
IPython version      : 8.15.0

numpy     : 1.26.2
bokeh     : 3.3.0
cmdstanpy : 1.2.0
arviz     : 0.17.0
jupyterlab: 4.0.10

CmdStan : 2.33