Configuring your machine
The instructions below describe how to set up your machine. Once it is set up, make sure you are always working in the bebi103 environment.
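As a quick sanity check, the snippet below is a minimal sketch of how you might confirm which conda environment a Python session is running in. It relies on the CONDA_DEFAULT_ENV environment variable, which conda sets on activation; the helper name active_conda_env is hypothetical, and the check assumes Python was launched from a shell in which the environment was activated.

```python
import os


def active_conda_env(environ=os.environ):
    """Return the name of the active conda environment, or None if unset."""
    # conda sets CONDA_DEFAULT_ENV when an environment is activated
    return environ.get("CONDA_DEFAULT_ENV")


env = active_conda_env()
if env == "bebi103":
    print("OK: running in the bebi103 environment")
else:
    print(f"Active conda environment is {env!r}; expected 'bebi103'")
```

A Jupyter kernel started by some other mechanism may not carry this variable, so treat a mismatch as a prompt to investigate rather than as proof of a problem.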
Students who took BE/Bi 103 a last term
If you took BE/Bi 103 a last term (and only last term, Fall 2023), your computer is mostly configured. You need only run the following on the command line after activating the bebi103 environment with conda.
pip install cmdstanpy==1.2.0 arviz==0.17.0 bebi103==0.1.18
After applying the above updates, you can skip to the Stan installation section and continue.
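Before moving on, you may want to confirm that the pinned versions took effect. The following is a minimal sketch using the standard library's importlib.metadata; the helper name check_versions and the PINNED dictionary are illustrative, with the version numbers copied from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version

# Versions pinned by the pip command above
PINNED = {"cmdstanpy": "1.2.0", "arviz": "0.17.0", "bebi103": "0.1.18"}


def check_versions(pinned):
    """Map each package name to (installed_version_or_None, matches_pin)."""
    report = {}
    for name, wanted in pinned.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            # Package is not installed in the current environment
            found = None
        report[name] = (found, found == wanted)
    return report


for name, (found, ok) in check_versions(PINNED).items():
    status = "OK" if ok else "MISMATCH"
    print(f"{name:10s} installed={found} pinned={PINNED[name]} {status}")
```

A MISMATCH line usually means the pip command was run in a different environment from the one Python is currently using.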
Students who did not take BE/Bi 103 a
If you did not take BE/Bi 103 a last term, complete Lesson 0 from BE/Bi 103 a, starting with the Installation on your own machine section.
Stan installation
We will be using Stan for much of our statistical modeling. Stan has a probabilistic programming language. Programs written in this language, called Stan programs, are translated into C++ by the Stan parser, and then the C++ code is compiled. As you will see throughout the class, there are many advantages to this approach.
There are many interfaces for Stan, including the two most widely used, RStan and PyStan, which are R and Python interfaces, respectively. We will use a newer interface, CmdStanPy, which has several advantages that will become apparent when you start using it.
Whichever interface you use, Stan must be installed and functional, which in turn requires a working C++ toolchain. Installation and compilation can be tricky and vary from operating system to operating system. The instructions below are not guaranteed to work; you may have to do some troubleshooting on your own. Note that you can use Google Colab or AWS for computing as well, so you do not need to worry if you have trouble installing Stan locally.
Configuring a C++ toolchain for MacOS
On MacOS, you can install the Xcode command line tools by running the following on the command line.
xcode-select --install
Configuring a C++ toolchain for Windows
According to the CmdStanPy documentation, you can skip this step, though I did previously verify that the steps below worked on a Windows machine.
You need to install a C++ toolchain for Windows. One possibility is to install a MinGW toolchain, and one way to do that is using conda.
conda install libpython m2w64-toolchain -c msys2
When you do this, make sure you are in the bebi103 environment.
Configuring a C++ toolchain for Linux
If you are using Linux, we assume you already have the C++ utilities installed.
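If you are unsure whether a toolchain is present, the sketch below looks for a C++ compiler and make on your PATH using the standard library's shutil.which. The tool names checked (g++, clang++, make) are assumptions that fit typical macOS and Linux setups; a Windows MinGW toolchain may name its executables differently. The helper name find_tools is hypothetical.

```python
import shutil


def find_tools(names=("g++", "clang++", "make")):
    """Map each tool name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


tools = find_tools()
for name, path in tools.items():
    print(f"{name:8s} {path or 'not found'}")

# Either compiler will do, but make is needed in addition to a compiler
has_compiler = tools["g++"] is not None or tools["clang++"] is not None
if not (has_compiler and tools["make"] is not None):
    print("A C++ compiler or make appears to be missing from PATH.")
```

Finding the executables does not guarantee that compilation will succeed; it only rules out the most common cause of failure.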
Installing Stan with CmdStanPy
If you have a functioning C++ toolchain, you can use CmdStanPy to install Stan/CmdStan. Do this by running the following at a Python prompt (plain Python, IPython, or a Jupyter notebook), again making sure you are in the bebi103 environment.
import cmdstanpy; cmdstanpy.install_cmdstan()
This may take several minutes to run. (I did it on my Raspberry Pi, and it took hours.)
If you are using Windows and you skipped configuration of the C++ toolchain, instead run:
import cmdstanpy; cmdstanpy.install_cmdstan(compiler=True)
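Once the installation finishes, you can ask CmdStanPy where it thinks CmdStan lives. The following is a minimal sketch, assuming that cmdstanpy.cmdstan_path() raises a ValueError when it cannot find a valid installation; the wrapper name cmdstan_status is hypothetical.

```python
def cmdstan_status():
    """Return a one-line description of the CmdStan installation CmdStanPy sees."""
    try:
        import cmdstanpy
    except ImportError:
        return "cmdstanpy is not importable; is the bebi103 environment active?"
    try:
        path = cmdstanpy.cmdstan_path()
    except ValueError as err:
        # Assumed behavior: ValueError signals a missing or invalid installation
        return f"No usable CmdStan installation: {err}"
    return f"CmdStan found at {path}"


print(cmdstan_status())
```

If this reports no usable installation, rerunning the install step in the correct environment is the first thing to try.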
Checking your Stan installation
To check your Stan installation, you can run the following code. It will take several seconds for the model to compile and then sample. In the end, you should see a scatter plot of samples. You might not appreciate it yet, but this is a nifty demonstration of Stan’s power to sample hierarchical models, which is no trivial feat. You will see some warning text, and that is expected.
import numpy as np
import bebi103
import cmdstanpy
import arviz as az
import bokeh.plotting
import bokeh.io
bokeh.io.output_notebook()
schools_data = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}
schools_code = """
data {
  int<lower=0> J; // number of schools
  vector[J] y; // estimated treatment effects
  vector<lower=0>[J] sigma; // s.e. of effect estimates
}
parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}
transformed parameters {
  vector[J] theta = mu + tau * eta;
}
model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
"""
with open("schools_code.stan", "w") as f:
    f.write(schools_code)
sm = cmdstanpy.CmdStanModel(stan_file="schools_code.stan")
samples = sm.sample(data=schools_data, output_dir="./", show_progress=False)
samples = az.from_cmdstanpy(samples)
bebi103.stan.clean_cmdstan()
# Make a plot of samples
p = bokeh.plotting.figure(
    frame_height=250, frame_width=250, x_axis_label="μ", y_axis_label="τ"
)
p.circle(
    np.ravel(samples.posterior["mu"]),
    np.ravel(samples.posterior["tau"]),
    alpha=0.1
)
bokeh.io.show(p)
21:19:57 - cmdstanpy - INFO - compiling stan file /Users/bois/Dropbox/git/bebi103_course/2024/b/content/lessons/00/schools_code.stan to exe file /Users/bois/Dropbox/git/bebi103_course/2024/b/content/lessons/00/schools_code
21:20:10 - cmdstanpy - INFO - compiled model executable: /Users/bois/Dropbox/git/bebi103_course/2024/b/content/lessons/00/schools_code
21:20:10 - cmdstanpy - INFO - CmdStan start processing
21:20:10 - cmdstanpy - INFO - Chain [1] start processing
21:20:10 - cmdstanpy - INFO - Chain [2] start processing
21:20:10 - cmdstanpy - INFO - Chain [3] start processing
21:20:10 - cmdstanpy - INFO - Chain [4] start processing
21:20:10 - cmdstanpy - INFO - Chain [1] done processing
21:20:10 - cmdstanpy - INFO - Chain [2] done processing
21:20:11 - cmdstanpy - INFO - Chain [3] done processing
21:20:11 - cmdstanpy - INFO - Chain [4] done processing
21:20:11 - cmdstanpy - WARNING - Some chains may have failed to converge.
Chain 3 had 1 divergent transitions (0.1%)
Chain 4 had 1 divergent transitions (0.1%)
Use the "diagnose()" method on the CmdStanMCMC object to see further information.
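The warning above reports divergent transitions chain by chain. To make that tally concrete, here is a minimal sketch of counting divergences from a (chain, draw) array of booleans. The data are a synthetic stand-in that mirrors the log output, not real sampler output, and the 1000 draws per chain are an assumption consistent with the reported 0.1%. The helper name divergences_per_chain is hypothetical.

```python
def divergences_per_chain(diverging):
    """Count divergent transitions in each chain.

    `diverging` is a (chain, draw) array-like of booleans.
    """
    return [sum(bool(d) for d in chain) for chain in diverging]


# Synthetic stand-in mirroring the log above: 4 chains of 1000 draws each,
# with one divergent transition in each of chains 3 and 4.
n_draws = 1000
fake = [[False] * n_draws for _ in range(4)]
fake[2][137] = True  # arbitrary position in chain 3
fake[3][842] = True  # arbitrary position in chain 4

counts = divergences_per_chain(fake)
for i, n in enumerate(counts, start=1):
    print(f"Chain {i}: {n} divergent transitions ({100 * n / n_draws:.1f}%)")
```

With the ArviZ object created above, the corresponding array should be available as samples.sample_stats["diverging"].values, assuming ArviZ's usual naming for CmdStanPy output.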
Computing environment
%load_ext watermark
%watermark -v -p numpy,bokeh,cmdstanpy,arviz,jupyterlab
print("CmdStan : {0:d}.{1:d}".format(*cmdstanpy.cmdstan_version()))
Python implementation: CPython
Python version : 3.11.5
IPython version : 8.15.0
numpy : 1.26.2
bokeh : 3.3.0
cmdstanpy : 1.2.0
arviz : 0.17.0
jupyterlab: 4.0.10
CmdStan : 2.33