Configuring your machine

The instructions below describe how to set up your machine. Once it is set up, make sure you always work in the bebi103 environment.
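If you are unsure which environment you are in, a quick check from a Python prompt can help. The sketch below assumes the environment was created and activated with conda, which sets the `CONDA_DEFAULT_ENV` environment variable on activation. The helper name `active_conda_env` is just for illustration.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the name of the active conda environment, or None if none is set."""
    # Accept a mapping so the check can be exercised without touching os.environ
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV")


env = active_conda_env()
if env == "bebi103":
    print("Running in the bebi103 environment.")
else:
    # Showing the interpreter path helps diagnose which Python is actually running
    print(f"Active environment is {env!r}, not 'bebi103'.")
    print(f"Python executable: {sys.executable}")
```

If the second message appears, activate the environment (for example with `conda activate bebi103`) and restart Python or your Jupyter server.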

Installing Python packages

If you did not take BE/Bi 103 a last term, complete Lesson 0 from BE/Bi 103 a, starting with the Installation on your own machine section.

Stan installation

We will be using Stan for much of our statistical modeling. Stan has a probabilistic programming language. Programs written in this language, called Stan programs, are translated into C++ by the Stan parser, and then the C++ code is compiled. As you will see throughout the class, there are many advantages to this approach.

There are many interfaces for Stan, including the two most widely used, RStan and PyStan, which are R and Python interfaces, respectively. We will use a simpler interface, CmdStanPy, which has several advantages that will become apparent when you start using it.

Whichever interface you use needs to have Stan installed and functional, which means you must have a C++ toolchain installed. Installation and compilation can be tricky and vary from operating system to operating system. The instructions below are not guaranteed to work; you may have to do some troubleshooting on your own. Note that you can also use Google Colab (or other cloud computing resources) for computing, so you do not need to worry if you have trouble installing Stan locally.

Configuring a C++ toolchain for macOS

On macOS, you can install the Xcode command line tools by running the following on the command line.

xcode-select --install

Configuring a C++ toolchain for Windows

According to the CmdStanPy documentation, you can skip this step, though I have previously verified that the steps below work on a Windows machine.

You need to install a C++ toolchain for Windows. One possibility is to install a MinGW toolchain, and one way to do that is using conda.

conda install libpython m2w64-toolchain -c msys2

When you do this, make sure you are in the bebi103 environment.

Configuring a C++ toolchain for Linux

If you are using Linux, we assume you already have the C++ utilities installed.
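Whatever your operating system, you can get a rough idea of whether a toolchain is visible before trying to install Stan. The sketch below only looks for some common tool names on your `PATH`. The list of names is my assumption about what a typical setup provides (for example, `mingw32-make` for a MinGW toolchain), and finding them does not guarantee that compilation will succeed.

```python
import shutil


def find_tools(candidates=("g++", "clang++", "make", "mingw32-make")):
    """Map each candidate tool name to its full path, or None if not on PATH."""
    return {name: shutil.which(name) for name in candidates}


tools = find_tools()
for name, path in tools.items():
    print(f"{name:>14}: {path or 'not found'}")

# A usable toolchain needs at least one C++ compiler and one make program
has_compiler = any(tools.get(name) for name in ("g++", "clang++"))
has_make = any(tools.get(name) for name in ("make", "mingw32-make"))

if has_compiler and has_make:
    print("A C++ compiler and make were found.")
else:
    print("The toolchain may be incomplete; see the instructions above.")
```

On Windows, run this from within the bebi103 environment so that tools installed by conda are on the `PATH`.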

Installing Stan with CmdStanPy

If you have a functioning C++ toolchain, you can use CmdStanPy to install Stan/CmdStan. You can do this by running the following at a Python prompt (Python, IPython, or a Jupyter notebook), again making sure you are in the bebi103 environment.

import cmdstanpy; cmdstanpy.install_cmdstan()

This may take several minutes to run. (I did it on my Raspberry Pi, and it took hours.)

If you are using Windows and you skipped configuration of the C++ toolchain, instead run:

import cmdstanpy; cmdstanpy.install_cmdstan(compiler=True)
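Once the installation finishes, you can ask CmdStanPy where it found CmdStan and which version it is. This is a minimal sketch using `cmdstanpy.cmdstan_path()` and `cmdstanpy.cmdstan_version()`. The helper name `describe_cmdstan` is hypothetical, and the broad `except` is there because I am not certain which exception CmdStanPy raises across versions when no installation is found.

```python
def describe_cmdstan():
    """Return a short description of the CmdStan installation CmdStanPy finds."""
    try:
        import cmdstanpy
    except ImportError:
        return "cmdstanpy is not importable in this environment."

    try:
        path = cmdstanpy.cmdstan_path()
    except Exception as err:  # exact exception type may vary across versions
        return f"CmdStanPy could not locate CmdStan: {err}"

    # cmdstan_version() returns a tuple such as (major, minor), or None
    version = cmdstanpy.cmdstan_version()
    version_str = "unknown" if version is None else ".".join(str(v) for v in version)

    return f"CmdStan {version_str} found at {path}"


print(describe_cmdstan())
```

If this reports that CmdStan could not be located, rerun the installation command above and watch its output for compiler errors.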

Checking your Stan installation

To check your Stan installation, you can run the following code. It will take several seconds for the model to compile and then sample. In the end, you should see a scatter plot of samples. You might not appreciate it yet, but this is a nifty demonstration of Stan’s power to sample hierarchical models, which is no trivial feat. You will see some warning text, and that is expected.

import numpy as np

import bebi103
import cmdstanpy
import arviz as az

import bokeh.plotting
import bokeh.io
bokeh.io.output_notebook()

schools_data = {
    "J": 8,
    "y": [28, 8, -3, 7, -1, 1, 18, 12],
    "sigma": [15, 10, 16, 11, 9, 11, 10, 18],
}

schools_code = """
data {
  int<lower=0> J; // number of schools
  vector[J] y; // estimated treatment effects
  vector<lower=0>[J] sigma; // s.e. of effect estimates
}

parameters {
  real mu;
  real<lower=0> tau;
  vector[J] eta;
}

transformed parameters {
  vector[J] theta = mu + tau * eta;
}

model {
  eta ~ normal(0, 1);
  y ~ normal(theta, sigma);
}
"""

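# Write the Stan program to disk so CmdStanPy can compile it
with open("schools_code.stan", "w") as f:
    f.write(schools_code)

# Compile the model and draw samples with CmdStanPy's logging disabled
with bebi103.stan.disable_logging():
    sm = cmdstanpy.CmdStanModel(stan_file="schools_code.stan")
    samples = sm.sample(data=schools_data, output_dir="./", show_progress=False)

# Convert the samples to an ArviZ InferenceData object
samples = az.from_cmdstanpy(samples)

# Clean up files left behind by CmdStan
bebi103.stan.clean_cmdstan()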

# Make a plot of samples
p = bokeh.plotting.figure(
    frame_height=250, frame_width=250, x_axis_label="μ", y_axis_label="τ"
)
p.scatter(
    np.ravel(samples.posterior["mu"]),
    np.ravel(samples.posterior["tau"]),
    alpha=0.1
)

bokeh.io.show(p)

Computing environment

%load_ext watermark
%watermark -v -p numpy,bokeh,cmdstanpy,arviz,bebi103,jupyterlab
print("CmdStan : {0:d}.{1:d}".format(*cmdstanpy.cmdstan_version()))

Python implementation: CPython
Python version       : 3.12.5
IPython version      : 8.27.0

numpy     : 1.26.4
bokeh     : 3.4.1
cmdstanpy : 1.2.4
arviz     : 0.20.0
bebi103   : 0.1.25
jupyterlab: 4.2.5

CmdStan : 2.36