Marginalization

I mentioned that the evidence can be computed from the likelihood and the prior. To see this, we apply the sum rule to the posterior probability. Let \(\theta_i\) denote the possible values of a parameter or hypothesis, and let \(\theta_j\) be one particular value, with \(\theta_j^c\) denoting all other values. Then,

\[\begin{aligned} 1 = g(\theta_j\mid y) + g(\theta_j^c \mid y) = g(\theta_j\mid y) + \sum_{i\ne j}g(\theta_i\mid y) = \sum_i g(\theta_i\mid y). \end{aligned}\]

Now, Bayes’s theorem gives us an expression for \(g(\theta_i\mid y)\), so we can compute the sum.

\[\begin{aligned} \sum_i g(\theta_i\mid y) = \sum_i\frac{f(y \mid \theta_i)\, g(\theta_i)}{f(y)} = \frac{1}{f(y)}\sum_i f(y \mid \theta_i)\, g(\theta_i) = 1. \end{aligned}\]

Therefore, we can compute the evidence by summing the product of the likelihood and the prior over all possible hypotheses or parameter values.

\[\begin{aligned} f(y) = \sum_i f(y \mid \theta_i)\, g(\theta_i). \end{aligned}\]
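As a concrete illustration, here is a minimal Python sketch with made-up numbers: a coin whose bias \(\theta\) is assumed to take one of five candidate values, a uniform prior, and a binomial likelihood for the observed flips. Summing the product of likelihood and prior over the candidate values gives the evidence, and the resulting posterior sums to one, as derived above.

```python
# Minimal sketch: evidence by marginalization for a discrete parameter.
# The candidate values, prior, and data below are assumptions for illustration.
import numpy as np
import scipy.stats

theta = np.array([0.1, 0.3, 0.5, 0.7, 0.9])  # possible parameter values theta_i
prior = np.ones_like(theta) / len(theta)     # g(theta_i): uniform prior

n, heads = 10, 6                                     # observed data y
likelihood = scipy.stats.binom.pmf(heads, n, theta)  # f(y | theta_i)

# Evidence by marginalization: f(y) = sum_i f(y | theta_i) g(theta_i)
evidence = np.sum(likelihood * prior)

# Bayes's theorem for each theta_i; the posterior sums to one.
posterior = likelihood * prior / evidence
print(evidence, posterior.sum())
```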

In terms of the joint probability \(\pi(y, \theta_i) = f(y \mid \theta_i)\, g(\theta_i)\), we can also write the evidence as

\[\begin{aligned} f(y) = \sum_i \pi(y, \theta_i). \end{aligned}\]

This process of eliminating a variable (in this case \(\theta\)) from a probability by summing is called marginalization. It will prove useful for finding the probability distribution of a single parameter among many.
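As a sketch of that use, suppose the joint distribution of two parameters is tabulated on a discrete grid (a made-up, randomly generated 4-by-5 array below). Summing along one axis eliminates that parameter and leaves the marginal distribution of the other.

```python
# Hypothetical sketch: marginalizing a discrete joint distribution over two
# parameters, stored as a 2D array with rows indexing theta1 and columns theta2.
import numpy as np

rng = np.random.default_rng(0)
joint = rng.random((4, 5))
joint /= joint.sum()          # normalize so it is a valid joint distribution

# Marginal of theta1: sum out theta2 (the column axis).
marginal_theta1 = joint.sum(axis=1)

# Marginal of theta2: sum out theta1 (the row axis).
marginal_theta2 = joint.sum(axis=0)

print(marginal_theta1.sum(), marginal_theta2.sum())  # both are 1.0
```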