Marginalization
===============

I mentioned that the evidence can be computed from the likelihood and the prior. To see this, we apply the sum rule to the posterior probability. Let :math:`\theta_j` be a particular possible value of a parameter or hypothesis. By the sum rule, its posterior probability and that of its complement sum to one, and the complement consists of all of the other mutually exclusive values :math:`\theta_i` with :math:`i\ne j`. Then,

.. math::
    \begin{aligned}
    1 &= g(\theta_j\mid y) + g(\theta_j^c \mid y) \\
    &= g(\theta_j\mid y) + \sum_{i\ne j}g(\theta_i\mid y) \\
    &= \sum_i g(\theta_i\mid y).
    \end{aligned}

Now, Bayes’s theorem gives us an expression for :math:`g(\theta_i\mid y)`, so we can compute the sum.

.. math::
    \begin{aligned}
    \sum_i g(\theta_i\mid y) = \sum_i\frac{f(y \mid \theta_i)\, g(\theta_i)}{f(y)} = \frac{1}{f(y)}\sum_i f(y \mid \theta_i)\, g(\theta_i) = 1.
    \end{aligned}

Therefore, we can compute the evidence by summing the product of the likelihood and the prior over all possible hypotheses or parameter values,

.. math::
    \begin{aligned}
    f(y) = \sum_i f(y \mid \theta_i)\, g(\theta_i).
    \end{aligned}

Equivalently, in terms of the joint probability :math:`\pi(y, \theta_i) = f(y \mid \theta_i)\, g(\theta_i)`, we have

.. math::
    \begin{aligned}
    f(y) = \sum_i \pi(y, \theta_i).
    \end{aligned}

This process of eliminating a variable (in this case :math:`\theta_i`) from a probability distribution by summing over it is called **marginalization**. It will prove useful in finding the probability distribution of a single parameter among many.
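
As a concrete illustration, here is a minimal sketch of computing the evidence by marginalization for a discrete parameter. The setup is hypothetical and not from the text above: a coin whose bias :math:`\theta` is restricted to a grid of nine possible values, a uniform prior, a Binomial likelihood, and made-up data of 7 heads in 10 flips.

.. code-block:: python

    import numpy as np
    import scipy.stats as st

    # Hypothetical discrete hypotheses theta_i: a grid of possible coin biases
    theta = np.arange(0.1, 1.0, 0.1)

    # Prior g(theta_i): uniform over the hypotheses (an assumption)
    g_theta = np.ones_like(theta) / len(theta)

    # Likelihood f(y | theta_i) for made-up data: 7 heads in 10 flips
    f_y_given_theta = st.binom.pmf(7, 10, theta)

    # Evidence by marginalization: f(y) = sum_i f(y | theta_i) g(theta_i)
    f_y = np.sum(f_y_given_theta * g_theta)

    # Posterior from Bayes's theorem; it sums to one, as derived above
    g_theta_given_y = f_y_given_theta * g_theta / f_y
    print(f_y, g_theta_given_y.sum())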