Bayes’s theorem as a model for learning
Let’s say we did an experiment and got data set \(y_1\) as an investigation of hypothesis \(\theta\). Then, our posterior distribution is

\begin{align}
g(\theta\mid y_1) = \frac{f(y_1\mid \theta)\,g(\theta)}{f(y_1)}.
\end{align}
Now, let’s say we did another experiment and got data \(y_2\). We already know \(y_1\) ahead of this experiment, so our prior is \(g(\theta\mid y_1)\), which is the posterior from the first experiment. So, we have

\begin{align}
g(\theta\mid y_1, y_2) = \frac{f(y_2\mid \theta, y_1)\,g(\theta\mid y_1)}{f(y_2\mid y_1)}.
\end{align}
Now, we plug in Bayes’s theorem applied to our first data set, giving

\begin{align}
g(\theta\mid y_1, y_2) = \frac{f(y_2\mid \theta, y_1)\,f(y_1\mid \theta)\,g(\theta)}{f(y_2\mid y_1)\,f(y_1)}.
\end{align}
By the product rule, the denominator is \(f(y_1, y_2)\). Also by the product rule,

\begin{align}
f(y_2\mid \theta, y_1)\,f(y_1\mid \theta) = f(y_1, y_2\mid \theta).
\end{align}
Inserting these expressions into the above expression for \(g(\theta\mid y_1, y_2)\) yields

\begin{align}
g(\theta\mid y_1, y_2) = \frac{f(y_1, y_2\mid \theta)\,g(\theta)}{f(y_1, y_2)}.
\end{align}
So, acquiring more data gave us more information about our hypothesis in the same way as if we had combined \(y_1\) and \(y_2\) into a single data set. Acquiring more and more data therefore helps us learn more and more about our hypothesis or parameter value.
Bayes’s theorem thus describes how we learn from data. We acquire data, and that updates our posterior distribution. That posterior distribution then becomes the prior distribution for interpreting the next data set we acquire, and so on. Data constantly update our knowledge.
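To make this concrete, here is a minimal sketch of the updating scheme above, assuming a hypothetical conjugate Beta-Binomial model (a Beta prior on \(\theta\), the probability of success in Bernoulli trials); the model and numbers are illustrative assumptions, not part of the derivation. Because the model is conjugate, each posterior is again a Beta distribution, so updating with \(y_1\) and then using that posterior as the prior for \(y_2\) gives exactly the same posterior as updating with the combined data set.

```python
import numpy as np

# Illustrative assumption: Beta prior on theta, Bernoulli trials (conjugate model)
rng = np.random.default_rng(3252)
theta_true = 0.7
y1 = rng.binomial(1, theta_true, size=20)   # first data set
y2 = rng.binomial(1, theta_true, size=30)   # second data set

# Prior hyperparameters: Beta(1, 1) is a uniform prior on theta
a0, b0 = 1.0, 1.0

# Update with y1; the posterior Beta(a1, b1) becomes the prior for y2
a1, b1 = a0 + y1.sum(), b0 + len(y1) - y1.sum()

# Update that prior with y2
a2, b2 = a1 + y2.sum(), b1 + len(y2) - y2.sum()

# Update the original prior with both data sets combined into one
y_all = np.concatenate((y1, y2))
a_combined = a0 + y_all.sum()
b_combined = b0 + len(y_all) - y_all.sum()

# The sequential and combined posteriors are identical
print((a2, b2) == (a_combined, b_combined))   # True
```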