Bayes’s theorem as a model for learning

Let’s say we did an experiment and got data set y1 as an investigation of hypothesis θ. Then, our posterior distribution is

$$g(\theta \mid y_1) = \frac{f(y_1 \mid \theta)\, g(\theta)}{f(y_1)}.$$
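To make the arithmetic concrete, here is a minimal sketch that carries out this update numerically on a grid of θ values. The Bernoulli (coin-flip) model, the flat prior, and the data are hypothetical choices for illustration only; nothing in the argument depends on them.

```python
import numpy as np
import scipy.stats

# Grid of parameter values and a flat prior g(theta) (hypothetical choices)
theta = np.linspace(0, 1, 1001)
g_prior = np.ones_like(theta)

# Hypothetical first data set: outcomes of Bernoulli trials
y1 = np.array([1, 0, 1, 1, 0])

# Likelihood f(y1 | theta) evaluated at each grid point
f_y1 = np.array([np.prod(scipy.stats.bernoulli.pmf(y1, t)) for t in theta])

# Bayes's theorem: posterior ∝ likelihood × prior, with the evidence f(y1)
# approximated by a Riemann sum over the grid
unnorm = f_y1 * g_prior
dtheta = theta[1] - theta[0]
g_post_1 = unnorm / (unnorm.sum() * dtheta)
```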

Now, let’s say we did another experiment and got data y2. We already know y1 ahead of this experiment, so our prior is $g(\theta \mid y_1)$, which is the posterior from the first experiment. So, we have

$$g(\theta \mid y_1, y_2) = \frac{f(y_2 \mid y_1, \theta)\, g(\theta \mid y_1)}{f(y_2 \mid y_1)}.$$

Next, we substitute for $g(\theta \mid y_1)$ using Bayes’s theorem applied to our first data set, giving

$$g(\theta \mid y_1, y_2) = \frac{f(y_2 \mid y_1, \theta)\, f(y_1 \mid \theta)\, g(\theta)}{f(y_2 \mid y_1)\, f(y_1)}.$$

By the product rule, the denominator is $f(y_2 \mid y_1)\, f(y_1) = f(y_1, y_2)$. Also by the product rule,

$$f(y_2 \mid y_1, \theta)\, f(y_1 \mid \theta) = f(y_1, y_2 \mid \theta).$$

Inserting these expressions into the above expression for $g(\theta \mid y_1, y_2)$ yields

$$g(\theta \mid y_1, y_2) = \frac{f(y_1, y_2 \mid \theta)\, g(\theta)}{f(y_1, y_2)}.$$

So, acquiring more data gave us more information about our hypothesis in the same way as if we had simply combined y1 and y2 into a single data set. Acquiring more and more data therefore helps us learn more and more about our hypothesis or parameter value.
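Here is a sketch that checks this result numerically, performing the update both ways on a grid: sequentially, with the posterior from y1 serving as the prior for y2, and in a single batch with the combined data set. The Bernoulli model and data are again hypothetical, and the code assumes the measurements are independent given θ, so that $f(y_2 \mid y_1, \theta) = f(y_2 \mid \theta)$.

```python
import numpy as np
import scipy.stats

theta = np.linspace(0, 1, 1001)

# Hypothetical data sets from two experiments
y1 = np.array([1, 0, 1, 1, 0])
y2 = np.array([0, 1, 1])


def likelihood(y, theta):
    """f(y | theta) on the grid, assuming i.i.d. Bernoulli draws."""
    return np.array([np.prod(scipy.stats.bernoulli.pmf(y, t)) for t in theta])


def update(prior, y, theta):
    """One application of Bayes's theorem, normalized by a Riemann sum."""
    unnorm = likelihood(y, theta) * prior
    return unnorm / (unnorm.sum() * (theta[1] - theta[0]))


flat_prior = np.ones_like(theta)

# Sequential: the posterior from y1 becomes the prior for y2
post_sequential = update(update(flat_prior, y1, theta), y2, theta)

# Batch: combine y1 and y2 into a single data set
post_batch = update(flat_prior, np.concatenate((y1, y2)), theta)

print(np.allclose(post_sequential, post_batch))  # True
```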

Bayes’s theorem thus describes how we learn from data. We acquire data, and that updates our posterior distribution. That posterior distribution then becomes the prior distribution for interpreting the next data set we acquire, and so on. Data constantly update our knowledge.
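This learning loop can be sketched in code as well: each posterior becomes the prior for the next data set. The Bernoulli model with a conjugate Beta prior below is an assumption made purely so each update reduces to incrementing the Beta parameters; the argument above does not depend on that choice.

```python
import numpy as np

rng = np.random.default_rng(3252)

# Hypothetical setup: Bernoulli trials with unknown probability theta,
# and a conjugate Beta(1, 1) (flat) prior on theta
theta_true = 0.7
alpha, beta = 1.0, 1.0

for i in range(5):
    # Acquire a new data set
    y = rng.binomial(1, theta_true, size=20)

    # Posterior is Beta(alpha + successes, beta + failures);
    # it serves as the prior for the next data set
    alpha += y.sum()
    beta += len(y) - y.sum()

    print(f"After data set {i + 1}: posterior mean ≈ {alpha / (alpha + beta):.3f}")
```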