Homework 3.1: Least squares (20 pts)

You may have heard that to estimate the best-fit parameters for a variate-covariate model, you should “minimize the sum of the squares of the residuals.” In this problem, we will parse what that means and understand it in the context of generative Bayesian modeling.

Say we have some (x, y) data, where x is the independent variable (known essentially exactly; this could be something like time) and y is the dependent variable (whose measurement has some stochasticity, or noise). Imagine that we have derived a theoretical relation between the expected value of y and x, written as \(y = f_y(x;\phi)\), where \(\phi\) is a set of parameters that we wish to determine through inference.

Our data set is \(\{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}\). We assume the measurements of \(y\) are i.i.d. and Normally distributed.

\begin{align} y_i \sim \mathrm{Norm}(f_y(x_i;\phi), \sigma_i) \;\;\forall i. \end{align}
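To make the generative model concrete, here is a minimal Python sketch that draws one synthetic data set from it. The straight-line form of `f_y`, the parameter values, the `x` grid, and the \(\sigma_i\) values are all hypothetical choices for illustration, not part of the problem.

```python
import numpy as np


def f_y(x, phi):
    """Hypothetical theoretical curve: a straight line with phi = (slope, intercept)."""
    slope, intercept = phi
    return slope * x + intercept


def draw_data(x, phi, sigma, rng):
    """Draw y_i ~ Norm(f_y(x_i; phi), sigma_i) independently for each i."""
    return rng.normal(loc=f_y(x, phi), scale=sigma)


rng = np.random.default_rng(seed=3252)
x = np.linspace(0.0, 10.0, 20)   # independent variable, treated as known exactly
sigma = np.full_like(x, 0.5)     # assumed known measurement scales sigma_i
y = draw_data(x, (2.0, 1.0), sigma, rng)
```

Running `draw_data` repeatedly with the same `phi` and `sigma` gives different data sets; that variability is exactly what the likelihood describes.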

A residual for a data point is defined as

\begin{align} r_i = \frac{y_i - f_y(x_i;\phi)}{\sigma_i}. \end{align}

The sum of the squares of the residuals is then

\begin{align} \sum_{i=1}^N r_i^2 = \sum_{i=1}^N\,\frac{(y_i - f_y(x_i;\phi))^2}{\sigma_i^2}. \end{align}
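The residuals and their sum of squares are straightforward to compute with NumPy. The sketch below assumes a hypothetical straight-line model `line` and made-up data; the helper names are illustrative only.

```python
import numpy as np


def residuals(phi, x, y, sigma, f_y):
    """Scaled residuals r_i = (y_i - f_y(x_i; phi)) / sigma_i."""
    return (y - f_y(x, phi)) / sigma


def sum_sq_residuals(phi, x, y, sigma, f_y):
    """Sum over all data points of r_i**2."""
    r = residuals(phi, x, y, sigma, f_y)
    return np.sum(r**2)


def line(x, phi):
    """Hypothetical model: f_y(x; phi) = slope * x + intercept."""
    slope, intercept = phi
    return slope * x + intercept


x = np.array([0.0, 1.0, 2.0])
y = np.array([1.0, 3.5, 4.0])
sigma = np.array([0.5, 0.5, 1.0])
print(sum_sq_residuals((2.0, 1.0), x, y, sigma, line))  # prints 2.0
```

Here the model predicts \((1, 3, 5)\), so the scaled residuals are \((0, 1, -1)\) and their squares sum to 2.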

a) Show that finding the parameters \(\phi\) that minimize the sum of the squares of the residuals is equivalent to finding the maximum a posteriori (MAP) parameters for a model with the above likelihood and Uniform priors on all parameters. Note that the values of \(\sigma_i\) are taken to be known in this context; they are not parameters to be estimated.
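As a numerical sanity check (not a substitute for the derivation), the sketch below compares the Normal log-likelihood computed with `scipy.stats.norm.logpdf` against \(-\tfrac{1}{2}\sum_i r_i^2\) plus the \(\phi\)-independent constant \(-\sum_i \ln\sigma_i - \tfrac{N}{2}\ln 2\pi\). The line model, the data, and the trial values of \(\phi\) are hypothetical. With a Uniform prior whose support contains the optimum, the log posterior differs from the log-likelihood only by a further constant there.

```python
import numpy as np
from scipy import stats


def line(x, phi):
    """Hypothetical model: f_y(x; phi) = slope * x + intercept."""
    slope, intercept = phi
    return slope * x + intercept


def log_likelihood(phi, x, y, sigma):
    """Sum of Normal log-densities, y_i ~ Norm(line(x_i; phi), sigma_i)."""
    return np.sum(stats.norm.logpdf(y, loc=line(x, phi), scale=sigma))


def sum_sq_residuals(phi, x, y, sigma):
    """Sum of squared scaled residuals."""
    return np.sum(((y - line(x, phi)) / sigma) ** 2)


x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])
sigma = np.array([0.3, 0.5, 0.4, 0.6])

# Terms of the log-likelihood that do not depend on phi
const = -np.sum(np.log(sigma)) - 0.5 * len(x) * np.log(2 * np.pi)

trial_phis = [(2.0, 1.0), (1.5, 0.0), (-3.0, 4.0)]
for phi in trial_phis:
    lhs = log_likelihood(phi, x, y, sigma)
    rhs = -0.5 * sum_sq_residuals(phi, x, y, sigma) + const
    print(phi, np.isclose(lhs, rhs))
```

Since the log-likelihood is a decreasing function of the sum of squares at every \(\phi\), the two objectives share the same optimum.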

b) Now assume that we have homoscedastic errors, that is, \(\sigma_i = \sigma\) for all \(i\), and further that \(\sigma\) is not known. Show that if the prior is such that \(\sigma\) is independent of \(\phi\) (whose prior we still assume to be constant), we do not need to consider the parameter \(\sigma\) at all in finding the MAP values for \(\phi\). (Homoscedasticity is often assumed when the \(\sigma_i\) are not known, and this is the procedure commonly followed in least squares.)
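One way to see part (b) numerically is a brute-force grid search: for a hypothetical homoscedastic straight-line data set, find the grid point that minimizes the unweighted sum of squares, then check that the same grid point maximizes the log posterior for several fixed values of \(\sigma\). The data, the grid, and the \(\sigma\) values are assumptions for illustration; a grid search is used only because its argmax is easy to compare exactly, not because it is how one would fit in practice.

```python
import numpy as np
from scipy import stats


def line(x, phi):
    """Hypothetical model: f_y(x; phi) = slope * x + intercept."""
    slope, intercept = phi
    return slope * x + intercept


x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = np.array([0.8, 3.1, 4.9, 7.2, 8.9])

# Hypothetical grid of candidate phi = (slope, intercept) values
slopes = np.linspace(1.0, 3.0, 81)
intercepts = np.linspace(-1.0, 2.0, 61)
grid = [(m, b) for m in slopes for b in intercepts]

# Unweighted sum of squares: sigma does not appear at all
ssr = np.array([np.sum((y - line(x, phi)) ** 2) for phi in grid])
best_ls = np.argmin(ssr)


def log_posterior(phi, sigma):
    """Log posterior up to constants at fixed sigma.

    The Uniform prior on phi and a prior on sigma that does not involve phi
    only add terms that are constant in phi, so they are omitted here.
    """
    return np.sum(stats.norm.logpdf(y, loc=line(x, phi), scale=sigma))


sigmas = [0.1, 1.0, 10.0]
best_map = {}
for sigma in sigmas:
    log_post = np.array([log_posterior(phi, sigma) for phi in grid])
    best_map[sigma] = np.argmax(log_post)
    print(sigma, best_map[sigma] == best_ls)
```

Because the maximizing grid point is the same for every \(\sigma\), the joint maximum over \((\phi, \sigma)\) must also sit at that \(\phi\), whatever prior on \(\sigma\) is used, provided it does not involve \(\phi\).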

c) Discuss any issues you see with taking this approach. Think generatively.