This paper gives a somewhat gentle introduction to Bayesian inference: http://www.miketipping.com/papers/met-mlbayes.pdf
I got to section 2.3 without much trouble, but I got stuck from that section onwards. It starts by presenting a probabilistic regression framework where the likelihood of all the data is given as:
$$ p(t|x,w,\sigma^2) = \prod_{n}p\left(t_n|x_n,w,\sigma^2\right) $$ where $t_n=y(x_n;w)+\epsilon_n$ is the 'target' value. Next, given a set of parameters $w$ and a hyperparameter $\alpha$, the prior is given as: $$ p(w|\alpha)=\prod_{m}\left(\frac{\alpha}{2\pi}\right)^{1/2}\exp\left({-\frac{\alpha}{2}w_m^2}\right) $$
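To make the two densities above concrete, here is a minimal Python sketch of how they could be evaluated for one candidate $w$. It is only an illustration under assumptions that are not stated above: zero-mean Gaussian noise $\epsilon_n$ with variance $\sigma^2$, a model that is linear in the parameters, $y(x;w)=\sum_m w_m\phi_m(x)$, with Gaussian basis functions, and made-up data values. The function names are hypothetical.

```python
import numpy as np

def gaussian_basis(x, centers, width=1.0):
    """Design matrix Phi with Phi[n, m] = exp(-(x_n - c_m)^2 / width^2).
    The choice of basis is an assumption made for this sketch."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / width**2)

def log_likelihood(t, x, w, sigma2, centers):
    """log p(t | x, w, sigma^2): the log of the product over n becomes a sum
    of log N(t_n | y(x_n; w), sigma^2) terms (Gaussian noise assumed)."""
    y = gaussian_basis(x, centers) @ np.asarray(w, dtype=float)
    resid = np.asarray(t, dtype=float) - y
    n = resid.size
    return -0.5 * n * np.log(2 * np.pi * sigma2) - 0.5 * np.sum(resid**2) / sigma2

def log_prior(w, alpha):
    """log p(w | alpha): sum over m of log N(w_m | 0, 1/alpha)."""
    w = np.asarray(w, dtype=float)
    return 0.5 * w.size * np.log(alpha / (2 * np.pi)) - 0.5 * alpha * np.sum(w**2)

# Two made-up measurements and one candidate weight vector (all illustrative).
x_obs = np.array([0.5, 2.0])
t_obs = np.array([0.4, 0.8])
centers = np.array([0.0, 1.0, 2.0, 3.0])
w_candidate = np.zeros(4)

ll = log_likelihood(t_obs, x_obs, w_candidate, sigma2=0.2, centers=centers)
lp = log_prior(w_candidate, alpha=1.0)
print("log-likelihood:", ll, "log-prior:", lp)
```

In this sketch both functions return one real number for each candidate $w$, so they are functions of $w$ rather than of a single known value. Note that the numbers are densities, so their exponentials are not restricted to $[0,1]$.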
I can then compute the posterior $p\left(w|t,\alpha,\sigma^2\right)$. What I don't understand is the following:
- In the first equation above, how should I interpret the product over the $N$ pairs of data $(t_n,x_n)$? Let's say I get two initial measurements from the real world: is $p\left(t|x,w,\sigma^2\right)$ supposed to give me a single real-valued probability? And how do I account for $w$, since it is not known yet?
- As far as I understand, $w$ is supposed to be a vector of size $M$ where $w_i$ contains the $i$th estimated value. Now, how can a prior for $w$ refer to its own vector elements if I don't know them yet? Shouldn't a prior be an independent distribution such as a Gaussian or Beta? Also, shouldn't a prior be independent of hyperparameters?
- Figure 4, on page 8 of the paper, shows plots of the prior and of the posteriors for an example using the function $y=\sin(x)$ with added Gaussian noise of variance 0.2. How could I plot something similar in, say, Octave/Matlab or R?
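For the plotting question, here is a rough sketch of the kind of computation such a figure would need. It is written in Python with NumPy rather than Octave/Matlab or R, and it is not the paper's code. It assumes Gaussian noise, a linear-in-parameters model with one Gaussian basis function per data point, and the standard closed-form Gaussian posterior for that setup, $\Sigma = (\alpha I + \sigma^{-2}\Phi^\top\Phi)^{-1}$ and $\mu = \sigma^{-2}\Sigma\Phi^\top t$. The value of `alpha`, the number of points, the basis width, and all function names are arbitrary choices for illustration.

```python
import numpy as np

def design_matrix(x, centers, width=1.0):
    """Phi[n, m] = exp(-(x_n - c_m)^2 / width^2); the basis is an assumption."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    return np.exp(-((x[:, None] - centers[None, :]) ** 2) / width**2)

def posterior(Phi, t, alpha, sigma2):
    """Mean and covariance of the Gaussian posterior p(w | t, alpha, sigma^2)."""
    M = Phi.shape[1]
    precision = alpha * np.eye(M) + Phi.T @ Phi / sigma2
    cov = np.linalg.inv(precision)
    cov = 0.5 * (cov + cov.T)  # remove tiny asymmetries from the inversion
    mean = cov @ Phi.T @ t / sigma2
    return mean, cov

rng = np.random.default_rng(0)
sigma2 = 0.2  # noise variance, as quoted in the question
alpha = 1.0   # prior precision, arbitrary value for this sketch

# Synthetic data: noisy samples of sin(x).
x = np.linspace(0.0, 2.0 * np.pi, 15)
t = np.sin(x) + rng.normal(0.0, np.sqrt(sigma2), size=x.size)

centers = x  # one basis function per data point (assumption)
Phi = design_matrix(x, centers)
mean, cov = posterior(Phi, t, alpha, sigma2)

# Draw weight vectors from the prior and the posterior, then map each
# draw to a curve y(x; w) on a dense grid.
x_grid = np.linspace(0.0, 2.0 * np.pi, 200)
Phi_grid = design_matrix(x_grid, centers)
prior_w = rng.normal(0.0, 1.0 / np.sqrt(alpha), size=(5, centers.size))
post_w = rng.multivariate_normal(mean, cov, size=5)
prior_curves = prior_w @ Phi_grid.T  # shape (5, 200)
post_curves = post_w @ Phi_grid.T    # shape (5, 200)
```

Each row of `prior_curves` and `post_curves` is one sampled function evaluated on `x_grid`, so the rows can be passed to any plotting routine together with the data points `(x, t)`.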
I don't have a strong background in statistics, so forgive me if this is too basic. Any help is appreciated.
Thanks in advance!