
I am currently studying the textbook Gaussian Processes for Machine Learning by Carl Edward Rasmussen and Christopher K. I. Williams. Chapter 1 Introduction says the following:

[Figure 1.1 from the book] In this section we give graphical illustrations of how the second (Bayesian) method works on some simple regression and classification examples.

We first consider a simple 1-d regression problem, mapping from an input $x$ to an output $f(x)$. In Figure 1.1(a) we show a number of sample functions drawn at random from the prior distribution over functions specified by a particular Gaussian process which favours smooth functions. This prior is taken to represent our prior beliefs over the kinds of functions we expect to observe, before seeing any data. In the absence of knowledge to the contrary we have assumed that the average value over the sample functions at each $x$ is zero. Although the specific random functions drawn in Figure 1.1(a) do not have a mean of zero, the mean of $f(x)$ values for any fixed $x$ would become zero, independent of $x$ as we kept on drawing more functions. At any value of $x$ we can also characterize the variability of the sample functions by computing the variance at that point. The shaded region denotes twice the pointwise standard deviation; in this case we used a Gaussian process which specifies that the prior variance does not depend on $x$.

Suppose that we are then given a dataset $\mathcal{D} = \{(\mathbf{\mathrm{x}}_1,y_1),(\mathbf{\mathrm{x}}_2,y_2)\}$ consisting of two observations, and we wish now to only consider functions that pass through these two data points exactly. (It is also possible to give higher preference to functions that merely pass “close” to the datapoints.) This situation is illustrated in Figure 1.1(b). The dashed lines show sample functions which are consistent with $\mathcal{D}$, and the solid line depicts the mean value of such functions. Notice how the uncertainty is reduced close to the observations. The combination of the prior and the data leads to the posterior distribution over functions.

If more datapoints were added one would see the mean function adjust itself to pass through these points, and that the posterior uncertainty would reduce close to the observations. ...

I am confused by this part:

Notice how the uncertainty is reduced close to the observations.

What do the authors mean by "close to the observations"? I can see that the shaded region – twice the pointwise standard deviation – is narrowest at the points where all the sample functions take the same value as their mean, but it isn't totally clear to me what the authors are referring to.

  • They mean close to $x_1$ or $x_2$, don’t they? The uncertainty in the predicted $f(x_1 + e)$ is smaller for $e$ closer to $0$. How much smaller depends on the function variance hyperparameter, if I recall correctly.
    – Jonathan
    Commented Dec 26, 2020 at 7:29
  • @Jonathan Hmm, I'm not sure. It sounds like they're saying that figure 1.1(b) shows that "the uncertainty is reduced close to the observations", but this isn't clear to me. – Commented Dec 26, 2020 at 7:33

1 Answer


A function is a map $f: X \to Y$, and a Gaussian process learns to approximate such a function from data. The example says that you are given two points $\mathcal{D} = \{(\mathbf{\mathrm{x}}_1,y_1),(\mathbf{\mathrm{x}}_2,y_2)\}$, presumably located around $0.2$ and $0.55$, as can be guessed from the second plot, which shows the posterior predictive distribution. The uncertainty there drops close to zero, since we know exactly what the relation between $x$ and $y$ is at those points. If the learned approximation is to be correct, it needs to pass through those points, so the functions sampled from the Gaussian process (the distribution over functions) need to pass through them as well. There is no uncertainty about the value of $f(x)$ at those particular inputs. Moreover, since the functions are continuous, the outputs at nearby inputs must be similar, so the uncertainty also decreases close to the known points. If you are using a Gaussian process that assumes noise-free data, the uncertainty can go all the way down to zero at the observations, while with noisy data there would be some non-zero uncertainty around the datapoints.
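
To make this concrete, below is a minimal numerical sketch (my own code, not from the book) of noise-free GP regression with a squared-exponential kernel. The observation locations $0.2$ and $0.55$ are just the values guessed above, and the target values and length-scale are arbitrary illustrative choices; the only point is that the posterior standard deviation collapses to (numerically) zero at the observed inputs and grows as you move away from them.

```python
# Minimal sketch: posterior uncertainty of a noise-free GP shrinks to zero
# at the observed inputs. Observation locations 0.2 and 0.55 are guesses
# from the figure; targets and length-scale are arbitrary illustrations.
import numpy as np

def rbf_kernel(a, b, length_scale=0.1):
    """Squared-exponential covariance k(a, b) for 1-d inputs."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

# Two noise-free observations, roughly as in Figure 1.1(b).
X_train = np.array([0.2, 0.55])
y_train = np.array([0.5, -0.4])          # arbitrary illustrative targets

# Test grid over the input range.
X_test = np.linspace(0.0, 1.0, 101)

# Standard GP regression equations (zero prior mean, no observation noise;
# a small jitter keeps the Cholesky factorisation numerically stable).
K = rbf_kernel(X_train, X_train) + 1e-10 * np.eye(len(X_train))
K_s = rbf_kernel(X_train, X_test)
K_ss = rbf_kernel(X_test, X_test)

L = np.linalg.cholesky(K)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
v = np.linalg.solve(L, K_s)

post_mean = K_s.T @ alpha                     # posterior mean
post_var = np.diag(K_ss) - np.sum(v ** 2, 0)  # pointwise posterior variance
post_std = np.sqrt(np.clip(post_var, 0.0, None))

# The standard deviation is (numerically) zero at the training inputs and
# grows as we move away from them: "the uncertainty is reduced close to
# the observations".
for x0 in X_train:
    i = np.argmin(np.abs(X_test - x0))
    print(f"std near x = {x0:.2f}: {post_std[i]:.4f}")
print(f"std far from data (x = 1.0): {post_std[-1]:.4f}")
```

With noisy observations you would add $\sigma_n^2 I$ to `K` instead of the tiny jitter, and the standard deviation at the training inputs would then bottom out around $\sigma_n$ rather than at zero.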

  • I'm having difficulty understanding your answer. "The uncertainty there goes close to zero" – where are you referring to? The uncertainty from $0.2$ to $0.55$ is given by the shaded region, no? So how can it be said that it goes close to zero? – Commented Dec 26, 2020 at 8:35
  • @ThePointer the shaded region, which shows the uncertainty, shrinks to zero width (in the y-axis direction) around those points.
    – Tim
    Commented Dec 26, 2020 at 8:40
  • You're referring to the point where all the curves intersect at around $x = 0.55$? – Commented Dec 26, 2020 at 8:48
  • @ThePointer yes.
    – Tim
    Commented Dec 26, 2020 at 8:52
  • Ahh, ok, I understand now. – Commented Dec 26, 2020 at 9:05
