
I've heard it said before that K-means clustering is a prototypical instance of the expectation-maximization (EM) algorithm: where K-means returns a hard cluster assignment, EM returns soft assignments, namely the probability of each assignment.

I put my imagination to work over the weekend and extended this analogy to K-nearest-neighbor (KNN) regression and Gaussian process (GP) regression. Both models predict the output as a linear combination of the training data. KNN makes hard assignments by averaging the responses of the K neighbors under consideration, whereas GP regression makes soft assignments by sampling a multivariate Gaussian given the conditional mean vector and covariance matrix. In KNN, the prediction is a weighted average of the neighbors' outputs, whereas in GP regression we don't dictate the particular linear combination of training data points explicitly. Rather, by conditioning on the training data, we constrain the model to sample functions that fit the training data reasonably well.
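To make this concrete for myself, here is a rough sketch of the hard-vs-soft contrast (my own toy example with an assumed RBF kernel, assumed noise level, and 1-D data, so take it as illustrative rather than canonical): KNN averages the responses of the K nearest training points, while GP regression conditions a joint Gaussian on the training data and then uses the posterior mean or a posterior sample.

```python
# Toy sketch (my own; assumed RBF kernel, 1-D data): hard KNN average vs.
# soft GP prediction obtained by conditioning a joint Gaussian on the data.
import numpy as np

rng = np.random.default_rng(0)
x_train = np.sort(rng.uniform(-3, 3, 20))
y_train = np.sin(x_train) + 0.1 * rng.standard_normal(20)
x_test = np.linspace(-3, 3, 100)

# KNN regression: hard assignment, plain average of the K nearest responses
K_neighbors = 3
dists = np.abs(x_test[:, None] - x_train[None, :])
nearest = np.argsort(dists, axis=1)[:, :K_neighbors]
knn_pred = y_train[nearest].mean(axis=1)

# GP regression: condition the joint Gaussian on y_train, then sample
def rbf(a, b, length=1.0):
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length**2)

sigma2 = 0.1**2                                   # assumed noise variance
A = rbf(x_train, x_train) + sigma2 * np.eye(len(x_train))
K_star = rbf(x_test, x_train)
gp_mean = K_star @ np.linalg.solve(A, y_train)    # posterior mean
gp_cov = rbf(x_test, x_test) - K_star @ np.linalg.solve(A, K_star.T)
gp_sample = rng.multivariate_normal(gp_mean, gp_cov + 1e-9 * np.eye(len(x_test)))
```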

This is probably the limit of the models' similarities, but if I'm on the right track, it gives me a good mental schema for reasoning about GP regression, which is quite an abstract concept.


1 Answer


There is some resemblance. I am going to use some of what I wrote in my answer to the previous post.

Using the notation there, we have
\begin{align*} f(x^*) \mid y &\sim N\Big(K(x^*,x)\big(\sigma^2 I + K(x,x)\big)^{-1} y, \\ &\qquad K(x^*,x^*) - K(x^*,x) \big(\sigma^2 I + K(x,x)\big)^{-1} K(x,x^*)\Big). \end{align*}
Let us forget about the covariance and only look at the posterior mean, i.e.,
$$ \hat f(x^*) := \mathbb E [f(x^*) \mid y] = K(x^*,x)\big(\sigma^2 I + K(x,x)\big)^{-1} y. $$
Recall that $x = (x_1,\dots,x_n)$ contains the training data. Let me write $K_{tr} = K(x,x)$ for the kernel matrix obtained from the training data. Also, let $x^*$ be a single test point that we are interested in, and let us rename it $t$ for simplicity. Our function estimate is
$$ \hat f(t) = K(t, x) \big(\sigma^2 I + K_{tr}\big)^{-1} y, $$
that is,
$$ \hat f(t) = \sum_{i=1}^n \alpha_i(y) K(t, x_i), \quad \alpha_i(y) := \big[\big(\sigma^2 I + K_{tr}\big)^{-1} y\big]_i, $$
where $[\cdot]_i$ denotes the $i$th coordinate of a vector. That is, the function estimate for GP regression is a linear combination of the kernel functions anchored at the training points $x_i$, with weights $\alpha_i(y)$. Note the resemblance to the solution of kernel ridge regression (to be found elsewhere!).
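If it helps, here is a small numerical illustration of that decomposition (my own sketch with an assumed RBF kernel and toy data, not from the linked post): the matrix form $K(t,x)(\sigma^2 I + K_{tr})^{-1} y$ and the kernel-expansion form $\sum_i \alpha_i(y) K(t, x_i)$ give the same numbers.

```python
# Sketch (assumed RBF kernel, toy data): the GP posterior mean written as a
# matrix product agrees with the expansion sum_i alpha_i(y) * K(t, x_i).
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, 15)                        # training inputs x_1, ..., x_n
y = np.cos(x) + 0.05 * rng.standard_normal(15)    # training responses
sigma2 = 0.05**2                                  # assumed noise variance

def K(a, b, length=0.7):                          # assumed RBF kernel
    return np.exp(-0.5 * np.subtract.outer(a, b) ** 2 / length**2)

K_tr = K(x, x)
alpha = np.linalg.solve(sigma2 * np.eye(len(x)) + K_tr, y)   # alpha_i(y)

t = np.array([-1.0, 0.3, 1.5])                    # a few test points
f_matrix_form = K(t, x) @ alpha                   # K(t,x) (sigma^2 I + K_tr)^{-1} y
f_sum_form = np.array([sum(alpha[i] * K(np.array([ti]), np.array([x[i]]))[0, 0]
                           for i in range(len(x))) for ti in t])

assert np.allclose(f_matrix_form, f_sum_form)
print(f_matrix_form)
```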

The $k$-NN function estimate is
$$ \hat g(t) = \sum_{j} \beta_j(y)\,1_{N_j}(t), $$
where $N_j$ is one of the cells in the $k$th-order Voronoi partition of the space by the training data $\{x_i\}$ and $1_{N_j}(t) = 1\{t \in N_j\}$ is the indicator function of the set $N_j$. In other words, $N_j$ is such that all the points in it have the same $k$ nearest neighbors among the training data $\{x_i\}$; call that common neighbor set $S_j$. Here,
$$ \beta_j(y) = \frac{1}{k} \sum_{i \,:\, x_i \in S_j} y_i $$
is the average of the training responses over the $k$ nearest neighbors shared by all points in cell $N_j$.
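As a sanity check on the piecewise-constant picture (again my own sketch, using 1-D toy data, where the $k$th-order Voronoi cells are simply intervals), one can verify that test points sharing the same $k$ nearest neighbors get exactly the same prediction:

```python
# Sketch (toy 1-D data): the kNN regression estimate is piecewise constant on
# the k-th order Voronoi cells -- test points with the same k nearest
# neighbors share the same value beta_j(y).
import numpy as np

rng = np.random.default_rng(2)
x_train = np.sort(rng.uniform(0, 10, 8))
y_train = np.sin(x_train)
k = 3

t_grid = np.linspace(0, 10, 500)
dists = np.abs(t_grid[:, None] - x_train[None, :])
neighbor_sets = np.sort(np.argsort(dists, axis=1)[:, :k], axis=1)  # sorted index sets
g_hat = y_train[neighbor_sets].mean(axis=1)                        # beta for each t

# Group test points by their neighbor set (i.e., by Voronoi cell) and check
# that g_hat is constant within each group.
cells, cell_id = np.unique(neighbor_sets, axis=0, return_inverse=True)
for j in range(len(cells)):
    assert np.allclose(g_hat[cell_id == j], g_hat[cell_id == j][0])
print(f"{len(cells)} cells, each with a constant prediction")
```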

Both estimators are linear in $y$, since both $\alpha_i(y)$ and $\beta_j(y)$ are linear functions of $y$. The GP regression estimate is a weighted combination of kernel functions anchored at the training features. The $k$-NN estimate is a piecewise constant function over the cells defined by the training features. You are replacing the hard indicators $1_{N_j}(t)$ with soft kernels $K(t, x_j)$ when going from $k$-NN to GP regression.
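One way to see the "hard indicators vs. soft kernels" point numerically (my own sketch, with an assumed RBF kernel and toy data): for a fixed test point, write each prediction as a weight vector applied to the training responses. The $k$-NN weights are $1/k$ on the $k$ nearest neighbors and zero elsewhere, while the GP weights $K(t,x)(\sigma^2 I + K_{tr})^{-1}$ are spread smoothly over all training points; either prediction is that weight vector dotted with $y$, hence linear in $y$.

```python
# Sketch (assumed RBF kernel): both predictions at a test point t are linear
# in y, i.e., a fixed weight vector dotted with the training responses.
import numpy as np

rng = np.random.default_rng(3)
x_train = np.sort(rng.uniform(-3, 3, 10))
t = 0.5
k, sigma2, length = 3, 0.01, 1.0

def rbf(a, b):
    return np.exp(-0.5 * np.subtract.outer(a, b) ** 2 / length**2)

# kNN weights: hard indicator of the k nearest neighbors, scaled by 1/k
knn_w = np.zeros(len(x_train))
knn_w[np.argsort(np.abs(x_train - t))[:k]] = 1.0 / k

# GP weights: the row vector K(t, x) (sigma^2 I + K_tr)^{-1}
A = rbf(x_train, x_train) + sigma2 * np.eye(len(x_train))
gp_w = np.linalg.solve(A, rbf(x_train, np.array([t]))).ravel()  # A is symmetric

print("kNN weights:", np.round(knn_w, 3))
print("GP  weights:", np.round(gp_w, 3))
# Either prediction is simply weights @ y_train.
```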

In my opinion, it is much easier to understand GP regression than $k$-nearest-neighbor regression (due to the complexity of the Voronoi partition).

  • This is a good write-up. I think the most illustrative example is an inference point "far away" from the training data (a.k.a. extrapolation). A naive/vanilla embedding with GPR cannot replicate the extrapolation behaviour of KNN, AFAIK. Even if you try to use some Voronoi embedding, I don't think you can do it, but I'm not 100% sure.
    – safetyduck
    Commented Feb 23, 2023 at 12:06
