One way to derive the k-NN decision rule from k-NN density estimation goes as follows:

Given $k$ the number of neighbors, $k_i$ the number of neighbors of class $i$ among those $k$, $N$ the total number of samples, and $N_i$ the number of samples of class $i$:

We start from

$$p(x) = \frac{k}{N V(x)} $$

where $V(x)$ is the volume of the $d$-dimensional ball whose radius is the distance from $x$ to its $k$-th closest sample. We can then compute

$$p(x , C_i) = \frac{k_i}{N V(x)} $$

and then we can simply compute the posterior as

$$p(C_i | x) = \frac{p(x , C_i)}{p(x)} = \frac{k_i}{k} $$

giving rise to the usual k-NN classifier decision rule (pick the class with the largest $k_i$).
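For concreteness, here is a minimal NumPy sketch of that rule (my own illustration, not code from any particular library; the function name and array shapes are assumptions): among the $k$ pooled nearest neighbours, pick the class with the largest $k_i$, which is the same as maximizing $k_i/k$.

```python
# Minimal sketch of the usual k-NN rule: argmax_i k_i / k,
# i.e. a majority vote among the k pooled nearest neighbours.
import numpy as np

def knn_predict_pooled(X_train, y_train, x, k):
    """X_train: (N, d) samples, y_train: (N,) labels, x: (d,) query point."""
    dists = np.linalg.norm(X_train - x, axis=1)      # distance from x to every sample
    nearest = np.argsort(dists)[:k]                  # indices of the k closest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # class with the largest k_i
```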

But another way to approach the problem is to start by estimating each class-conditional distribution separately with the k-NN density estimation technique:

Starting from $$ p(x|C_i) = \frac{k}{N_i V_i(x)}$$ with $V_i(x)$ the volume of the $d$-dimensional ball whose radius is the distance from $x$ to its $k$-th closest sample of class $i$.

Using

$$ p(C_i) = \frac{N_i}{N}$$

We can simply compute $$p(x , C_i) = \frac{k}{N V_i(x)} $$

and therefore

$$p(x) = \sum_i \frac{k}{N V_i(x)} $$

giving

$$ p(C_i|x) = \frac{p(x , C_i)}{p(x)} = \frac{1/V_i(x)}{\sum_j^C 1/V_j(x)} = \frac{\prod_{j \neq i}^C V_j(x)}{\sum_j^C \prod_{l, l \neq j}^C V_l(x)}$$

with $C$ the number of classes.

For the usual binary scenario, this formula evaluates to

$$ p(C_1|x) = \frac{V_0(x)}{V_1(x) + V_0(x)} $$

meaning that we choose the class whose $k$-th closest sample is nearest to $x$, i.e. the class with the smallest ball volume $V_i(x)$.
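Again for concreteness, a minimal sketch of this second rule (my own illustration; the function name, array shapes, and the requirement that every class has at least $k$ samples are assumptions): for each class, take the distance $r_i(x)$ from $x$ to its $k$-th closest sample of that class, set $V_i(x) \propto r_i(x)^d$, and predict the class with the largest $(1/V_i)/\sum_j 1/V_j$, which is the class with the smallest $r_i(x)$.

```python
# Minimal sketch of the second rule: p(C_i | x) proportional to 1/V_i(x),
# with V_i(x) proportional to r_i(x)^d and r_i(x) the distance from x to its
# k-th closest sample of class i.
import numpy as np

def knn_predict_per_class(X_train, y_train, x, k):
    """Assumes every class has at least k samples."""
    classes = np.unique(y_train)
    d = X_train.shape[1]
    r = np.empty(len(classes))
    for idx, c in enumerate(classes):
        dists = np.linalg.norm(X_train[y_train == c] - x, axis=1)
        r[idx] = np.sort(dists)[k - 1]               # k-th closest sample of class c
    inv_vol = 1.0 / r**d                             # 1/V_i(x) up to the common ball constant
    posterior = inv_vol / inv_vol.sum()              # p(C_i | x) = (1/V_i) / sum_j (1/V_j)
    return classes[np.argmax(posterior)], posterior  # same as the class with smallest r_i
```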

Both derivations start from the k-NN density estimation approach, but end up with related yet different decision rules. I haven't seen the latter used, while the former is widely adopted in the machine learning community. I personally even find the second derivation less hacky.

So the question is: is there something wrong in the derivation of the second approach to k-NN? Is there any reason to prefer the first approach over the second one, besides being easier to compute in multiclass settings?
