One way to derive the k-NN decision rule from k-NN density estimation goes as follows:

Given $k$ the number of neighbors, $k_i$ the number of neighbors of class $i$ among those $k$, $N$ the total number of samples, and $N_i$ the number of samples of class $i$:

We start from

$$p(x) = \frac{k}{N V(x)} $$

where $V(x)$ is the volume of the $d$-dimensional ball whose radius is the distance from $x$ to its $k$-th closest sample. We can then compute

$$p(x , C_i) = \frac{k_i}{N V(x)} $$

and then we can simply compute the posterior as

$$p(C_i | x) = \frac{p(x , C_i)}{p(x)} = \frac{k_i}{k} $$

giving rise to the usual k-NN classifier decision rule (pick the class with the largest $k_i$).
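For concreteness, here is a minimal NumPy sketch of that rule (my own illustration, not code from any particular library; the function name and array shapes are assumptions): among the $k$ pooled nearest neighbours, pick the class with the largest $k_i$, which is the same as maximizing $k_i/k$.

```python
# Minimal sketch of the usual k-NN rule: argmax_i k_i / k,
# i.e. a majority vote among the k pooled nearest neighbours.
import numpy as np

def knn_predict_pooled(X_train, y_train, x, k):
    """X_train: (N, d) samples, y_train: (N,) labels, x: (d,) query point."""
    dists = np.linalg.norm(X_train - x, axis=1)      # distance from x to every sample
    nearest = np.argsort(dists)[:k]                  # indices of the k closest samples
    labels, counts = np.unique(y_train[nearest], return_counts=True)
    return labels[np.argmax(counts)]                 # class with the largest k_i
```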

But another way to approach the problem is to start by estimating each class-conditional distribution separately with the k-NN density estimation technique:

Starting from $$ p(x|C_i) = \frac{k}{N_i V_i(x)}$$ with $V_i(x)$ the volume of the $d$-dimensional ball whose radius is the distance from $x$ to its $k$-th closest sample of class $i$.

Using

$$ p(C_i) = \frac{N_i}{N}$$

We can simply compute $$p(x , C_i) = \frac{k}{N V_i(x)} $$

and therefore

$$p(x) = \sum_i \frac{k}{N V_i(x)} $$

giving

$$ p(C_i|x) = \frac{p(x , C_i)}{p(x)} = \frac{1/V_i(x)}{\sum_j^C 1/V_j(x)} = \frac{\prod_{j \neq i}^C V_j(x)}{\sum_j^C \prod_{l, l \neq j}^C V_l(x)}$$

with $C$ the number of classes.

For the usual binary scenario, this formula evaluates to

$$ p(C_1|x) = \frac{V_0(x)}{V_1(x) + V_0(x)} $$

meaning that we choose the class whose $k$-th closest sample is nearest to $x$, i.e. the class with the smallest ball volume $V_i(x)$.
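Again for concreteness, a minimal sketch of this second rule (my own illustration; the function name, array shapes, and the requirement that every class has at least $k$ samples are assumptions): for each class, take the distance $r_i(x)$ from $x$ to its $k$-th closest sample of that class, set $V_i(x) \propto r_i(x)^d$, and predict the class with the largest $(1/V_i)/\sum_j 1/V_j$, which is the class with the smallest $r_i(x)$.

```python
# Minimal sketch of the second rule: p(C_i | x) proportional to 1/V_i(x),
# with V_i(x) proportional to r_i(x)^d and r_i(x) the distance from x to its
# k-th closest sample of class i.
import numpy as np

def knn_predict_per_class(X_train, y_train, x, k):
    """Assumes every class has at least k samples."""
    classes = np.unique(y_train)
    d = X_train.shape[1]
    r = np.empty(len(classes))
    for idx, c in enumerate(classes):
        dists = np.linalg.norm(X_train[y_train == c] - x, axis=1)
        r[idx] = np.sort(dists)[k - 1]               # k-th closest sample of class c
    inv_vol = 1.0 / r**d                             # 1/V_i(x) up to the common ball constant
    posterior = inv_vol / inv_vol.sum()              # p(C_i | x) = (1/V_i) / sum_j (1/V_j)
    return classes[np.argmax(posterior)], posterior  # same as the class with smallest r_i
```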

Both derivations start from the k-NN density estimation approach, but end up with related yet different decision rules. I haven't seen the latter used, while the former is widely adopted in the machine learning community. I personally even find the second derivation less hacky.

So the question is: is there something wrong in the derivation of the second approach to k-NN? Is there any reason to prefer the first approach over the second one, besides being easier to compute in multiclass settings?
