Is my two-sample Kolmogorov-Smirnov test working correctly?

Question

I have two one-dimensional samples that I'm trying to distinguish quantitatively (or to show that no such distinction can be made). That is, the null hypothesis is that they come from the same population (distribution?), and the alternative is that they don't. After some reading I figured that the K-S test is what I need. To implement it, I'm following the general instructions here (as well as Wikipedia).

I compute the distribution functions as the first link shows (the number of members of each sample with a value smaller than the current value on the X axis). The result I get can be seen in the picture:

Then it gets a bit confusing: do I calculate the test statistic simply as the maximum "vertical" distance between these distribution functions? That is, do I build the list of all the Y differences (one for every X) and, in the end, take the maximum of these as the test statistic, to be compared against a critical value? In my case the differences look like the second picture, so the resulting value is 9.
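A minimal Python sketch of the procedure described above: count the sample members at each point on the X axis and take the largest vertical gap between the two counting functions. Assumptions: "smaller" is read as "smaller than or equal to" (the usual empirical-CDF convention); because both step functions only change at observed values, checking the pooled sample points is enough; the function names and toy data are hypothetical. Note that the result is a count of observations, with no division by the sample size.

```python
from bisect import bisect_right


def count_below_or_equal(sorted_sample, x):
    """Number of members of a sorted sample that are <= x (a raw count)."""
    return bisect_right(sorted_sample, x)


def max_vertical_count_distance(a, b):
    """Largest |count_a(x) - count_b(x)| over all pooled sample values x.

    Both counting functions only jump at observed values, so evaluating
    them at the pooled points is enough to find the maximum gap.
    """
    a, b = sorted(a), sorted(b)
    return max(
        abs(count_below_or_equal(a, x) - count_below_or_equal(b, x))
        for x in a + b
    )


# Hypothetical toy samples (not the asker's data).
sample_a = [1.2, 2.5, 3.1, 4.8, 5.0]
sample_b = [2.0, 3.3, 3.9, 6.1, 7.4]
print(max_vertical_count_distance(sample_a, sample_b))  # 2
```

Because this value is measured in observations, it grows with the sample size, which is how a gap of 9 arises with 137 points per sample.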

The reason I'm confused is that, if I'm to believe Wikipedia, 9 (my result) is much higher than the critical value. I have n = m = 137 (both samples have 137 elements, even though they represent independent events), so the square-root factor $\sqrt{(n+m)/(nm)}$ comes out to a measly 0.12; hence even at a crazy 0.1% significance level my statistic rejects the null with flying colors. In fact, I could have pushed my significance level all the way down to $e^{-11097}$ (yeah, that's e to the power of minus eleven thousand or so)...
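To check the arithmetic in the paragraph above, here is a sketch of the large-sample rejection rule as commonly quoted (e.g. on Wikipedia): reject at level $\alpha$ when $D > c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$. The function names are hypothetical. The implied significance level is kept in log space, because `math.exp(-11097)` underflows to `0.0` in double precision.

```python
import math


def ks_scale(n, m):
    """Square-root factor sqrt((n + m) / (n * m)) from the large-sample rule."""
    return math.sqrt((n + m) / (n * m))


def critical_value(alpha, n, m):
    """Large-sample critical value c(alpha) * sqrt((n + m) / (n * m)),
    where c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    return math.sqrt(-math.log(alpha / 2) / 2) * ks_scale(n, m)


def log_half_alpha(d, n, m):
    """ln(alpha / 2) for the alpha at which statistic d sits exactly on the
    critical value: solving d = c(alpha) * scale gives ln(alpha / 2) = -2 c^2."""
    c = d / ks_scale(n, m)
    return -2 * c * c


n = m = 137
print(round(ks_scale(n, m), 4))               # 0.1208
print(round(critical_value(0.001, n, m), 4))  # 0.2355
print(round(log_half_alpha(9, n, m)))         # -11097, i.e. alpha = 2 * e^-11097
```

This reproduces the exponent quoted above (up to the factor 2 in $\alpha = 2e^{-11097}$), so the formula is being applied consistently; what is suspicious is the magnitude of the input $D = 9$.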

This is more than suspicious, so I want to make sure that I'm doing everything correctly. Or maybe I am correct, but the test itself is unfit for my situation, as it is clearly too prone to type I errors here. If so, is there any advice on good alternatives?

Thomas Lumley · Accepted Answer · 2023-06-03 06:39:44Z

The K-S test is defined in terms of the maximum absolute difference between cumulative distribution functions. The CDF is defined as $F_X(t)= P(X\leq t)$, so its range is from 0 to 1, not from 0 to the sample size as you have it. Your link has it defined correctly, as a fraction with the sample size in the denominator. You need to divide your distribution functions by their respective sample sizes to get them into the interval $[0,1]$.
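A minimal Python sketch of the normalized computation, assuming the usual right-continuous empirical CDF $\hat F(t) = \#\{x_i \le t\}/n$; the function names and toy data are hypothetical. Dividing each count by its own sample size keeps both functions in $[0,1]$, so the statistic $D = \max_t |\hat F_a(t) - \hat F_b(t)|$ is also in $[0,1]$, even when the two samples have different sizes.

```python
from bisect import bisect_right


def ecdf(sorted_sample, t):
    """Empirical CDF: fraction of the sample that is <= t, always in [0, 1]."""
    return bisect_right(sorted_sample, t) / len(sorted_sample)


def ks_two_sample_statistic(a, b):
    """D = max over t of |F_a(t) - F_b(t)|, evaluated at every pooled value.

    Both empirical CDFs are step functions that only jump at observed
    values, so the pooled sample points are the only places to check.
    """
    a, b = sorted(a), sorted(b)
    return max(abs(ecdf(a, t) - ecdf(b, t)) for t in a + b)


# Hypothetical toy samples of different sizes (4 and 6 observations).
print(round(ks_two_sample_statistic([1, 2, 3, 4], [3, 4, 5, 6, 7, 8]), 4))  # 0.6667
```

If SciPy is available, `scipy.stats.ks_2samp(a, b).statistic` should return the same $D$ and can serve as a cross-check.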

Your difference should be 9/137, not 9.
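As a worked check, the corrected value can be plugged into the same large-sample rule, $c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha)=\sqrt{-\ln(\alpha/2)/2}$, together with the asymptotic Kolmogorov series $P(K > \lambda) = 2\sum_{k\ge 1}(-1)^{k-1}e^{-2k^2\lambda^2}$ for an approximate p-value, where $\lambda = D\sqrt{nm/(n+m)}$. This is a Python sketch with illustrative names; both quantities are large-sample approximations, and a library routine such as `scipy.stats.ks_2samp` may use an exact or a different approximate method for the p-value.

```python
import math


def kolmogorov_sf(lam, terms=100):
    """Asymptotic P(K > lam) = 2 * sum_{k>=1} (-1)^(k-1) * exp(-2 k^2 lam^2),
    truncated after `terms` terms (ample for moderate lam like the one here)."""
    return 2 * sum((-1) ** (k - 1) * math.exp(-2 * k * k * lam * lam)
                   for k in range(1, terms + 1))


n = m = 137
d = 9 / n                                             # corrected statistic
scale = math.sqrt((n + m) / (n * m))                  # about 0.1208
crit_05 = math.sqrt(-math.log(0.05 / 2) / 2) * scale  # 5% critical value
p_approx = kolmogorov_sf(d / scale)                   # lam = d * sqrt(nm / (n + m))

print(f"D = {d:.4f}, 5% critical value = {crit_05:.4f}, approx p = {p_approx:.2f}")
# D = 0.0657, 5% critical value = 0.1641, approx p = 0.93
print("reject H0 at 5%" if d > crit_05 else "fail to reject H0 at 5%")
```

With $D \approx 0.066$ against a 5% critical value of about 0.164, there is no evidence against the null hypothesis, the opposite of the conclusion drawn from the unnormalized value 9.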
