# The Kolmogorov-Smirnov Test: an Intuition

The Kolmogorov–Smirnov test (K–S test) tests whether two probability distributions are equal. You can use it to compare an empirically observed distribution with a known reference distribution, or to compare two observed distributions with each other.

It works in quite a simple manner. Let the cumulative distribution functions of the two distributions be $CDF_A$ and $CDF_B$ respectively. We simply measure the largest absolute difference between these two functions over all values of the argument $x$. This maximum difference is known as the Kolmogorov-Smirnov statistic, D, and is given by: $D = \max_x{(| CDF_A(x) - CDF_B(x) |)}$

You can think about it this way: if you plotted $CDF_A$ and $CDF_B$ together on the same set of axes, D is the length of the longest vertical line you could draw between the two plots. This image illustrates the CDFs of two empirically observed distributions. The K-S test statistic is the maximum vertical distance between these two CDFs, and is represented by the black line.
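As a rough illustration of the definition, here is a minimal Python sketch that computes D for two small samples from their empirical CDFs. The function names `ecdf` and `ks_statistic` and the sample values are invented for this example and are not taken from any library.

```python
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical CDF of `sample`: x -> fraction of values <= x."""
    ordered = sorted(sample)
    n = len(ordered)
    return lambda x: bisect_right(ordered, x) / n


def ks_statistic(sample_a, sample_b):
    """Largest vertical gap between the two empirical CDFs."""
    cdf_a, cdf_b = ecdf(sample_a), ecdf(sample_b)
    # Empirical CDFs are step functions that only change at observed values,
    # so the largest gap is attained at one of those values.
    points = set(sample_a) | set(sample_b)
    return max(abs(cdf_a(x) - cdf_b(x)) for x in points)


# Hypothetical data, chosen only to keep the arithmetic easy to follow.
sample_a = [1, 2, 3, 4, 5]
sample_b = [3, 4, 5, 6, 7]
d = ks_statistic(sample_a, sample_b)
print(d)
```

For these two samples the gap first reaches its maximum at x = 2, where the first empirical CDF is already at 0.4 and the second is still at 0, so D is 0.4.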

To perform the Kolmogorov-Smirnov test, one simply compares D to a table of thresholds for statistical significance. The thresholds are calculated under the null hypothesis that the distributions are equal. If D is too big, the null hypothesis is rejected. The threshold for significance depends on the size of your sample (as your sample gets smaller, your D needs to get larger to show that the two distributions are different) and, of course, on the desired significance level.
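To see how the threshold moves with sample size, the sketch below uses the commonly quoted large-sample approximation for the two-sample test, in which the null hypothesis is rejected at level $\alpha$ when $D > c(\alpha)\sqrt{(n+m)/(nm)}$ with $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$. This is an approximation rather than the exact tabulated thresholds, and the function name `ks_critical_value` is made up for this example.

```python
import math


def ks_critical_value(n, m, alpha=0.05):
    """Approximate two-sample K-S rejection threshold for sample sizes n and m.

    Uses the large-sample approximation; exact tables differ for small samples.
    """
    c_alpha = math.sqrt(-math.log(alpha / 2) / 2)
    return c_alpha * math.sqrt((n + m) / (n * m))


# Smaller samples need a larger D before the null hypothesis is rejected.
small = ks_critical_value(10, 10)
large = ks_critical_value(100, 100)
print(small, large)
```

With two samples of 10 points each the threshold is roughly 0.61, while with 100 points each it drops to roughly 0.19, which matches the intuition that less data requires a bigger gap to be convincing.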

The test is non-parametric or distribution-free, which means it makes no assumptions about the shape of the underlying distributions of the data (although the standard thresholds do assume that those distributions are continuous). It is useful for one-dimensional distributions, but does not generalise easily to multivariate distributions.