Understanding the Kolmogorov-Smirnov Test

What is the Kolmogorov-Smirnov Test?

The Kolmogorov-Smirnov Test (K-S test) is a nonparametric statistical method used either to compare the distributions of two samples (the two-sample test) or to compare one sample against a reference probability distribution (the one-sample test).

The test statistic is the maximum vertical distance between either:

The empirical distribution functions of two samples
The empirical distribution function of a sample and the cumulative distribution function (CDF) of a reference distribution
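
In symbols, the two distances listed above can be written as follows. The notation is introduced here for illustration and is not taken from the original text: F_n, F_{1,n} and F_{2,m} denote empirical distribution functions built from n (or m) observations, and F denotes the reference CDF.

```latex
% Empirical distribution function of a sample X_1, ..., X_n
\[
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}
\]

% Two-sample statistic: largest gap between the two EDFs
\[
D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|
\]

% One-sample statistic: largest gap between the EDF and the reference CDF
\[
D_n = \sup_x \left| F_n(x) - F(x) \right|
\]
```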

This test is useful for detecting whether two samples come from different distributions and for checking whether a sample follows a specific distribution. It is applied in fields such as finance, physics, and environmental science.

Kolmogorov-Smirnov Test in R

In R, the ks.test() function in the stats package performs the Kolmogorov-Smirnov test. It takes either two numeric samples, or one sample and a reference cumulative distribution function (for example "pnorm" for the normal distribution). It returns the test statistic D and a p-value, which are used to decide whether to reject the null hypothesis that the distributions are identical.

How to Run a Kolmogorov-Smirnov Test

Choose Datasets: Select the samples that match your question, for example height measurements from two groups whose distributions you want to compare.
Formulate the Hypothesis: Define the null hypothesis (distributions are identical) and the alternative hypothesis (distributions differ).
Calculate the Empirical Distribution Functions (EDFs): Sort each dataset and compute, for each value, the proportion of observations less than or equal to it.
Find the Maximum Distance: Identify the largest vertical gap between EDFs or between an EDF and a reference CDF. This distance (D) is key in evaluating distributional differences.
Determine the Significance Level: Set a significance level (α), commonly 0.05 or 0.01, to define the threshold for statistical significance.
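Compare with Critical Value or Use P-Value: Compare D with the critical value for the chosen α, or compare the p-value with α. If D exceeds the critical value or the p-value is below α, reject the null hypothesis and conclude that the distributions differ significantly. Otherwise, the data do not provide enough evidence to reject it.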
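
The steps above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a reference implementation: the function names ks_two_sample and critical_value are hypothetical, the two groups are made-up toy values, and the critical value uses the common large-sample approximation c(α) * sqrt((n + m) / (n * m)) with c(α) = sqrt(-ln(α / 2) / 2), which is only rough for samples this small.

```python
import math
from bisect import bisect_right


def ks_two_sample(xs, ys):
    """Return D, the largest vertical gap between the two EDFs."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d = 0.0
    # Both EDFs are right-continuous step functions that only change at
    # observed values, so the maximum gap is attained at a data point.
    for v in xs + ys:
        f1 = bisect_right(xs, v) / n  # proportion of xs <= v
        f2 = bisect_right(ys, v) / m  # proportion of ys <= v
        d = max(d, abs(f1 - f2))
    return d


def critical_value(n, m, alpha=0.05):
    """Large-sample approximation to the two-sided critical value (assumption)."""
    c = math.sqrt(-0.5 * math.log(alpha / 2))
    return c * math.sqrt((n + m) / (n * m))


# Toy data, invented purely for illustration.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [3.5, 4.5, 5.5, 6.5, 7.5]

d = ks_two_sample(group_a, group_b)
crit = critical_value(len(group_a), len(group_b), alpha=0.05)
reject = d > crit  # True means: reject the null hypothesis at level alpha

print(f"D = {d:.3f}, critical value = {crit:.3f}, reject H0: {reject}")
```

With these toy values the gap D stays below the approximate critical value, so the null hypothesis is not rejected. For real analyses, an established routine such as R's ks.test() is preferable, since it also handles exact p-values for small samples.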

Kolmogorov-Smirnov Test of Normality

The K-S test of normality evaluates whether a sample comes from a normally distributed population. It compares the sample's empirical distribution function with the cumulative distribution function of a normal distribution; a significant difference suggests non-normality. The test can be applied to small samples, although its power to detect departures from normality is limited when few observations are available.
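
As a rough sketch of the normality check, the one-sample statistic can be computed against a normal CDF built from math.erf. The names normal_cdf and ks_one_sample are hypothetical, the sample is toy data, and the example assumes the mean and standard deviation of the reference distribution are specified in advance. If they are instead estimated from the same sample, the standard K-S critical values become too conservative and the Lilliefors variant of the test is the usual remedy.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """CDF of a normal distribution with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_one_sample(sample, cdf):
    """Return D, the largest gap between the sample EDF and a reference CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # At x the EDF jumps from (i - 1) / n to i / n, so the gap has to be
        # checked just below and at the jump.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


# Toy data, invented purely for illustration.
sample = [-1.2, -0.5, 0.0, 0.4, 1.1]

d_norm = ks_one_sample(sample, normal_cdf)

# Large-sample approximation to the critical value at alpha = 0.05
# (an assumption; exact tables differ for a sample this small).
alpha = 0.05
crit_norm = math.sqrt(-0.5 * math.log(alpha / 2)) / math.sqrt(len(sample))

print(f"D = {d_norm:.4f}, approximate critical value = {crit_norm:.4f}")
print("reject normality" if d_norm > crit_norm else "no evidence against normality")
```

Passing the CDF as a function keeps ks_one_sample reusable for other reference distributions, which mirrors how the one-sample test is described above.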