What is the Kolmogorov-Smirnov Test?
The Kolmogorov-Smirnov Test (K-S test) is a nonparametric statistical method used either to compare the distributions of two samples or to compare one sample against a reference probability distribution.
This test evaluates the maximum distance between:
- The empirical distribution functions of two samples
- The empirical distribution function of a sample and a reference cumulative distribution function
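In symbols, the two distances are usually written as follows. The notation is introduced here for illustration and is not taken from the text above: $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of two samples of sizes $n$ and $m$, $F_n$ is the empirical distribution function of a single sample $X_1, \dots, X_n$, and $F$ is the reference cumulative distribution function.

$$
F_n(x) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\{X_i \le x\}, \qquad
D_{n,m} = \sup_x \left| F_{1,n}(x) - F_{2,m}(x) \right|, \qquad
D_n = \sup_x \left| F_n(x) - F(x) \right|
$$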
This test is useful for identifying differences in datasets and determining whether a dataset follows a specific distribution. Its applications span fields like finance, physics, and environmental science.
Kolmogorov-Smirnov Test in R
In R, the ks.test() function in the stats package performs the Kolmogorov-Smirnov test. Its inputs are either two numeric vectors (two-sample test) or one numeric vector plus a reference cumulative distribution function, given as a function or by name, such as "pnorm" (one-sample test). The output includes the test statistic D and a p-value, which are used to decide whether to reject the null hypothesis that the distributions are identical.
How to Run a Kolmogorov-Smirnov Test
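- Choose Datasets: Select the samples (or the sample and reference distribution) that match your hypothesis. For example, to ask whether heights are distributed the same way in two groups, use the height measurements from each group as the two samples.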
- Formulate the Hypothesis: Define the null hypothesis (distributions are identical) and the alternative hypothesis (distributions differ).
- Calculate the Empirical Distribution Functions (EDFs): Sort each dataset. The EDF at a value x is the fraction of observations in that dataset that are less than or equal to x.
- Find the Maximum Distance: Identify the largest vertical gap between EDFs or between an EDF and a reference CDF. This distance (D) is key in evaluating distributional differences.
- Determine the Significance Level: Set a significance level (α), commonly 0.05 or 0.01, to define the threshold for statistical significance.
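- Compare with Critical Value or Use P-Value: Compare D with the critical value for your sample sizes and α, or compare the p-value with α. If D exceeds the critical value, or equivalently the p-value is below α, reject the null hypothesis and conclude that the distributions differ significantly. Otherwise, the data do not provide enough evidence to reject it.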
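The steps above can be sketched in plain Python for the two-sample case. This is a minimal illustration rather than a reference implementation: the function names (ecdf, ks_statistic, kolmogorov_sf, ks_two_sample) and the height data are made up for the example, and the p-value uses the large-sample (asymptotic) Kolmogorov approximation, which can be inaccurate for small samples or data with ties.

```python
import math
from bisect import bisect_right


def ecdf(sample):
    """Return the empirical distribution function of `sample`."""
    xs = sorted(sample)
    n = len(xs)
    # F(x) = fraction of observations less than or equal to x
    return lambda x: bisect_right(xs, x) / n


def ks_statistic(a, b):
    """Largest vertical gap D between the two EDFs."""
    fa, fb = ecdf(a), ecdf(b)
    # Both EDFs only change at observed values, so checking those is enough.
    return max(abs(fa(x) - fb(x)) for x in list(a) + list(b))


def kolmogorov_sf(lam, terms=100):
    """Large-sample approximation to P(scaled D > lam) under the null."""
    if lam < 0.2:  # the series converges slowly here; the value is ~1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def ks_two_sample(a, b, alpha=0.05):
    """Return (D, approximate p-value, reject the null hypothesis?)."""
    n, m = len(a), len(b)
    d = ks_statistic(a, b)
    # Scale D by the square root of the effective sample size n*m/(n+m).
    p = kolmogorov_sf(d * math.sqrt(n * m / (n + m)))
    return d, p, p < alpha


# Hypothetical heights (cm) for two groups
group_a = [158.0, 162.5, 165.1, 167.3, 170.2, 171.8, 174.0, 176.5]
group_b = [166.4, 169.9, 172.7, 175.3, 178.1, 180.6, 183.2, 185.0]
d, p, reject = ks_two_sample(group_a, group_b)
print(f"D = {d:.3f}, approximate p = {p:.3f}, reject H0: {reject}")
```

For real analyses, an established statistics library is the safer choice, since it typically handles exact small-sample p-values that this sketch omits.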
Kolmogorov-Smirnov Test of Normality
The K-S test of normality evaluates whether a sample comes from a normally distributed population. It compares the sample's empirical distribution function with the cumulative distribution function of a normal distribution; a significant difference suggests that the data are not normal. The test can be applied to small samples, although its power to detect departures from normality is low when few observations are available.
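As a rough sketch of the normality check, the following Python code compares a sample's EDF with the CDF of a normal distribution whose mean and standard deviation are fixed in advance. The function names (ks_normal_statistic, kolmogorov_sf, ks_normality), the measurements, and the reference parameters are all hypothetical, and the p-value is the large-sample Kolmogorov approximation, so treat the result as indicative only.

```python
import math
from statistics import NormalDist


def ks_normal_statistic(sample, mu, sigma):
    """D between the sample's EDF and the Normal(mu, sigma) CDF."""
    xs = sorted(sample)
    n = len(xs)
    cdf = NormalDist(mu, sigma).cdf
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # The EDF jumps from (i - 1)/n to i/n at x, so check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def kolmogorov_sf(lam, terms=100):
    """Large-sample approximation to P(sqrt(n) * D > lam) under the null."""
    if lam < 0.2:  # the series converges slowly here; the value is ~1
        return 1.0
    s = sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
            for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * s))


def ks_normality(sample, mu, sigma, alpha=0.05):
    """Return (D, approximate p-value, reject normality?)."""
    d = ks_normal_statistic(sample, mu, sigma)
    p = kolmogorov_sf(d * math.sqrt(len(sample)))
    return d, p, p < alpha


# Hypothetical measurements tested against an assumed Normal(50, 5)
measurements = [43.1, 45.8, 47.2, 48.9, 49.5, 50.4, 51.7, 52.3, 54.6, 57.0]
d, p, reject = ks_normality(measurements, mu=50.0, sigma=5.0)
print(f"D = {d:.3f}, approximate p = {p:.3f}, reject normality: {reject}")
```

Note that this assumes the reference mean and standard deviation are known beforehand. If they are estimated from the same sample, the standard K-S p-value is too large, and a corrected variant such as the Lilliefors test is normally used instead.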
