Kolmogorov–Smirnov test (K-S test)


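The K-S test uses sample data to infer whether the population from which the sample was drawn follows a given theoretical distribution. It is a goodness-of-fit test, suited to studying the distribution of continuous random variables.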


Kolmogorov–Smirnov test

Kolmogorov–Smirnov statistic

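Empirical cumulative distribution function:

The empirical distribution function $F_{n}$ for n independent and identically distributed (i.i.d.) ordered observations $X_{i}$ is defined as:

$F_{n}(x) = \frac{1}{n} \sum_{i=1}^{n} I_{[-\infty, x]}(X_{i})$

where $I_{[-\infty, x]}(X_{i})$ is the indicator function:

$I_{[-\infty, x]}(X_{i}) = \begin{cases} 1, & X_{i} \leq x; \\ 0, & X_{i} > x. \end{cases}$

The Kolmogorov–Smirnov statistic for a given cumulative distribution function $F(x)$ is:

$D_{n} = \sup_{x} |F_{n}(x) - F(x)|$

where $F_{n}(x)$ is the empirical distribution function of the sample $X_{i}$ and $\sup_{x}$ is the supremum of the set of distances.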
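The definitions above can be sketched in a few lines of plain Python. This is only an illustrative sketch: the helper names `ecdf`, `ks_statistic`, and `uniform_cdf` are made up for this example, and the hypothesized distribution is assumed to be continuous, so that the supremum is attained at one of the jumps of $F_{n}$.

```python
def ecdf(sample):
    """Return F_n as a function: the fraction of observations <= x."""
    xs = sorted(sample)
    n = len(xs)

    def F_n(x):
        # Sum of the indicators I(X_i <= x), divided by n.
        return sum(1 for xi in xs if xi <= x) / n

    return F_n


def ks_statistic(sample, cdf):
    """D_n = sup_x |F_n(x) - F(x)| for a continuous hypothesized CDF."""
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        # F_n jumps from i/n to (i+1)/n at x, so check both sides of the jump.
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def uniform_cdf(x):
    """CDF of the uniform distribution on [0, 1] (hypothetical null model)."""
    return min(1.0, max(0.0, x))


sample = [0.1, 0.4, 0.7]
F_n = ecdf(sample)
d_n = ks_statistic(sample, uniform_cdf)
print(F_n(0.5), d_n)
```

Checking both `(i + 1) / n - f` and `f - i / n` matters because $F_{n}$ is a step function: the largest gap can occur just before or exactly at a data point.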

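In practice, the statistic requires a relatively large number of data points (compared with other goodness-of-fit criteria such as the Anderson–Darling test statistic) to properly reject the null hypothesis.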

Kolmogorov distribution

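Preliminaries:

(1) Independent-increment process

As the name suggests, this is a process whose increments are mutually independent. Formally, a stochastic process $\{X(t), t \geq 0\}$ has independent increments if, for any times $0 \leq t_{0} < t_{1} < \dots < t_{n}$, the increments $X(t_{1}) - X(t_{0}), \dots, X(t_{n}) - X(t_{n-1})$ are mutually independent.

(2) Wiener process

Roughly speaking, a Wiener process is a mathematical formalization of Brownian motion. Formally, $W(t)$ is a Wiener process if $W(0) = 0$, it has independent increments, $W(t) - W(s) \sim N(0, t - s)$ for $0 \leq s < t$, and its sample paths are continuous.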

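(3) Brownian bridge

A Brownian bridge is a special process derived from the Wiener process. Formally, a Brownian bridge on $[0, T]$ is a Wiener process $W(t)$ on $[0, T]$ conditioned on $W(T) = 0$. Equivalently, it can be constructed as $B(t) = W(t) - \frac{t}{T} W(T)$ for $t \in [0, T]$, so that $B(0) = B(T) = 0$.

[Figure: sample paths. The red and green curves are both Brownian bridges.]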
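To make the Brownian bridge more concrete, here is a minimal simulation sketch. It assumes the standard construction $B(t) = W(t) - \frac{t}{T} W(T)$ from a discretized Wiener path. The function name `brownian_bridge` and its parameters are illustrative choices, not part of the original article.

```python
import math
import random


def brownian_bridge(steps, T=1.0, rng=random):
    """Sample a Brownian bridge on [0, T] at steps + 1 equally spaced times."""
    dt = T / steps
    # Discretized Wiener path: W(0) = 0 with independent N(0, dt) increments.
    w = [0.0]
    for _ in range(steps):
        w.append(w[-1] + rng.gauss(0.0, math.sqrt(dt)))
    # Pin the endpoint: B(t) = W(t) - (t / T) * W(T), with t / T = k / steps.
    return [w[k] - (k / steps) * w[-1] for k in range(steps + 1)]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
path = brownian_bridge(1000, rng=rng)
print(path[0], path[-1], max(abs(b) for b in path))
```

The last printed value is one draw of the supremum of $|B(t)|$ over the grid, which is the quantity whose distribution the next section describes.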

Kolmogorov distribution

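The Kolmogorov distribution is the distribution of the random variable

$K = \sup_{t \in [0, 1]} |B(t)|$

that is, the random variable obtained by taking the supremum of $|B(t)|$, where $B(t)$ is the Brownian bridge.

Its cumulative distribution function can be written as:

$\Pr(K \leq x) = 1 - 2 \sum_{k=1}^{\infty} (-1)^{k-1} e^{-2 k^{2} x^{2}} = \frac{\sqrt{2\pi}}{x} \sum_{k=1}^{\infty} e^{-(2k-1)^{2} \pi^{2} / (8 x^{2})}$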

which can also be expressed by the Jacobi theta function $\vartheta_{01}(z = 0; \tau = 2ix^{2}/\pi)$. Both the form of the Kolmogorov–Smirnov test statistic and its asymptotic distribution under the null hypothesis were published by Andrey Kolmogorov,[3] while a table of the distribution was published by Nikolai Smirnov.[4] Recurrence relations for the distribution of the test statistic in finite samples are available.[3]
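As a rough numerical illustration, the Kolmogorov CDF can be evaluated from its two standard series, $1 - 2 \sum_{k \geq 1} (-1)^{k-1} e^{-2k^{2}x^{2}}$ and $\frac{\sqrt{2\pi}}{x} \sum_{k \geq 1} e^{-(2k-1)^{2}\pi^{2}/(8x^{2})}$. The sketch below assumes a truncation at 100 terms and switches between the two forms at $x = 1$. Both the function name `kolmogorov_cdf` and the switching point are choices made for this example only.

```python
import math


def kolmogorov_cdf(x, terms=100):
    """Pr(K <= x), using whichever series converges faster for this x."""
    if x <= 0:
        return 0.0
    if x < 1.0:
        # Second form: converges very quickly for small x.
        s = sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * x * x))
            for k in range(1, terms + 1)
        )
        return math.sqrt(2 * math.pi) / x * s
    # First form: alternating series, converges very quickly for large x.
    s = sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
        for k in range(1, terms + 1)
    )
    return 1.0 - 2.0 * s


for x in (0.5, 1.0, 1.358, 2.0):
    print(x, kolmogorov_cdf(x))
```

Using two forms avoids the slow convergence of the alternating series near zero, where many terms are close to 1 in magnitude.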

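One-sample Kolmogorov goodness-of-fit test

The one-sample K-S test checks whether the sample data follow a given theoretical distribution.

We start from the null hypothesis $H_{0}$ that the sample comes from the hypothesized distribution $F(x)$. If the theoretical distribution is continuous (only the continuous case is considered here), then:

$\sqrt{n} D_{n} \xrightarrow{n \to \infty} \sup_{t} |B(F(t))|$

in distribution, where $B(t)$ is the Brownian bridge. In other words, as the number of sample points tends to infinity, $\sqrt{n} D_{n}$ converges in distribution to the Kolmogorov distribution, which does not depend on $F$.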

Improving accuracy with a correction:

For finite n, the asymptotic distribution is not a very accurate approximation to the exact distribution of $\sqrt{n} D_{n}$: the maximum error is about 0.9% for n = 1000, 2.6% for n = 100, and 7% for n = 10. However, a very simple expedient of replacing $x$ by

$x + \frac{1}{6\sqrt{n}} + \frac{x - 1}{4n}$

in the argument of the Jacobi theta function reduces these errors to $0.003\%$, $0.027\%$, and $0.27\%$ respectively; such accuracy would usually be considered more than adequate for all practical applications.[5]
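The effect of this substitution on a p-value can be sketched as follows. The code assumes the asymptotic p-value is $1 - \Pr(K \leq \sqrt{n} D_{n})$ and simply replaces the argument $x = \sqrt{n} D_{n}$ by $x + \frac{1}{6\sqrt{n}} + \frac{x-1}{4n}$. The function names are hypothetical, and the series evaluation is a truncated approximation.

```python
import math


def kolmogorov_cdf(x, terms=100):
    """Pr(K <= x) via the two standard series (truncated approximation)."""
    if x <= 0:
        return 0.0
    if x < 1.0:
        s = sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * x * x))
            for k in range(1, terms + 1)
        )
        return math.sqrt(2 * math.pi) / x * s
    s = sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
        for k in range(1, terms + 1)
    )
    return 1.0 - 2.0 * s


def asymptotic_p_value(d_n, n):
    """Uncorrected p-value: 1 - Pr(K <= sqrt(n) * D_n)."""
    return 1.0 - kolmogorov_cdf(math.sqrt(n) * d_n)


def corrected_p_value(d_n, n):
    """Same p-value with x replaced by x + 1/(6 sqrt(n)) + (x - 1)/(4 n)."""
    x = math.sqrt(n) * d_n
    x_corr = x + 1.0 / (6.0 * math.sqrt(n)) + (x - 1.0) / (4.0 * n)
    return 1.0 - kolmogorov_cdf(x_corr)


p_plain = asymptotic_p_value(0.4, 10)
p_corr = corrected_p_value(0.4, 10)
print(p_plain, p_corr)
```

For small n the two values differ noticeably, while for large n the correction terms vanish and both agree.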

A goodness-of-fit test, the Kolmogorov–Smirnov test, can be constructed using the critical values of the Kolmogorov distribution.

This test is asymptotically valid as $n \to \infty$.

At level $\alpha$, the null hypothesis is rejected if

$\sqrt{n} D_{n} > K_{\alpha}$

where $K_{\alpha}$ is found from

$\Pr(K \leq K_{\alpha}) = 1 - \alpha$

The asymptotic statistical power of this test is 1.
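One possible way to obtain $K_{\alpha}$ numerically is to solve $\Pr(K \leq K_{\alpha}) = 1 - \alpha$ by bisection, since the CDF is monotone. The sketch below assumes the truncated series form of the Kolmogorov CDF. The names `kolmogorov_critical_value` and `reject_h0`, as well as the bracket $[0, 5]$, are illustrative assumptions.

```python
import math


def kolmogorov_cdf(x, terms=100):
    """Pr(K <= x) via the two standard series (truncated approximation)."""
    if x <= 0:
        return 0.0
    if x < 1.0:
        s = sum(
            math.exp(-((2 * k - 1) ** 2) * math.pi ** 2 / (8 * x * x))
            for k in range(1, terms + 1)
        )
        return math.sqrt(2 * math.pi) / x * s
    s = sum(
        (-1) ** (k - 1) * math.exp(-2 * k * k * x * x)
        for k in range(1, terms + 1)
    )
    return 1.0 - 2.0 * s


def kolmogorov_critical_value(alpha, lo=0.0, hi=5.0, tol=1e-10):
    """Solve Pr(K <= K_alpha) = 1 - alpha by bisection on [lo, hi]."""
    target = 1.0 - alpha
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kolmogorov_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def reject_h0(d_n, n, alpha=0.05):
    """Asymptotic decision rule: reject when sqrt(n) * D_n > K_alpha."""
    return math.sqrt(n) * d_n > kolmogorov_critical_value(alpha)


k_05 = kolmogorov_critical_value(0.05)
print(k_05, reject_h0(0.2, 100), reject_h0(0.05, 100))
```

Because the rule is asymptotic, it is best read as an approximation for moderately large n.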

Test with estimated parameters

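If either the form or the parameters of $F(x)$ are determined from the data $X_{i}$, the critical values determined in this way are invalid! (Source: Wikipedia.)

In such cases, Monte Carlo or other methods may be required, although tables have been prepared for some cases.

According to reference [3], the Kolmogorov test applies only when the hypothesized distribution function is completely specified, that is, when it contains no parameters that must be estimated from the sample. Otherwise, the test becomes conservative.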
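The Monte Carlo approach mentioned above can be sketched for one specific case: a normal null distribution whose mean and standard deviation are estimated from the sample. This is an assumption-laden illustration, not a method taken from the cited references. The function names, the number of replications, and the seed are all arbitrary choices for this example.

```python
import math
import random
import statistics


def normal_cdf(x, mu, sigma):
    """CDF of N(mu, sigma^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def ks_statistic_fitted_normal(sample):
    """D_n against a normal CDF whose parameters are estimated from the sample."""
    mu = statistics.mean(sample)
    sigma = statistics.stdev(sample)
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        f = normal_cdf(x, mu, sigma)
        d = max(d, (i + 1) / n - f, f - i / n)
    return d


def monte_carlo_critical_value(n, alpha=0.05, reps=2000, seed=0):
    """Simulated (1 - alpha) quantile of D_n under H0 with estimated parameters."""
    rng = random.Random(seed)
    stats = sorted(
        ks_statistic_fitted_normal([rng.gauss(0.0, 1.0) for _ in range(n)])
        for _ in range(reps)
    )
    index = min(reps - 1, math.ceil((1.0 - alpha) * reps) - 1)
    return stats[index]


cv = monte_carlo_critical_value(20)
print(cv)
```

Simulating from the standard normal is sufficient here because the fitted statistic is unchanged by shifting or rescaling the data. The simulated critical value comes out smaller than the one for a fully specified distribution, which is consistent with the remark that the unmodified test becomes conservative.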


Details of the required modifications to the test statistic and of the critical values for the normal distribution and the exponential distribution have been published,[10] and later publications also include the Gumbel distribution.[11] The Lilliefors test represents a special case of this for the normal distribution. A logarithm transformation may help in cases where the data do not seem to fit the assumption that they came from a normal distribution.

When parameters are estimated, the question arises of which estimation method should be used. Usually this would be the maximum likelihood method, but for the normal distribution, for example, the MLE of sigma has a large bias error. Using a moment fit or KS minimization instead has a large impact on the critical values, and also some impact on test power. Suppose we need to decide via the KS test whether Student-t data with df = 2 could be normal. An ML estimate based on H₀ (the data are normal, so the standard deviation is used for scale) would give a much larger KS distance than a fit that minimizes the KS distance. In this case we should reject H₀, which is often what happens with MLE, because the sample standard deviation can be very large for t-2 data. With KS minimization, however, the KS distance may still be too low to reject H₀. In the Student-t case, a modified KS test that uses the KS estimate instead of the MLE therefore makes the test slightly worse. In other cases, however, such a modified KS test leads to slightly better test power.

Discrete and mixed null distribution

Two-sample Kolmogorov–Smirnov test (the Smirnov test)

Given two samples, do they come from the same population with some underlying distribution, or do the two data sets differ significantly?

The Kolmogorov–Smirnov test can also be used to test whether two underlying one-dimensional probability distributions differ.

The Smirnov statistic is:

$D_{n,m} = \sup_{x} |F_{1,n}(x) - F_{2,m}(x)|$

where $F_{1,n}$ and $F_{2,m}$ are the empirical distribution functions of the first and the second sample respectively, and $\sup$ is the supremum function.

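For large samples, the null hypothesis is rejected at level $\alpha$ if:

$D_{n,m} > c(\alpha) \sqrt{\frac{n + m}{n \cdot m}}$

where n and m are the sizes of the first and second samples respectively. For the most common levels of $\alpha$, the values of $c(\alpha)$ are:

$\alpha$: 0.20, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001
$c(\alpha)$: 1.073, 1.138, 1.224, 1.358, 1.480, 1.628, 1.731, 1.949

In general, one may take:

$c(\alpha) = \sqrt{-\ln\left(\frac{\alpha}{2}\right) \cdot \frac{1}{2}}$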
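The two-sample procedure can be sketched as below, assuming the large-sample rule $D_{n,m} > c(\alpha) \sqrt{(n + m)/(n m)}$ with $c(\alpha) = \sqrt{-\ln(\alpha/2)/2}$. The function names are illustrative, and the example data are artificial.

```python
import math
from bisect import bisect_right


def two_sample_ks_statistic(a, b):
    """D_{n,m} = sup_x |F_{1,n}(x) - F_{2,m}(x)| over the pooled data points."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    n, m = len(a_sorted), len(b_sorted)
    d = 0.0
    for x in a_sorted + b_sorted:
        # bisect_right counts the observations <= x, i.e. n * F_n(x).
        f1 = bisect_right(a_sorted, x) / n
        f2 = bisect_right(b_sorted, x) / m
        d = max(d, abs(f1 - f2))
    return d


def c_alpha(alpha):
    """c(alpha) = sqrt(-ln(alpha / 2) / 2)."""
    return math.sqrt(-0.5 * math.log(alpha / 2.0))


def reject_same_distribution(a, b, alpha=0.05):
    """Large-sample rule: reject when D_{n,m} > c(alpha) * sqrt((n+m)/(n*m))."""
    n, m = len(a), len(b)
    threshold = c_alpha(alpha) * math.sqrt((n + m) / (n * m))
    return two_sample_ks_statistic(a, b) > threshold


first = list(range(100))
second = list(range(100, 200))
print(two_sample_ks_statistic(first, second), c_alpha(0.05))
print(reject_same_distribution(first, second), reject_same_distribution(first, first))
```

Both empirical distribution functions are step functions that only change at data points, so evaluating the difference at the pooled observations is enough to find the supremum.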

Note that the two-sample test checks whether the two data samples come from the same distribution. It does not specify what that common distribution is (for example, whether it is normal or not). Again, tables of critical values have been published.

A shortcoming of the Kolmogorov–Smirnov test is that it is not very powerful, because it is devised to be sensitive to all possible types of differences between two distribution functions. References [19] and [20] showed evidence that the Cucconi test, originally proposed for simultaneously comparing location and scale, is much more powerful than the Kolmogorov–Smirnov test when comparing two distribution functions.

The Kolmogorov–Smirnov statistic in more than one dimension


References:

[1] https://blog.csdn.net/qq_41679006/article/details/80977113

[2] https://en.wikipedia.org/wiki/Kolmogorov%E2%80%93Smirnov_test

[3] Conover, W. J. (1980). Practical Nonparametric Statistics.
