I. Kurtosis
1. Kurtosis of a random variable (Pearson's moment coefficient of kurtosis)
The kurtosis of a random variable $X$ is its fourth standardized moment, defined as:

$$
\operatorname{Kurt}[X]=E\left[\left(\frac{X-\mu}{\sigma}\right)^4\right]=\frac{\mu_4}{\sigma^4}=\frac{E\left[(X-\mu)^4\right]}{\left(E\left[(X-\mu)^2\right]\right)^2},
$$

where $\mu = E[X]$ is the mean of $X$, $\mu_4$ is the fourth central moment of $X$, $\sigma$ is the standard deviation of $X$, and $E$ denotes the expectation.
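To make the definition concrete, here is a small sketch that evaluates $\operatorname{Kurt}[X]$ exactly for a discrete distribution using Python's `fractions` module. The helper name `population_kurtosis` and the fair-die example are illustrative choices, not part of the original text.

```python
from fractions import Fraction


def population_kurtosis(values, probs):
    """Kurt[X] = E[(X - mu)^4] / (E[(X - mu)^2])^2 for a discrete distribution.

    Hypothetical helper: `values` are the support points of X and `probs`
    their probabilities (assumed to sum to 1).
    """
    pairs = list(zip(values, probs))
    mu = sum(p * x for x, p in pairs)               # mean E[X]
    mu2 = sum(p * (x - mu) ** 2 for x, p in pairs)  # variance sigma^2
    mu4 = sum(p * (x - mu) ** 4 for x, p in pairs)  # fourth central moment mu_4
    return mu4 / mu2 ** 2


# A fair six-sided die: each face 1..6 has probability 1/6.
die_kurt = population_kurtosis(range(1, 7), [Fraction(1, 6)] * 6)
print(die_kurt)         # 303/175
print(float(die_kurt))  # about 1.73, below the normal distribution's value of 3
```

Exact fractions avoid rounding error. The die's flat distribution has no tails to speak of, so its kurtosis is well below 3.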
2. Sample kurtosis
For a sample of $n$ ($n \geq 3$) observations $x_1, \dots, x_n$, the sample kurtosis is defined as:

$$
g_2=\frac{m_4}{m_2^2}-3=\frac{\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^4}{\left[\frac{1}{n}\sum_{i=1}^{n}(x_i-\bar x)^2\right]^2}-3,
$$

where $\bar x$ is the sample mean, $m_2$ is the second sample moment about the mean (the second sample central moment, i.e. the biased sample variance with divisor $n$), and $m_4$ is the fourth sample moment about the mean (the fourth sample central moment). Subtracting 3 makes $g_2$ an excess kurtosis: it is measured relative to the normal distribution, whose kurtosis is 3.
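The definition of $g_2$ translates directly into code. A minimal pure-Python sketch (the function name `sample_excess_kurtosis_g2` is hypothetical) that computes $m_2$ and $m_4$ as averages with divisor $n$:

```python
def sample_excess_kurtosis_g2(xs):
    """g2 = m4 / m2**2 - 3, with m2 and m4 the biased (divide-by-n) central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n  # second sample central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n  # fourth sample central moment
    return m4 / m2 ** 2 - 3


# For 1..5: m2 = 10/5 = 2, m4 = 34/5 = 6.8, so g2 = 6.8 / 4 - 3 = -1.3
print(sample_excess_kurtosis_g2([1, 2, 3, 4, 5]))
```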
3. Estimating the population kurtosis
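In practice, as much of the literature points out, when the data are a sample drawn from a larger population, the sample kurtosis $g_2$ is a biased estimator of the population excess kurtosis. A commonly used estimator of the population excess kurtosis is: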
$$
\begin{aligned}
G_2 &= \frac{k_4}{k_2^2} \\
&= \frac{n^2\,[(n+1)\,m_4-3\,(n-1)\,m_2^2]}{(n-1)\,(n-2)\,(n-3)}\;\frac{(n-1)^2}{n^2\,m_2^2} \\
&= \frac{n-1}{(n-2)\,(n-3)}\left[(n+1)\,\frac{m_4}{m_2^2}-3\,(n-1)\right] \\
&= \frac{n-1}{(n-2)\,(n-3)}\left[(n+1)\,g_2+6\right] \\
&= \frac{(n+1)\,n\,(n-1)}{(n-2)\,(n-3)}\;\frac{\sum_{i=1}^{n}(x_i-\bar x)^4}{\left[\sum_{i=1}^{n}(x_i-\bar x)^2\right]^2}-3\,\frac{(n-1)^2}{(n-2)\,(n-3)} \\
&= \frac{(n+1)\,n}{(n-1)\,(n-2)\,(n-3)}\;\frac{\sum_{i=1}^{n}(x_i-\bar x)^4}{k_2^2}-3\,\frac{(n-1)^2}{(n-2)\,(n-3)}
\end{aligned}
$$
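where $k_4$ is the unique symmetric unbiased estimator of the fourth cumulant, $k_2$ is the symmetric unbiased estimator of the second cumulant (identical to the unbiased sample variance $\frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar x)^2$), $m_4$ and $m_2$ are the fourth and second sample central moments, and $\bar x$ is the sample mean. The step from the third to the fourth line uses $m_4/m_2^2 = g_2 + 3$.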
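The chain of equalities can be spot-checked numerically. The sketch below (helper names are hypothetical) evaluates $G_2$ three ways: as $k_4/k_2^2$ using the $k_4$ expression from the second line, through $g_2$ as in the fourth line, and from raw sums of deviations as in the fifth line. Exact `Fraction` arithmetic lets the three results be compared with `==`.

```python
from fractions import Fraction


def central_sums(xs):
    """Return (n, S2, S4): sample size and sums of squared / fourth-power deviations."""
    xs = [Fraction(x) for x in xs]
    n = len(xs)
    mean = sum(xs) / n
    s2 = sum((x - mean) ** 2 for x in xs)
    s4 = sum((x - mean) ** 4 for x in xs)
    return n, s2, s4


def g2_forms(xs):
    """Evaluate G2 in three equivalent forms from the derivation (requires n >= 4)."""
    n, s2, s4 = central_sums(xs)
    m2, m4 = s2 / n, s4 / n                 # biased central moments
    k2 = n * m2 / (n - 1)                   # unbiased sample variance
    k4 = n**2 * ((n + 1) * m4 - 3 * (n - 1) * m2**2) / ((n - 1) * (n - 2) * (n - 3))
    g2 = m4 / m2**2 - 3
    via_k = k4 / k2**2                                                # first line
    via_g2 = Fraction(n - 1, (n - 2) * (n - 3)) * ((n + 1) * g2 + 6)  # fourth line
    via_sums = (Fraction((n + 1) * n * (n - 1), (n - 2) * (n - 3)) * s4 / s2**2
                - Fraction(3 * (n - 1) ** 2, (n - 2) * (n - 3)))      # fifth line
    return via_k, via_g2, via_sums


print(g2_forms([1, 2, 3, 4, 5]))  # all three equal Fraction(-6, 5), i.e. G2 = -1.2
```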
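In general, $G_2$ is still a biased estimator of the population excess kurtosis; it is unbiased for random samples drawn from a normal distribution. Because of the factor $(n-2)(n-3)$ in its denominator, $G_2$ requires $n \geq 4$.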
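The bias statements can be checked by simulation. This sketch draws many standard-normal samples of size $n = 10$ with `random.gauss` and averages both estimators; the sample size, replication count, and seed are arbitrary choices made here. For normal data $E[g_2] = -6/(n+1)$ (about $-0.55$ for $n = 10$), and substituting that into the fourth line of the derivation gives $E[G_2] = 0$.

```python
import random


def g2_and_G2(xs):
    """Return (g2, G2) for one sample; G2 needs len(xs) >= 4."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    g2 = m4 / m2 ** 2 - 3
    G2 = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return g2, G2


def average_estimates(n=10, reps=20_000, seed=0):
    """Average g2 and G2 over `reps` standard-normal samples of size n."""
    rng = random.Random(seed)
    total_g2 = total_G2 = 0.0
    for _ in range(reps):
        g2, G2 = g2_and_G2([rng.gauss(0.0, 1.0) for _ in range(n)])
        total_g2 += g2
        total_G2 += G2
    return total_g2 / reps, total_G2 / reps


mean_g2, mean_G2 = average_estimates()
print(f"mean g2 = {mean_g2:.3f} (theory {-6 / 11:.3f}), mean G2 = {mean_G2:.3f} (theory 0)")
```

The average of $g_2$ should land near $-0.55$, while the average of $G_2$ should be close to 0, up to Monte Carlo noise.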
Many software packages, including the Python Pandas library, compute kurtosis with the $G_2$ formula.
Pandas source excerpt (`nankurt`):
```python
import numpy as np


def nankurt(values, axis=None, skipna=True, mask=None):
    """
    Compute the sample excess kurtosis

    The statistic computed here is the adjusted Fisher-Pearson standardized
    moment coefficient G2, computed directly from the second and fourth
    central moment.
    """
    # ......
    mean = values.sum(axis, dtype=np.float64) / count
    if axis is not None:
        mean = np.expand_dims(mean, axis)

    # Deviations from the mean; masked (missing) entries are set to 0
    adjusted = values - mean
    if skipna:
        np.putmask(adjusted, mask, 0)
    adjusted2 = adjusted ** 2
    adjusted4 = adjusted2 ** 2
    # m2 and m4 here are *sums* of deviations, i.e. n * m_2 and n * m_4
    m2 = adjusted2.sum(axis, dtype=np.float64)
    m4 = adjusted4.sum(axis, dtype=np.float64)

    # G2 = n(n+1)(n-1) * m4 / ((n-2)(n-3) * m2**2) - 3(n-1)**2 / ((n-2)(n-3))
    with np.errstate(invalid='ignore', divide='ignore'):
        adj = 3 * (count - 1) ** 2 / ((count - 2) * (count - 3))
        numer = count * (count + 1) * (count - 1) * m4
        denom = (count - 2) * (count - 3) * m2 ** 2

    with np.errstate(invalid='ignore', divide='ignore'):
        result = numer / denom - adj
    # ......
    return result
```
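The arithmetic of the excerpt can be mirrored in plain Python for a single sequence. This is a simplified sketch, not pandas code: the helper name is hypothetical, it handles 1-D input only, skips `NaN` values in the spirit of `skipna=True`, works with sums of deviations exactly as the excerpt does, and returns `NaN` for fewer than four values or zero variance, which is a choice made here for illustration.

```python
import math


def kurt_g2_like_pandas(values):
    """Sample excess kurtosis G2 from sums of deviations, mirroring the excerpt.

    Hypothetical, simplified helper: 1-D input only; NaNs are skipped.
    """
    xs = [float(v) for v in values if not math.isnan(v)]  # drop missing values
    count = len(xs)
    if count < 4:
        return math.nan  # too few values: (count - 2) * (count - 3) is 0 for 2 or 3
    mean = sum(xs) / count
    m2 = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations (n * m_2)
    m4 = sum((x - mean) ** 4 for x in xs)  # sum of fourth-power deviations (n * m_4)
    if m2 == 0:
        return math.nan  # constant data: this sketch treats kurtosis as undefined
    adj = 3 * (count - 1) ** 2 / ((count - 2) * (count - 3))
    numer = count * (count + 1) * (count - 1) * m4
    denom = (count - 2) * (count - 3) * m2 ** 2
    return numer / denom - adj


print(kurt_g2_like_pandas([1, 2, 3, 4, 5, float("nan")]))  # approximately -1.2
```

Working with sums rather than averages saves two divisions by $n$; the factors of $n$ are folded into `numer`, matching the fifth line of the $G_2$ derivation.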
Published by 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/225201.html (original link: https://javaforall.net)
