Skew and Kurtosis (峰度和偏度) 转载

Skew and Kurtosis (峰度和偏度) 转载SkewnessItis Itmeasuresth Itdifferenti

Skewness

It is the degree of distortion from the symmetrical bell curve or the normal distribution. It measures the lack of symmetry in data distribution.
It differentiates extreme values in one versus the other tail. A symmetrical distribution will have a skewness of 0.

So, when is the skewness too much?

The rule of thumb seems to be:

  • If the skewness is between -0.5 and 0.5, the data are fairly symmetrical.
  • If the skewness is between -1 and -0.5(negatively skewed) or between 0.5 and 1(positively skewed), the data are moderately skewed.
  • If the skewness is less than -1(negatively skewed) or greater than 1(positively skewed), the data are highly skewed.

Example

Let us take a very common example of house prices. Suppose we have house values ranging from $100k to $1,000,000 with the average being $500,000.

  • If the peak of the distribution was left off the average value, portraying a positive skewness in the distribution. It would mean that many houses were being sold for less than the average value, i.e. $500k. This could be for many reasons, but we are not going to interpret those reasons here.
  • If the peak of the distributed data was right of the average value, that would mean a negative skew. This would mean that the houses were being sold for more than the average value.

Kurtosis

Kurtosis is all about the tails of the distribution — not the peakedness or flatness. It is used to describe the extreme values in one versus the other tail.It is actually the measure of outliers present in the distribution.

High kurtosis in a data set is an indicator that data has heavy tails or outliers. If there is a high kurtosis, then, we need to investigate why do we have so many outliers. It indicates a lot of things, maybe wrong data entry or other things. Investigate!

Low kurtosis in a data set is an indicator that data has light tails or lack of outliers. If we get low kurtosis(too good to be true), then also we need to investigate and trim the dataset of unwanted results.
在这里插入图片描述
Mesokurtic: This distribution has kurtosis statistic similar to that of the normal distribution. It means that the extreme values of the distribution are similar to that of a normal distribution characteristic. This definition is used so that the standard normal distribution has a kurtosis of three.

Leptokurtic (Kurtosis > 3): Distribution is longer, tails are fatter. Peak is higher and sharper than Mesokurtic, which means that data are heavy-tailed or profusion of outliers.

Outliers stretch the horizontal axis of the histogram graph, which makes the bulk of the data appear in a narrow (“skinny”) vertical range, thereby giving the “skinniness” of a leptokurtic distribution.

Platykurtic: (Kurtosis < 3): Distribution is shorter, tails are thinner than the normal distribution. The peak is lower and broader than Mesokurtic, which means that data are light-tailed or lack of outliers.
The reason for this is because the extreme values are less than that of the normal distribution.

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/230005.html原文链接:https://javaforall.net

(0)
上一篇 2026年3月16日 下午3:21
下一篇 2026年3月16日 下午3:21


相关推荐

  • 浅谈异步IO

    浅谈异步IO我们已经知道 CPU 的速度远远快于磁盘 网络等 IO 在一个线程中 CPU 执行代码的速度极快 然而 一旦遇到 IO 操作 如读写文件 发送网络数据时 就需要等待 IO 操作完成 才能继续进行下一步操作 这种情况称为同步 IO 在 IO 操作的过程中 当前线程被挂起 而其他需要 CPU 执行的代码就无法被当前线程执行了 因为一个 IO 操作就阻塞了当前线程 导致其他代码无法执行 所以我们必须使用多线程或者多

    2026年3月16日
    2
  • oracle中number的用法,Oracle Number数字

    oracle中number的用法,Oracle Number数字oracle函数的OracleNumber数字在本教程中,您将学习OracleNUMBER数据类型以及如何使用它来为表定义数字列。OracleNUMBER数据类型简介OracleNUMBER数据类型用于存储可能为负值或正值的数值。以下说明了NUMBER数据类型的语法:NUMBER[(precision[,scale])]OracleNUMBER数据类型具有以下精度和尺度。精度是一…

    2022年7月24日
    14
  • 小白专属:弃养小龙虾OpenClaw 彻底卸载 + 验证教程

    小白专属:弃养小龙虾OpenClaw 彻底卸载 + 验证教程

    2026年3月14日
    1
  • js正则使用变量_JavaScript正则

    js正则使用变量_JavaScript正则varcookieName=”admin”; varcookie=document.cookie; varpat=newRegExp(“^”+cookieName+”=\\w*”,”g”);//输出的正则表达式/^admin=\w*/g

    2025年11月22日
    8
  • 超详细 LaTex数学公式

    超详细 LaTex数学公式LaTex 表达式是一种简单的 常见的一种数学公式表达形式 在很多地方都有出现 相信正在看博客的你会深有体会 LaTex 表达式不难 甚至说很简单 但是对于没有没有接触过得小伙伴来说 会非常费脑 复杂的表达式到底该如何书写呢 LaTex 表达式一般分为两类

    2026年3月19日
    2
  • clientHeight和offsetHeight

    clientHeight和offsetHeightclientHeight 包括 padding 但不包括 border 水平滚动条 margin 的元素的高度 对于 inline 的元素这个属性一直是 0 单位 px 只读元素 offsetHeight 包括 padding border 水平滚动条 但不包括 margin 的元素的高度 对于 inline 的元素这个属性一直是 0 单位 px 只读元素 style height 返回元素的高度 包括元素高度 不包括内边距 边框和外边距 clientHeight 返回元素的高度 包括元素高度

    2026年3月18日
    2

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号