Python scikit-learn (metrics): difference between r2_score and explained_variance_score?

I noticed that ‘r2_score’ and ‘explained_variance_score’ are both built-in sklearn.metrics methods for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from ‘explained_variance_score’?

When would you choose one over the other?

Thanks!

 

OK, look at this example:

In [123]:
import numpy as np
from sklearn import metrics
#data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
#what explained_variance_score really is: 1 - Var(residuals) / Var(y_true)
1-np.cov(np.array(y_true)-np.array(y_pred))/np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
#what r^2 really is: 1 - (sum of squared residuals) / (n * Var(y_true))
1-((np.array(y_true)-np.array(y_pred))**2).sum()/(len(y_true)*np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
#Notice that the mean residual is not 0
(np.array(y_true)-np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
#if the predicted values are different, such that the mean residual IS 0:
y_pred=[2.5, 0.0, 2, 7]
(np.array(y_true)-np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
#They become the same
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015

 

So, when the mean residual is 0, the two scores are the same. Which one to choose depends on your needs, that is, is the mean residual supposed to be 0?

 

Most of the answers I found (including here) emphasize the difference between R2 and the Explained Variance Score, that is: the Mean Residual (i.e. the Mean of Error).

However, an important question is left behind: why on earth do I need to consider the Mean of Error?


Refresher:

R2 is the Coefficient of Determination, which measures the proportion of variation explained by the (least-squares) Linear Regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of y like this:

Variance(actual_y) × R2 = Variance(predicted_y)

 

So intuitively, the closer R2 is to 1, the closer actual_y and predicted_y are to having the same variance (i.e. the same spread).
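As a rough check of the identity above, here is a minimal sketch in plain Python. It fits a least-squares line using the closed-form slope and intercept, then compares Variance(actual_y) × R2 with Variance(predicted_y). The data and the helper names (mean, variance, fit_line) are made up for illustration, and the identity is only expected to hold for a least-squares fit with an intercept, not for arbitrary predictions.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population variance (divide by n).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def fit_line(x, y):
    # Closed-form ordinary least squares for y = slope * x + intercept.
    mx, my = mean(x), mean(y)
    covariance = sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)
    slope = covariance / variance(x)
    return slope, my - slope * mx


# Hypothetical data, chosen only for illustration.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
actual_y = [2.1, 3.9, 6.2, 7.8, 10.1]

slope, intercept = fit_line(x, actual_y)
predicted_y = [slope * v + intercept for v in x]

# R2 written as 1 - (mean squared residual) / Variance(actual_y).
ssr = sum((a - p) ** 2 for a, p in zip(actual_y, predicted_y))
r2 = 1 - (ssr / len(actual_y)) / variance(actual_y)

print(variance(actual_y) * r2)
print(variance(predicted_y))  # agrees with the line above up to rounding
```

For predictions that do not come from a least-squares fit (like the hand-written y_pred lists in the first answer), the two printed numbers will generally differ.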


As previously mentioned, the main difference is the Mean of Error; and if we look at the formulas, we find that’s true:

R2 = 1 - [(Sum of Squared Residuals / n) / Variance(y_actual)]

Explained Variance Score = 1 - [Variance(y_predicted - y_actual) / Variance(y_actual)]

in which:

Variance(y_predicted - y_actual) = (Sum of Squared Residuals / n) - (Mean Error)^2

So the only difference is that the Explained Variance Score subtracts the squared Mean Error from the mean squared residual used in the first formula! … But why?
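To make the relationship between the two formulas concrete, here is a small sketch in plain Python that computes both scores by hand on the data from the first answer. The function names (mean, variance, r2, explained_variance) are illustrative and are not the sklearn implementations. Variances divide by n; any other divisor would cancel in the ratio as long as it is used consistently.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population variance (divide by n).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def r2(y_actual, y_predicted):
    # 1 - (mean squared residual) / Variance(y_actual)
    n = len(y_actual)
    ssr = sum((a - p) ** 2 for a, p in zip(y_actual, y_predicted))
    return 1 - (ssr / n) / variance(y_actual)


def explained_variance(y_actual, y_predicted):
    # 1 - Variance(residuals) / Variance(y_actual)
    residuals = [a - p for a, p in zip(y_actual, y_predicted)]
    return 1 - variance(residuals) / variance(y_actual)


y_actual = [3, -0.5, 2, 7]
y_predicted = [2.5, 0.0, 2, 8]

mean_error = mean([a - p for a, p in zip(y_actual, y_predicted)])
gap = explained_variance(y_actual, y_predicted) - r2(y_actual, y_predicted)

print(mean_error)
print(gap)
print(mean_error ** 2 / variance(y_actual))  # same as gap, up to rounding
```

In other words, Explained Variance Score - R2 = (Mean Error)^2 / Variance(y_actual), which is never negative, so R2 can never exceed the Explained Variance Score.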


When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error; so if R2 = Explained Variance Score, that means: The Mean Error = Zero!

The Mean Error reflects the systematic tendency of our estimator, that is, whether its estimates are biased or unbiased.


In Summary:

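If you want an unbiased estimator, so that the model is not systematically underestimating or overestimating, consider taking the Mean of Error into account, for example by using R2, which penalizes a nonzero Mean Error while the Explained Variance Score ignores it.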
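The effect of bias can be sketched with a quick experiment in plain Python: take the zero-mean-error predictions from the first answer, shift every prediction up by a constant, and compare the two scores. The constant 1.0 and the helper name scores are arbitrary choices for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    # Population variance (divide by n).
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def scores(y_actual, y_predicted):
    # Returns (R2, Explained Variance Score), both computed by hand.
    residuals = [a - p for a, p in zip(y_actual, y_predicted)]
    mean_squared_residual = mean([r ** 2 for r in residuals])
    r2 = 1 - mean_squared_residual / variance(y_actual)
    explained_variance = 1 - variance(residuals) / variance(y_actual)
    return r2, explained_variance


y_actual = [3, -0.5, 2, 7]
unbiased = [2.5, 0.0, 2, 7]            # mean error is 0
biased = [p + 1.0 for p in unbiased]   # same predictions, shifted up by 1

r2_unbiased, ev_unbiased = scores(y_actual, unbiased)
r2_biased, ev_biased = scores(y_actual, biased)

print(r2_unbiased, ev_unbiased)  # equal: no bias to penalize
print(r2_biased, ev_biased)      # R2 drops, Explained Variance Score does not
```

The constant shift leaves the spread of the residuals untouched, so the Explained Variance Score cannot see it; R2 drops because every squared residual now includes the offset.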

 

Reference: https://stackoverflow.com/questions/24378176/python-sci-kit-learn-metrics-difference-between-r2-score-and-explained-varian

Published by: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/119588.html Original link: https://javaforall.net
