Python scikit-learn (metrics): difference between r2_score and explained_variance_score?

I noticed that ‘r2_score’ and ‘explained_variance_score’ are both built-in sklearn.metrics functions for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from ‘explained_variance_score’?

When would you choose one over the other?

Thanks!

 

OK, look at this example:

In [123]:
import numpy as np
from sklearn import metrics

#data (outputs below are from the original session; newer versions may print more digits)
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
#what explained_variance_score really is: 1 - Var(residuals) / Var(y_true)
1-np.cov(np.array(y_true)-np.array(y_pred))/np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
#what r^2 really is: 1 - (sum of squared residuals) / (n * Var(y_true))
1-((np.array(y_true)-np.array(y_pred))**2).sum()/(len(y_true)*np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
#Notice that the mean residual is not 0
(np.array(y_true)-np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
#if the predicted values are different, such that the mean residual IS 0:
y_pred=[2.5, 0.0, 2, 7]
(np.array(y_true)-np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
#They become the same
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015

 

So, when the mean residual is 0, they are the same. Which one to choose depends on your needs, that is: is the mean residual supposed to be 0?
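The size of the gap between the two scores can also be checked directly. The following is a minimal sketch, assuming NumPy only and population (ddof=0) variances. The helper names `explained_variance` and `r2` are hypothetical, written to mirror the formulas in the session above, not scikit-learn's internals.

```python
import numpy as np

def explained_variance(y_true, y_pred):
    # 1 - Var(residuals) / Var(y_true), using population variances
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 1.0 - np.var(y_true - y_pred) / np.var(y_true)

def r2(y_true, y_pred):
    # 1 - mean(residuals**2) / Var(y_true)
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return 1.0 - np.mean((y_true - y_pred) ** 2) / np.var(y_true)

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mean_residual = np.mean(np.array(y_true) - np.array(y_pred))
gap = explained_variance(y_true, y_pred) - r2(y_true, y_pred)

# The gap equals the squared mean residual divided by Var(y_true)
print(gap, mean_residual ** 2 / np.var(y_true))
```

The two printed numbers agree, so the explained variance score is never smaller than R2, and the two coincide exactly when the mean residual is 0.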

 

Most of the answers I found (including the one here) emphasize the difference between R2 and the Explained Variance Score, namely the mean residual (i.e. the Mean Error).

However, an important question is left unanswered: why do I need to consider the Mean Error at all?


Refresher:

R2 is the Coefficient of Determination, which measures the proportion of variation explained by the (least-squares) linear regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of y like this:

$\mathrm{Var}(y_{\text{actual}}) \times R^2 = \mathrm{Var}(y_{\text{predicted}})$

 

So intuitively, the closer R2 is to 1, the closer actual_y and predicted_y are to having the same variance (i.e. the same spread).
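The variance relation above can be checked numerically. This is a minimal sketch, assuming an ordinary least-squares line with an intercept, evaluated on the same data it was fitted to; the relation is not expected to hold for arbitrary predictions. The toy data below is made up purely for illustration.

```python
import numpy as np

# Hypothetical toy data, made up for illustration only
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y_actual = np.array([1.1, 2.9, 5.2, 6.8, 9.1, 10.7])

# Ordinary least-squares line with an intercept
# (np.polyfit returns the highest-degree coefficient first)
slope, intercept = np.polyfit(x, y_actual, deg=1)
y_predicted = slope * x + intercept

# R^2 = 1 - SS_residual / SS_total
ss_res = np.sum((y_actual - y_predicted) ** 2)
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
r_squared = 1.0 - ss_res / ss_tot

# For an OLS fit with intercept, these two numbers agree
print(np.var(y_actual) * r_squared, np.var(y_predicted))
```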


As previously mentioned, the main difference is the Mean Error; if we look at the formulas, we can see that this is true:

$R^2 = 1 - \dfrac{\text{Sum of Squared Residuals} / n}{\mathrm{Var}(y_{\text{actual}})}$

$\text{Explained Variance Score} = 1 - \dfrac{\mathrm{Var}(y_{\text{predicted}} - y_{\text{actual}})}{\mathrm{Var}(y_{\text{actual}})}$

 

in which:

$\mathrm{Var}(y_{\text{predicted}} - y_{\text{actual}}) = \dfrac{\text{Sum of Squared Residuals}}{n} - (\text{Mean Error})^2$

 

So the only difference is that the Explained Variance Score subtracts the squared Mean Error from the numerator of the first formula. But why?
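The variance of the residuals can be expanded to make the role of the Mean Error explicit. In the sketch below, $e_i = y_{\text{predicted},i} - y_{\text{actual},i}$ and $\bar{e}$ (the Mean Error) are notation introduced here for brevity, SSR stands for the Sum of Squared Residuals, and all variances are population variances (divided by $n$).

```latex
\begin{aligned}
\mathrm{Var}(e)
  &= \frac{1}{n}\sum_{i=1}^{n}\left(e_i - \bar{e}\right)^2
   = \frac{1}{n}\sum_{i=1}^{n} e_i^2 - \bar{e}^2
   = \frac{\text{SSR}}{n} - \bar{e}^2, \\[4pt]
\text{Explained Variance Score} - R^2
  &= \frac{\text{SSR}/n - \mathrm{Var}(e)}{\mathrm{Var}(y_{\text{actual}})}
   = \frac{\bar{e}^2}{\mathrm{Var}(y_{\text{actual}})} \;\ge\; 0 .
\end{aligned}
```

So the two scores differ by the squared Mean Error, scaled by the variance of the actual values.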


When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error; so if R2 = Explained Variance Score, that means: The Mean Error = Zero!

The Mean Error reflects the systematic tendency of our estimator, that is, whether the estimation is biased or unbiased.


In Summary:

If you want an unbiased estimator, so that the model is neither systematically underestimating nor overestimating, consider taking the Mean Error into account. R2 does this, while the Explained Variance Score ignores it.
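The effect of bias can be illustrated with a small sketch, assuming NumPy only and population variances. The helpers `explained_variance` and `r2` are hypothetical re-implementations of the formulas above, and the constant offset of 2.0 is an arbitrary choice standing in for a model that overestimates everywhere.

```python
import numpy as np

def explained_variance(y, p):
    # 1 - Var(residuals) / Var(y): blind to a constant offset in p
    return 1.0 - np.var(y - p) / np.var(y)

def r2(y, p):
    # 1 - mean(residuals**2) / Var(y): penalizes a constant offset in p
    return 1.0 - np.mean((y - p) ** 2) / np.var(y)

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 7.0])  # mean residual is 0
y_shifted = y_pred + 2.0                  # hypothetical biased model: overestimates by 2

print("unbiased:", explained_variance(y_true, y_pred), r2(y_true, y_pred))
print("biased:  ", explained_variance(y_true, y_shifted), r2(y_true, y_shifted))
```

The explained variance score is identical for both sets of predictions, while R2 drops for the shifted one, which is why R2 is the safer choice when systematic over- or underestimation matters.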

 

Reference: https://stackoverflow.com/questions/24378176/python-sci-kit-learn-metrics-difference-between-r2-score-and-explained-varian

Published by: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/119588.html Original link: https://javaforall.net
