Python scikit-learn (metrics): difference between r2_score and explained_variance_score?

I noticed that ‘r2_score’ and ‘explained_variance_score’ are both built-in sklearn.metrics methods for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from ‘explained_variance_score’?

When would you choose one over the other?

Thanks!

 

OK, look at this example:

In [123]:
import numpy as np
from sklearn import metrics
#data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
#what explained_variance_score really is
1-np.cov(np.array(y_true)-np.array(y_pred))/np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
#what r^2 really is (4 is the number of samples, so the denominator is the total sum of squares)
1-((np.array(y_true)-np.array(y_pred))**2).sum()/(4*np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
#Notice that the mean residual is not 0
(np.array(y_true)-np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
#if the predicted values are different, such that the mean residual IS 0:
y_pred=[2.5, 0.0, 2, 7]
(np.array(y_true)-np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
#Now the two scores are identical
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015

 

So, when the mean residual is 0, the two scores are the same. Which one to choose depends on your needs: is the mean residual supposed to be 0?
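To make the point concrete, here is a minimal sketch that adds a constant offset to every prediction. The offsets and the `scores` dictionary are made up for illustration; only `explained_variance_score` and `r2_score` come from sklearn.metrics. Under these assumptions, the explained variance score should stay the same for every offset, while r2_score should drop as the offset grows.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([3, -0.5, 2, 7])
y_pred = np.array([2.5, 0.0, 2, 7])  # residuals average to zero

# Hypothetical constant biases added to every prediction (illustrative values).
scores = {}
for offset in (0.0, 1.0, 5.0):
    shifted = y_pred + offset
    ev = explained_variance_score(y_true, shifted)
    r2 = r2_score(y_true, shifted)
    scores[offset] = (ev, r2)
    print(f"offset={offset}: explained_variance={ev:.4f}, r2={r2:.4f}")
```

A constant shift changes the mean of the residuals but not their variance, which is why only r2_score reacts to it.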

 

Most of the answers I found (including the ones here) emphasize the difference between R2 and the Explained Variance Score, namely the mean residual (i.e. the Mean of Error).

However, an important question is left unanswered: why on earth do I need to consider the Mean of Error?


Refresher:

R2 is the Coefficient of Determination, which measures the amount of variation explained by the (least-squares) linear regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of y. For a least-squares linear fit with an intercept:

Variance(actual_y) × R2 = Variance(predicted_y)

 

So intuitively, the closer R2 is to 1, the closer actual_y and predicted_y are to having the same variance (i.e. the same spread).
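As a rough check of that relationship, the sketch below fits a least-squares line with an intercept to synthetic data and compares both sides. The data, the seed, and all variable names are assumptions made for this illustration; the identity relies on the fit being ordinary least squares with an intercept and does not hold for arbitrary predictions.

```python
import numpy as np

# Synthetic data, made up for this illustration.
rng = np.random.default_rng(0)
x = rng.normal(size=200)
y_actual = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

# Ordinary least-squares line with an intercept.
slope, intercept = np.polyfit(x, y_actual, deg=1)
y_predicted = slope * x + intercept

# R2 from its definition: 1 - SS_res / SS_tot.
ss_res = np.sum((y_actual - y_predicted) ** 2)
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
r2 = 1 - ss_res / ss_tot

lhs = np.var(y_actual) * r2   # Variance(actual_y) x R2
rhs = np.var(y_predicted)     # Variance(predicted_y)
print(lhs, rhs)
```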


As previously mentioned, the main difference is the Mean of Error; and if we look at the formulas, we find that’s true:

R2 = 1 - [(Sum of Squared Residuals / n) / Variance(y_actual)]

Explained Variance Score = 1 - [Variance(y_predicted - y_actual) / Variance(y_actual)]

in which:

Variance(y_predicted - y_actual) = (Sum of Squared Residuals / n) - (Mean Error)^2

So the only difference is that the Explained Variance Score subtracts the squared Mean Error from the mean squared residual used in R2. But why?
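A quick numerical check, reusing the data from the first answer and assuming population variance (divide by n) throughout. The variable names are chosen for this sketch. It verifies that the variance of the residuals equals the mean squared residual minus the squared Mean Error, so the gap between the two scores is (Mean Error)^2 / Variance(y_actual).

```python
import numpy as np

y_actual = np.array([3, -0.5, 2, 7])
y_predicted = np.array([2.5, 0.0, 2, 8])

residuals = y_actual - y_predicted
mean_error = residuals.mean()      # nonzero here, so the scores differ
mse = np.mean(residuals ** 2)      # Sum of Squared Residuals / n
var_actual = np.var(y_actual)      # population variance (ddof=0)

r2 = 1 - mse / var_actual
explained_variance = 1 - np.var(residuals) / var_actual

# The gap between the two scores comes entirely from the Mean Error.
gap = explained_variance - r2
print(r2, explained_variance, gap, mean_error ** 2 / var_actual)
```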


When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error: if R2 = Explained Variance Score, then the Mean Error is zero.

The Mean Error reflects the tendency of our estimator, that is, whether it is biased or unbiased.


In summary:

If you want an unbiased estimator, so that your model is not systematically underestimating or overestimating, take the Mean of Error into account: r2_score penalizes a nonzero Mean Error, while explained_variance_score ignores it.

 

Reference: https://stackoverflow.com/questions/24378176/python-sci-kit-learn-metrics-difference-between-r2-score-and-explained-varian

Published by: 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/119588.html Original link: https://javaforall.net
