Python scikit-learn (metrics): difference between r2_score and explained_variance_score?

I noticed that 'r2_score' and 'explained_variance_score' are both built-in sklearn.metrics methods for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from ‘explained_variance_score’?

When would you choose one over the other?

Thanks!

 

OK, look at this example:

In [123]:
import numpy as np
from sklearn import metrics
# data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
# what explained_variance_score really is: 1 - Var(residuals) / Var(y_true)
1 - np.cov(np.array(y_true) - np.array(y_pred)) / np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
# what r^2 really is: 1 - (sum of squared residuals) / (n * Var(y_true))
1 - ((np.array(y_true) - np.array(y_pred))**2).sum() / (len(y_true) * np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
# notice that the mean residual is not 0
(np.array(y_true) - np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
# if the predicted values are different, such that the mean residual IS 0:
y_pred = [2.5, 0.0, 2, 7]
(np.array(y_true) - np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
# the two scores become identical
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015

 

So, when the mean residual is 0, they are the same. Which one to choose depends on your needs, that is, is the mean residual supposed to be 0?
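One way to see this is to add a constant offset to predictions whose residuals average to zero. The sketch below assumes scikit-learn and NumPy are installed; the offset value and the variable names are illustrative choices, not part of the original example.

```python
import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_unbiased = np.array([2.5, 0.0, 2.0, 7.0])  # residuals average to 0

# Hypothetical constant offset added to every prediction (illustrative value).
offset = 1.0
y_shifted = y_unbiased + offset

ev_unbiased = explained_variance_score(y_true, y_unbiased)
r2_unbiased = r2_score(y_true, y_unbiased)
ev_shifted = explained_variance_score(y_true, y_shifted)
r2_shifted = r2_score(y_true, y_shifted)

# With zero mean residual the two scores coincide.
print(ev_unbiased, r2_unbiased)
# After the shift, explained variance is unchanged while R2 drops.
print(ev_shifted, r2_shifted)
```

The constant offset changes only the mean of the residuals, not their spread, so explained_variance_score does not react to it, while r2_score does.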

 

Most of the answers I found (including here) emphasize the difference between R2 and Explained Variance Score, that is, the Mean Residual (i.e. the Mean of Error).

However, an important question is left unanswered: why on earth do I need to consider the Mean of Error?


Refresher:

R2 is the Coefficient of Determination, which measures the proportion of variation explained by the (least-squares) Linear Regression.

You can look at it from a different angle, for the purpose of evaluating the predicted values of y, like this:

Variance(y_actual) × R2 = Variance(y_predicted)

(This identity holds for a least-squares fit with an intercept.)

So intuitively, the closer R2 is to 1, the closer the variance (i.e. the spread) of y_predicted is to that of y_actual.
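The variance identity above can be checked numerically. This is a minimal sketch on synthetic data: the seed, slope, intercept and noise level are arbitrary assumptions, and np.polyfit stands in for the least-squares linear regression.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, arbitrary synthetic data
x = rng.normal(size=200)
y = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=200)

# Ordinary least squares with an intercept (coefficients: highest degree first).
slope, intercept = np.polyfit(x, y, deg=1)
y_hat = slope * x + intercept


def r2(y_actual, y_predicted):
    """R2 = 1 - SS_residual / SS_total."""
    ss_res = np.sum((y_actual - y_predicted) ** 2)
    ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# For the least-squares fit, Var(y) * R2 equals Var(y_hat).
print(np.var(y) * r2(y, y_hat), np.var(y_hat))

# For predictions that are not the least-squares fit, the identity fails.
y_other = 0.5 * y_hat
print(np.var(y) * r2(y, y_other), np.var(y_other))
```

The second pair of numbers differs, which is why the identity should only be read as a property of the least-squares fit, not of arbitrary predictions.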


As previously mentioned, the main difference is the Mean of Error, and the formulas confirm it:

R2 = 1 - [(Sum of Squared Residuals / n) / Variance(y_actual)]

Explained Variance Score = 1 - [Variance(y_predicted - y_actual) / Variance(y_actual)]

in which:

Variance(y_predicted - y_actual) = (Sum of Squared Residuals / n) - (Mean Error)^2

So the only difference is that the Explained Variance Score subtracts the squared Mean Error from the numerator of the R2 formula. But why?
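Both formulas can be written out in a few lines of NumPy. This is a sketch with hypothetical helper names (r2_manual, explained_variance_manual), not scikit-learn's implementation; it uses the population variance (divide by n) throughout and reuses the data from the earlier example.

```python
import numpy as np


def r2_manual(y_true, y_pred):
    """1 - (sum of squared residuals / n) / Var(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    return 1.0 - np.mean(residuals ** 2) / np.var(y_true)


def explained_variance_manual(y_true, y_pred):
    """1 - Var(residuals) / Var(y_true)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    residuals = y_true - y_pred
    return 1.0 - np.var(residuals) / np.var(y_true)


y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]

mean_error = np.mean(np.asarray(y_true) - np.asarray(y_pred))
gap = explained_variance_manual(y_true, y_pred) - r2_manual(y_true, y_pred)

# The gap between the two scores is (Mean Error)^2 / Var(y_true).
print(gap, mean_error ** 2 / np.var(y_true))
```

Because the gap is a squared quantity divided by a variance, the Explained Variance Score is never smaller than R2 on the same data.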


When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error; so if R2 = Explained Variance Score, that means the Mean Error is zero.

The Mean Error reflects the systematic tendency of our estimator, that is, whether it is biased or unbiased.


In Summary:

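If you want an unbiased estimator, so that the model is not systematically underestimating or overestimating, you may want to take the Mean of Error into account. In practice that means looking at r2_score, which penalizes a non-zero Mean of Error, whereas explained_variance_score ignores it.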

 

Reference: https://stackoverflow.com/questions/24378176/python-sci-kit-learn-metrics-difference-between-r2-score-and-explained-varian


Posted by: 全栈程序员-站长. When reposting, please credit the source: https://javaforall.net/119588.html (original link: https://javaforall.net)
