Python scikit-learn (metrics): difference between r2_score and explained_variance_score?

Python scikit-learn (metrics): difference between r2_score and explained_variance_score?

I noticed that that ‘r2_score’ and ‘explained_variance_score’ are both build-in sklearn.metrics methods for regression problems.

I was always under the impression that r2_score is the percent variance explained by the model. How is it different from ‘explained_variance_score’?

When would you choose one over the other?

Thanks!

 

OK, look at this example:

In [123]:
#data
y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print metrics.explained_variance_score(y_true, y_pred)
print metrics.r2_score(y_true, y_pred)
0.957173447537
0.948608137045
In [124]:
#what explained_variance_score really is
1-np.cov(np.array(y_true)-np.array(y_pred))/np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
#what r^2 really is
1-((np.array(y_true)-np.array(y_pred))**2).sum()/(4*np.array(y_true).std()**2)
Out[125]:
0.94860813704496794
In [126]:
#Notice that the mean residue is not 0
(np.array(y_true)-np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
#if the predicted values are different, such that the mean residue IS 0:
y_pred=[2.5, 0.0, 2, 7]
(np.array(y_true)-np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
#They become the same stuff
print metrics.explained_variance_score(y_true, y_pred)
print metrics.r2_score(y_true, y_pred)
0.982869379015
0.982869379015

 

So, when the mean residue is 0, they are the same. Which one to choose dependents on your needs, that is, is the mean residue suppose to be 0?

 

Most of the answers I found (including here) emphasize on the difference between R2 and Explained Variance Score, that is: The Mean Residue (i.e. The Mean of Error).

However, there is an important question left behind, that is: Why on earth I need to consider The Mean of Error?


Refresher:

R2: is the Coefficient of Determination which measures the amount of variation explained by the (least-squares) Linear Regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of y like this:

Varianceactual_y × R2actual_y = Variancepredicted_y

 

So intuitively, the more R2 is closer to 1, the more actual_y and predicted_y will have samevariance (i.e. same spread)


As previously mentioned, the main difference is the Mean of Error; and if we look at the formulas, we find that’s true:

R2 = 1 - [(Sum of Squared Residuals / n) / Variancey_actual]

Explained Variance Score = 1 - [Variance(Ypredicted - Yactual) / Variancey_actual]

 

in which:

Variance(Ypredicted - Yactual) = (Sum of Squared Residuals - Mean Error) / n 

 

So, obviously the only difference is that we are subtracting the Mean Error from the first formula! … But Why?


When we compare the R2 Score with the Explained Variance Score, we are basically checking the Mean Error; so if R2 = Explained Variance Score, that means: The Mean Error = Zero!

The Mean Error reflects the tendency of our estimator, that is: the Biased v.s Unbiased Estimation.


In Summary:

If you want to have unbiased estimator so our model is not underestimating or overestimating, you may consider taking Mean of Error into account.

 

参考链接:https://stackoverflow.com/questions/24378176/python-sci-kit-learn-metrics-difference-between-r2-score-and-explained-varian

版权声明:本文内容由互联网用户自发贡献,该文观点仅代表作者本人。本站仅提供信息存储空间服务,不拥有所有权,不承担相关法律责任。如发现本站有涉嫌侵权/违法违规的内容, 请联系我们举报,一经查实,本站将立刻删除。

发布者:全栈程序员-站长,转载请注明出处:https://javaforall.net/119588.html原文链接:https://javaforall.net

(0)
全栈程序员-站长的头像全栈程序员-站长


相关推荐

  • 常用的锂电池升压IC「建议收藏」

    常用的锂电池升压IC「建议收藏」在锂电池供电的系统中,输入电压通常不高于4.2V(单节)/8.4V(2节),而在蓝牙音箱、电池检测、高亮手电筒、USBType-CPD、大尺寸面板门级驱动等场合,则需要高达9V或12V及以上的电压,远高于电源输入电压。因此,需要DC-DC升压转换器提供数倍于输入的输出电压,以满足这些系统中各种各样的电路和功能的需要。如HU5914系列是5V升压8.4V/12.6V16.8V/21V/28升压充电芯片现在市场上的DC-DC升压芯片分为同步升压和异步升压。同步升压比异步升压的优势就是拥有更快的导通速度、

    2022年10月7日
    4
  • java中byte的用法_nt宫颈长度多少是正常

    java中byte的用法_nt宫颈长度多少是正常1.概念JavaNIOAPI自带的缓冲区类功能相当有限,没有经过优化,使用JDK的ByteBuffer操作更复杂。故而Netty的作者TrustinLee为了实现高效率的网络传输,重新造轮子,Netty中的ByteBuf实际上就相当于JDK中的ByteBuffer,其作用是在Netty中通过Channel传输数据。2.优势可以自定义缓冲类型;通过内置的复合缓冲类型,实现透明的零拷贝(ze…

    2022年9月19日
    1
  • 阿里云Maven仓库完整版[通俗易懂]

    阿里云Maven仓库完整版[通俗易懂]阿里云Maven仓库完整版<?xmlversion=”1.0″encoding=”UTF-8″?><!–LicensedtotheApacheSoftwareFoundation(ASF)underoneormorecontributorlicenseagreements.SeetheNOTICEfiledistributedwiththisworkforadditionalinformationregardingcopy

    2025年7月11日
    4
  • php sql filestream,FileStream应用

    php sql filestream,FileStream应用FileStream:文件流,为了解决大对象BLOB(BinaryLargeObjects)的存储问题.对于大对象存储,并且不受2GB的限制.以往有两种方式:(1)存储在数据库里面,这种方式一般使用image字段,或者varbinary(max)来做,好处是可以统一备份,但实际效率较低;(2)存储在文件系FileStream:文件流,为了解决大对象BLOB(BinaryLargeOb…

    2022年7月24日
    6
  • 高并发下的nginx性能优化实战

    高并发下的nginx性能优化实战

    2022年2月13日
    37
  • phpstorm2021.4激活码_通用破解码

    phpstorm2021.4激活码_通用破解码,https://javaforall.net/100143.html。详细ieda激活码不妨到全栈程序员必看教程网一起来了解一下吧!

    2022年3月16日
    73

发表回复

您的邮箱地址不会被公开。 必填项已用 * 标注

关注全栈程序员社区公众号