DDPG Project (Recommended Bookmark)

全栈程序员-站长 • June 28, 2022, 10:36 PM • Uncategorized • 22 views

1. The key difference between DQN and DDPG when learning the Q function is how the target's next-state maximum Q value is obtained: in DDPG the action is supplied by the actor, not found by the critic itself. (In a continuous action space, the critic cannot find the maximum Q value without running an optimization over actions, so the practical choice is to let the actor directly supply the best action.)
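
The contrast can be sketched in plain Python. This is only an illustration under stated assumptions: `actor_target` and `critic_target` here are hypothetical toy functions standing in for the target networks, and the numbers are made up.

```python
# Toy stand-ins (hypothetical); real code would use neural networks.

def dqn_next_value(q_values_next):
    """DQN: the target critic outputs one Q value per discrete action; take the max."""
    return max(q_values_next)

def actor_target(state):
    """Hypothetical deterministic target policy: maps a state to a continuous action."""
    return 0.5 * state

def critic_target(state, action):
    """Hypothetical target critic Q(s, a); it is largest when action == 0.5 * state."""
    return -(action - 0.5 * state) ** 2 + state

def ddpg_next_value(next_state):
    """DDPG: the target actor proposes the action and the target critic scores it."""
    return critic_target(next_state, actor_target(next_state))

dqn_value = dqn_next_value([0.1, 0.7, 0.3])  # max over three discrete actions
ddpg_value = ddpg_next_value(2.0)            # no max: the actor already chose the action
```

Note that `ddpg_next_value` contains no `max` call: the search over actions is replaced by a single forward pass through the actor.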

The code in the first picture contains several errors:

71: The critic_target network already outputs the maximum Q value, because it evaluates the action chosen by the actor_target network, so no additional max operation is needed. (DQN does need the max operation, because there the next-state maximum Q value is estimated directly by critic_target, the Q-value function, over the discrete actions.)

72: The critic (Q function) in DDPG takes the action as an input and directly outputs the Q value of that action, so there is no need to gather the Q value by action index.

74: The optimizer accumulates gradients across backward passes, so call optimizer.zero_grad() to clear them before backpropagating (rather than network.zero_grad).

75: The optimizer must call step() to apply the update once the error has been backpropagated.

Do not forget to handle terminal states: multiply the next-state Q value by (1 - dones), so that nothing is bootstrapped from a final state.
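
As a rough sketch of the terminal-state handling, the critic's target can be computed as below. Plain Python lists stand in for batched tensors; `td_targets`, `mse_loss`, and the discount factor of 0.99 are assumptions made for illustration, not the code from the pictures.

```python
GAMMA = 0.99  # assumed discount factor

def td_targets(rewards, next_q_values, dones, gamma=GAMMA):
    """Target = r + gamma * Q_target(s', actor_target(s')) * (1 - done)."""
    return [
        r + gamma * q_next * (1.0 - d)
        for r, q_next, d in zip(rewards, next_q_values, dones)
    ]

def mse_loss(predicted, targets):
    """Mean squared error between the critic's predictions and the targets."""
    return sum((p - t) ** 2 for p, t in zip(predicted, targets)) / len(predicted)

rewards = [1.0, 0.5]
next_q = [2.0, 3.0]   # next-state Q values from the target critic
dones = [0.0, 1.0]    # the second transition ends the episode

targets = td_targets(rewards, next_q, dones)
critic_loss = mse_loss([2.0, 1.0], targets)
```

For the second transition the target collapses to the reward alone, because `(1 - done)` zeroes out the bootstrapped term.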

79: In the actor learning part, the action passed to critic_local is not the sampled action but the action predicted by the actor for the same states (be careful with this). The resulting Q values should also be averaged with mean(). Finally, we want to maximize the performance, but the optimizer minimizes its objective, so the loss must carry a negative sign.
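
A minimal sketch of that actor objective, again in plain Python: the one-parameter `actor` and the quadratic `critic` are hypothetical toys chosen so the result is easy to check by hand.

```python
def actor(state, w):
    """Hypothetical one-parameter policy: action = w * state."""
    return w * state

def critic(state, action):
    """Hypothetical critic that prefers action == 2 * state."""
    return -(action - 2.0 * state) ** 2

def actor_loss(states, w):
    """Negative mean Q value of the actions the actor itself predicts."""
    q_values = [critic(s, actor(s, w)) for s in states]
    return -sum(q_values) / len(q_values)

states = [1.0, 2.0]
sampled_actions = [0.3, -1.2]  # stored with the transitions; deliberately NOT used here

loss_bad = actor_loss(states, w=0.0)   # policy far from what the critic prefers
loss_good = actor_loss(states, w=2.0)  # policy matching the critic's preference
```

Minimizing this loss pushes `w` toward the policy with the highest Q value, which is why the negative sign is needed.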

In soft_update, remember to copy the weights through each parameter's data attribute, so that the target network's parameters are updated in place.
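
The blending rule behind a soft update can be sketched without any framework. Here flat lists of floats stand in for network parameters, and the value of `tau` is an arbitrary choice for illustration; in PyTorch the in-place write would go through each parameter's `data` attribute instead of list assignment.

```python
def soft_update(local_params, target_params, tau):
    """In place: target = tau * local + (1 - tau) * target, element by element."""
    for i, (local_p, target_p) in enumerate(zip(local_params, target_params)):
        target_params[i] = tau * local_p + (1.0 - tau) * target_p

local = [1.0, 2.0]    # stand-in for the local network's weights
target = [0.0, 0.0]   # stand-in for the target network's weights

soft_update(local, target, tau=0.5)
```

Only `target` is modified; `local` is read but never written, which mirrors how the target network slowly tracks the local one.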

Published by 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/148618.html. Original link: https://javaforall.net
