Embodied AI Series: Implementing the CartPole Game with the Double DQN Algorithm (Reinforcement Learning)

Full code reference: rl/ddqn_cartpole.py · 陈先生/ailib - Gitee.com
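
The referenced script is not reproduced in this post. As a rough illustration of what separates Double DQN from plain DQN, the following minimal sketch computes the bootstrap target for a single transition in pure Python. The function names, the argument layout, and the discount factor `gamma=0.99` are assumptions made for this example and are not taken from `rl/ddqn_cartpole.py`.

```python
def dqn_target(reward, done, q_target_next, gamma=0.99):
    """Plain DQN target: the target network both selects and evaluates the action."""
    if done:
        return reward
    return reward + gamma * max(q_target_next)


def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Double DQN target for one transition (hypothetical helper).

    q_online_next: Q-values of the next state from the online network.
    q_target_next: Q-values of the next state from the target network.
    """
    if done:
        # No bootstrapping past a terminal state.
        return reward
    # Step 1: the online network selects the greedy next action.
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # Step 2: the target network evaluates that selected action.
    return reward + gamma * q_target_next[best_action]


# CartPole has two actions (push left, push right).
q_online_next = [1.0, 2.0]   # online network prefers action 1
q_target_next = [3.0, 0.5]   # target network overrates action 0

print(dqn_target(1.0, False, q_target_next))
print(double_dqn_target(1.0, False, q_online_next, q_target_next))
```

In this made-up example the plain DQN target takes the maximum of the target network's estimates, while the Double DQN target uses the target network's value for the action chosen by the online network, which yields a smaller value. Decoupling selection from evaluation in this way is what is meant to reduce overestimation.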

Partial training scores:

Model saved to ./output/best_model.pth
New best model saved with average reward: 9.6
Episode:   0 | Train Reward:  25.0 | Epsilon: 0.995 | Best Eval Avg: 9.6
Episode:   1 | Train Reward:  14.0 | Epsilon: 0.990 | Best Eval Avg: 9.6
Episode:   2 | Train Reward:  12.0 | Epsilon: 0.985 | Best Eval Avg: 9.6
Episode:   3 | Train Reward:  20.0 | Epsilon: 0.980 | Best Eval Avg: 9.6
Episode:   4 | Train Reward:  40.0 | Epsilon: 0.975 | Best Eval Avg: 9.6
Episode:   5 | Train Reward:  12.0 | Epsilon: 0.970 | Best Eval Avg: 9.6
Episode:   6 | Train Reward:  22.0 | Epsilon: 0.966 | Best Eval Avg: 9.6
Episode:   7 | Train Reward:  13.0 | Epsilon: 0.961 | Best Eval Avg: 9.6
Episode:   8 | Train Reward:  15.0 | Epsilon: 0.956 | Best Eval Avg: 9.6
Episode:   9 | Train Reward:  16.0 | Epsilon: 0.951 | Best Eval Avg: 9.6
……
Model saved to ./output/best_model.pth
New best model saved with average reward: 251.6
Episode:  90 | Train Reward:  25.0 | Epsilon: 0.634 | Best Eval Avg: 251.6
Episode:  91 | Train Reward: 150.0 | Epsilon: 0.631 | Best Eval Avg: 251.6
Episode:  92 | Train Reward:  44.0 | Epsilon: 0.627 | Best Eval Avg: 251.6
Episode:  93 | Train Reward:  23.0 | Epsilon: 0.624 | Best Eval Avg: 251.6
Episode:  94 | Train Reward:  27.0 | Epsilon: 0.621 | Best Eval Avg: 251.6
Episode:  95 | Train Reward:  79.0 | Epsilon: 0.618 | Best Eval Avg: 251.6
Episode:  96 | Train Reward: 103.0 | Epsilon: 0.615 | Best Eval Avg: 251.6
Episode:  97 | Train Reward:  45.0 | Epsilon: 0.612 | Best Eval Avg: 251.6
Episode:  98 | Train Reward:  33.0 | Epsilon: 0.609 | Best Eval Avg: 251.6
Episode:  99 | Train Reward:  33.0 | Epsilon: 0.606 | Best Eval Avg: 251.6
Test Episode: 0, Reward: 228.0
Test Episode: 1, Reward: 222.0
Test Episode: 2, Reward: 207.0
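
The Epsilon column in the log above is consistent with a multiplicative decay of 0.995 per episode starting from 1.0, with the decay applied before the value is printed. The sketch below reproduces those values under that assumption. The function name `epsilon_schedule` and the floor `eps_min=0.01` are hypothetical; the log does not show whether the original script uses a floor or what its value is.

```python
def epsilon_schedule(num_episodes, eps_start=1.0, decay=0.995, eps_min=0.01):
    """Return the epsilon value logged for each episode (assumed schedule)."""
    eps = eps_start
    values = []
    for _ in range(num_episodes):
        # Decay once per episode, never dropping below the assumed floor.
        eps = max(eps_min, eps * decay)
        values.append(eps)
    return values


schedule = epsilon_schedule(100)
for episode in (0, 9, 90, 99):
    print(f"Episode: {episode:3d} | Epsilon: {schedule[episode]:.3f}")
```

Note that after 100 episodes epsilon is still about 0.6 under this schedule, so most training actions remain random. This may explain why the training rewards in the log stay well below the evaluation average, assuming evaluation runs the greedy policy.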
