Embodied AI Series: Implementing the CartPole Game with the Double DQN Algorithm (Reinforcement Learning)

For the complete code, see: rl/ddqn_cartpole.py · 陈先生/ailib - Gitee.com
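The linked script is not reproduced here, so the following is only a minimal sketch of the step that distinguishes Double DQN from plain DQN: the online network chooses the next action, and the target network supplies the value of that action, i.e. y = r + γ · Q_target(s', argmax_a Q_online(s', a)) for non-terminal transitions. Plain Python lists stand in for tensors, and the function name `double_dqn_targets` and its parameters are hypothetical, not taken from the repository.

```python
def double_dqn_targets(rewards, dones, next_q_online, next_q_target, gamma=0.99):
    """Compute Double DQN TD targets for a batch of transitions.

    rewards:       list of float, one reward per transition
    dones:         list of bool, True if the transition ended the episode
    next_q_online: list of lists, online-network Q-values for each next state
    next_q_target: list of lists, target-network Q-values for each next state
    gamma:         discount factor (0.99 is an assumed default)
    """
    targets = []
    for r, done, q_on, q_tg in zip(rewards, dones, next_q_online, next_q_target):
        # Action selection: argmax over the ONLINE network's Q-values.
        best_action = max(range(len(q_on)), key=lambda a: q_on[a])
        # Action evaluation: the TARGET network scores the selected action.
        # Terminal transitions do not bootstrap.
        bootstrap = 0.0 if done else gamma * q_tg[best_action]
        targets.append(r + bootstrap)
    return targets


if __name__ == "__main__":
    # Two transitions with two actions each (CartPole has left/right).
    targets = double_dqn_targets(
        rewards=[1.0, 1.0],
        dones=[False, True],
        next_q_online=[[0.2, 0.9], [0.5, 0.1]],
        next_q_target=[[3.0, 2.0], [4.0, 5.0]],
        gamma=0.5,
    )
    print(targets)
```

In the first transition the online network prefers action 1, so the target uses the target network's value for action 1 (2.0) rather than its maximum (3.0). Plain DQN would have used the maximum, which is the source of the overestimation that Double DQN is designed to reduce.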

Excerpt of the training output:

Model saved to ./output/best_model.pth
New best model saved with average reward: 9.6
Episode:   0 | Train Reward:  25.0 | Epsilon: 0.995 | Best Eval Avg: 9.6
Episode:   1 | Train Reward:  14.0 | Epsilon: 0.990 | Best Eval Avg: 9.6
Episode:   2 | Train Reward:  12.0 | Epsilon: 0.985 | Best Eval Avg: 9.6
Episode:   3 | Train Reward:  20.0 | Epsilon: 0.980 | Best Eval Avg: 9.6
Episode:   4 | Train Reward:  40.0 | Epsilon: 0.975 | Best Eval Avg: 9.6
Episode:   5 | Train Reward:  12.0 | Epsilon: 0.970 | Best Eval Avg: 9.6
Episode:   6 | Train Reward:  22.0 | Epsilon: 0.966 | Best Eval Avg: 9.6
Episode:   7 | Train Reward:  13.0 | Epsilon: 0.961 | Best Eval Avg: 9.6
Episode:   8 | Train Reward:  15.0 | Epsilon: 0.956 | Best Eval Avg: 9.6
Episode:   9 | Train Reward:  16.0 | Epsilon: 0.951 | Best Eval Avg: 9.6
……
Model saved to ./output/best_model.pth
New best model saved with average reward: 251.6
Episode:  90 | Train Reward:  25.0 | Epsilon: 0.634 | Best Eval Avg: 251.6
Episode:  91 | Train Reward: 150.0 | Epsilon: 0.631 | Best Eval Avg: 251.6
Episode:  92 | Train Reward:  44.0 | Epsilon: 0.627 | Best Eval Avg: 251.6
Episode:  93 | Train Reward:  23.0 | Epsilon: 0.624 | Best Eval Avg: 251.6
Episode:  94 | Train Reward:  27.0 | Epsilon: 0.621 | Best Eval Avg: 251.6
Episode:  95 | Train Reward:  79.0 | Epsilon: 0.618 | Best Eval Avg: 251.6
Episode:  96 | Train Reward: 103.0 | Epsilon: 0.615 | Best Eval Avg: 251.6
Episode:  97 | Train Reward:  45.0 | Epsilon: 0.612 | Best Eval Avg: 251.6
Episode:  98 | Train Reward:  33.0 | Epsilon: 0.609 | Best Eval Avg: 251.6
Episode:  99 | Train Reward:  33.0 | Epsilon: 0.606 | Best Eval Avg: 251.6
Test Episode: 0, Reward: 228.0
Test Episode: 1, Reward: 222.0
Test Episode: 2, Reward: 207.0
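The Epsilon column in the log is consistent with a multiplicative decay of 0.995 applied once per episode, starting from 1.0 (for example, 0.995^7 ≈ 0.966 at episode 6 and 0.995^100 ≈ 0.606 at episode 99). The sketch below reproduces those values under that assumption. The function name `epsilon_schedule` and the `floor` value are hypothetical; the log does not show whether or where the original script clamps epsilon.

```python
def epsilon_schedule(num_episodes, start=1.0, decay=0.995, floor=0.01):
    """Return the exploration rate used in each episode.

    The decay is applied before the value is recorded, so episode 0
    already reports start * decay, matching the log excerpt.
    floor is an assumed lower bound, not confirmed by the source.
    """
    eps = start
    values = []
    for _ in range(num_episodes):
        eps = max(floor, eps * decay)
        values.append(eps)
    return values


if __name__ == "__main__":
    schedule = epsilon_schedule(100)
    for episode in (0, 6, 90, 99):
        print(f"Episode: {episode:3d} | Epsilon: {schedule[episode]:.3f}")
```

With epsilon still above 0.6 at episode 99, most training actions remain random, which may explain why the training rewards in the log stay well below the greedy evaluation average of 251.6.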
