Embodied AI Series: Implementing the CartPole Game with the Double DQN Algorithm (Reinforcement Learning)
Full code reference: rl/ddqn_cartpole.py · 陈先生/ailib - Gitee.com
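For orientation, Double DQN differs from vanilla DQN in how the bootstrap target is formed: the online network selects the next action, and the target network evaluates it. The sketch below illustrates that rule on plain Python lists. It is not taken from the referenced file; the names `double_dqn_target`, `q_online_next`, `q_target_next`, and the discount `gamma=0.99` are illustrative assumptions.

```python
def double_dqn_target(reward, done, q_online_next, q_target_next, gamma=0.99):
    """Return r + gamma * Q_target(s', argmax_a Q_online(s', a)) for one transition.

    All names and the default gamma are hypothetical, chosen for illustration.
    """
    if done:
        return reward  # no bootstrapping past a terminal state
    # Action selection uses the online network's estimates ...
    best_action = max(range(len(q_online_next)), key=lambda a: q_online_next[a])
    # ... while the value of that action comes from the target network.
    return reward + gamma * q_target_next[best_action]


# CartPole has two actions (push left, push right) and a reward of 1 per step.
y = double_dqn_target(1.0, False, q_online_next=[0.2, 0.7], q_target_next=[0.5, 0.4])
print(y)
```

In this example vanilla DQN would bootstrap from the target network's maximum (0.5), whereas Double DQN uses 0.4, the target network's value for the action the online network prefers. Decoupling selection from evaluation in this way is what reduces overestimation of Q-values.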
Selected training scores:
Model saved to ./output/best_model.pth
New best model saved with average reward: 9.6
Episode: 0 | Train Reward: 25.0 | Epsilon: 0.995 | Best Eval Avg: 9.6
Episode: 1 | Train Reward: 14.0 | Epsilon: 0.990 | Best Eval Avg: 9.6
Episode: 2 | Train Reward: 12.0 | Epsilon: 0.985 | Best Eval Avg: 9.6
Episode: 3 | Train Reward: 20.0 | Epsilon: 0.980 | Best Eval Avg: 9.6
Episode: 4 | Train Reward: 40.0 | Epsilon: 0.975 | Best Eval Avg: 9.6
Episode: 5 | Train Reward: 12.0 | Epsilon: 0.970 | Best Eval Avg: 9.6
Episode: 6 | Train Reward: 22.0 | Epsilon: 0.966 | Best Eval Avg: 9.6
Episode: 7 | Train Reward: 13.0 | Epsilon: 0.961 | Best Eval Avg: 9.6
Episode: 8 | Train Reward: 15.0 | Epsilon: 0.956 | Best Eval Avg: 9.6
Episode: 9 | Train Reward: 16.0 | Epsilon: 0.951 | Best Eval Avg: 9.6
……
Model saved to ./output/best_model.pth
New best model saved with average reward: 251.6
Episode: 90 | Train Reward: 25.0 | Epsilon: 0.634 | Best Eval Avg: 251.6
Episode: 91 | Train Reward: 150.0 | Epsilon: 0.631 | Best Eval Avg: 251.6
Episode: 92 | Train Reward: 44.0 | Epsilon: 0.627 | Best Eval Avg: 251.6
Episode: 93 | Train Reward: 23.0 | Epsilon: 0.624 | Best Eval Avg: 251.6
Episode: 94 | Train Reward: 27.0 | Epsilon: 0.621 | Best Eval Avg: 251.6
Episode: 95 | Train Reward: 79.0 | Epsilon: 0.618 | Best Eval Avg: 251.6
Episode: 96 | Train Reward: 103.0 | Epsilon: 0.615 | Best Eval Avg: 251.6
Episode: 97 | Train Reward: 45.0 | Epsilon: 0.612 | Best Eval Avg: 251.6
Episode: 98 | Train Reward: 33.0 | Epsilon: 0.609 | Best Eval Avg: 251.6
Episode: 99 | Train Reward: 33.0 | Epsilon: 0.606 | Best Eval Avg: 251.6
Test Episode: 0, Reward: 228.0
Test Episode: 1, Reward: 222.0
Test Episode: 2, Reward: 207.0
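The Epsilon column in the log above is consistent with a multiplicative decay of 0.995 applied once per episode, starting from 1.0 (for example, 0.995^100 ≈ 0.606, matching Episode 99). The following is a minimal sketch under that assumption. The decay factor is inferred from the log rather than read from the repository, and the function name `epsilon_after` and the floor `eps_min` are hypothetical.

```python
def epsilon_after(decay_steps, eps_start=1.0, decay=0.995, eps_min=0.01):
    """Exploration rate after `decay_steps` multiplicative decays.

    Assumed schedule: eps_start, decay, and eps_min are illustrative values,
    with decay=0.995 inferred from the logged Epsilon column.
    """
    return max(eps_min, eps_start * decay ** decay_steps)


# The log appears to print epsilon after that episode's decay has been applied,
# so episode index i corresponds to i + 1 decay steps.
for episode in (0, 9, 90, 99):
    print(f"Episode: {episode} | Epsilon: {epsilon_after(episode + 1):.3f}")
```

With these assumed values the printed epsilons are 0.995, 0.951, 0.634, and 0.606, which agree with the logged values for the same episodes. At Episode 99 the agent still acts randomly about 60% of the time during training, which may explain why the training rewards stay well below the greedy evaluation average of 251.6.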