
news 2025/10/4 6:35:22


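The print edition of the Mushroom Book, 《Easy RL：强化学习教程》 (Easy RL: Reinforcement Learning Tutorial), is available on major e-commerce platforms.

This article provides runnable code and examples based on the Mushroom Book EasyRL, written against the newer version of gym (the gymnasium package).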

0. Environment setup

This article uses Python 3.10 (py310).
The library versions are as follows:

cloudpickle==3.1.1
Farama-Notifications==0.0.4
gym-notices==0.0.8
gymnasium==1.1.1
numpy==2.2.4
pygame==2.6.1
typing_extensions==4.13.2
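Before running the example, it can help to confirm that the installed packages match the pinned versions above. The following is a minimal sketch using only the standard library; the names `EXPECTED` and `find_mismatches` are hypothetical helpers introduced here for illustration, not part of the original article.

```python
from importlib.metadata import PackageNotFoundError, version

# Pinned versions copied from the list above.
EXPECTED = {
    "cloudpickle": "3.1.1",
    "Farama-Notifications": "0.0.4",
    "gym-notices": "0.0.8",
    "gymnasium": "1.1.1",
    "numpy": "2.2.4",
    "pygame": "2.6.1",
    "typing_extensions": "4.13.2",
}


def find_mismatches(expected, get_version=version):
    """Return {package: installed_version_or_None} for every package
    whose installed version differs from the expected one."""
    mismatches = {}
    for name, wanted in expected.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != wanted:
            mismatches[name] = installed
    return mismatches


if __name__ == "__main__":
    problems = find_mismatches(EXPECTED)
    if not problems:
        print("All packages match the pinned versions.")
    for name, installed in problems.items():
        print(f"{name}: expected {EXPECTED[name]}, found {installed}")
```

A mismatch does not necessarily mean the example will fail; the check only reports where the local environment differs from the one used in this article.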

Result:

[Image not available]

Code:

import gymnasium as gym
import numpy as np


class SimpleAgent:
    def __init__(self, env):
        pass

    def decide(self, observation):  # choose an action
        position, velocity = observation
        lb = min(-0.09 * (position + 0.25) ** 2 + 0.03,
                 0.3 * (position + 0.9) ** 4 - 0.008)
        ub = -0.07 * (position + 0.38) ** 2 + 0.07
        if lb < velocity < ub:
            action = 2
        else:
            action = 0
        return action  # return the chosen action

    def learn(self, *args):  # learning step (empty for this fixed policy)
        pass


def play(env, agent, seed_id, train=False):
    episode_reward = 0.  # total reward for the episode, starts at 0
    observation, info = env.reset(seed=seed_id)  # reset the environment to start a new episode
    while True:  # loop until the episode ends
        action = agent.decide(observation)
        observation, reward, terminated, truncated, info = env.step(action)  # apply the action
        episode_over = terminated or truncated  # has the episode ended?
        episode_reward += reward  # accumulate the episode reward
        if train:  # train the agent only when requested
            agent.learn(observation, action, reward, episode_over)  # no-op here
        if episode_over:  # episode finished, leave the loop
            observation, info = env.reset(seed=seed_id)  # leave the environment in a freshly reset state
            break
    return episode_reward  # return the total episode reward


if __name__ == '__main__':
    SEED_ID = 3
    env = gym.make("MountainCar-v0", render_mode="human")
    print('Observation space = {}'.format(env.observation_space))
    print('Action space = {}'.format(env.action_space))
    print('Observation range = {} ~ {}'.format(env.observation_space.low,
                                               env.observation_space.high))
    print('Number of actions = {}'.format(env.action_space.n))

    agent = SimpleAgent(env)
    episode_reward = play(env, agent, SEED_ID)
    print('Episode reward = {}'.format(episode_reward))

    episode_rewards = [play(env, agent, SEED_ID) for _ in range(100)]
    print('Average episode reward = {}'.format(np.mean(episode_rewards)))
    env.close()  # close the rendering window
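To see what the hand-written rule in `SimpleAgent.decide` does without opening a render window, the sketch below restates the same velocity band as a standalone function and evaluates it at a few states. The function name `decide` and the sample states are illustrative choices made here; the coefficients are copied from the code above. In MountainCar-v0, action 0 accelerates left and action 2 accelerates right.

```python
def velocity_band(position):
    """Lower and upper velocity bounds used by the rule in SimpleAgent.decide."""
    lb = min(-0.09 * (position + 0.25) ** 2 + 0.03,
             0.3 * (position + 0.9) ** 4 - 0.008)
    ub = -0.07 * (position + 0.38) ** 2 + 0.07
    return lb, ub


def decide(position, velocity):
    """Push right (2) when the velocity lies strictly inside the band,
    otherwise push left (0)."""
    lb, ub = velocity_band(position)
    return 2 if lb < velocity < ub else 0


if __name__ == "__main__":
    # Sample states chosen for illustration only.
    samples = [(-0.5, 0.0), (-0.5, -0.01), (0.3, 0.02), (-0.5, 0.07)]
    for position, velocity in samples:
        lb, ub = velocity_band(position)
        action = decide(position, velocity)
        print(f"pos={position:+.2f} vel={velocity:+.3f} "
              f"band=({lb:+.5f}, {ub:+.5f}) action={action}")
```

Near the valley floor with the car at rest, the velocity falls inside the band and the rule pushes right; once the car is moving left faster than the lower bound allows, the rule pushes left so the car can build momentum on the opposite slope.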