CS 285 Study Notes (1): Course Overview
Based on the official Fall 2023 course schedule, here is an outline of all CS 285 lectures with content summaries, mapped to weeks and topics, so you can quickly locate each lecture's key points, related homework, and video resources 🎯
Official course page
YouTube videos
Bilibili videos (with Chinese subtitles)
📅 CS 285 Fall 2023 Full Lecture Outline
| Week | Lecture & Topic | Summary |
|---|---|---|
| Week 1 | Lecture 1: Introduction & Course Overview | Course introduction, RL background, industry/research trends |
| Week 2 | Lecture 2: Supervised Learning of Behaviors (Imitation Learning) | Behavior cloning, DAgger, offline vs. online imitation learning (HW1) |
| Week 2 | Lecture 3: PyTorch Tutorial | PyTorch basics, a streamlined training pipeline |
| Week 3 | Lecture 4: Introduction to Reinforcement Learning | MDPs, policies, value function basics, Monte Carlo sampling |
| Week 4 | Lecture 5: Policy Gradients | REINFORCE, the likelihood-ratio derivation, variance reduction |
| Week 4 | Lecture 6: Actor–Critic Algorithms | Critic-based actor-critic, GAE, example code walkthrough |
| Week 5 | Lecture 7: Value Function Methods | TD(λ), bootstrapping, policy evaluation techniques |
| Week 5 | Lecture 8: Deep RL with Q-Functions | DQN, experience replay, target networks, training stabilization |
| Week 6 | Lecture 9: Advanced Policy Gradients | TRPO/PPO core algorithms, KL constraints, advantage estimation and implementation details |
| Week 6 | Lecture 10: Optimal Control & Planning | Control-theoretic planning methods (MPC), linear system control |
| Week 7 | Lecture 11: Model-Based Reinforcement Learning | Model learning and simulation, predictive model architectures, sample efficiency |
| Week 7 | Lecture 12: Model-Based Policy Learning | Policy learning under a learned model (including DDP, iLQR) |
| Week 8 | Lecture 13: Exploration I | Basic exploration strategies: ε-greedy, UCB, entropy bonus |
| Week 8 | Lecture 14: Exploration II | Count-based methods, curiosity-driven exploration, random network distillation |
| Week 9 | Lecture 15: Offline Reinforcement Learning I | Introduction to offline RL, batch training challenges, BMIST, etc. |
| Week 9 | Lecture 16: Offline Reinforcement Learning II | OOD generalization, constrained optimization, safety guarantees |
| Week 10 | Lecture 17: Reinforcement Learning Theory Basics | Convergence analysis, sample complexity, policy optimization geometry |
| Week 10 | Lecture 18: Variational Inference & Generative Models | VI fundamentals, the control-as-inference connection |
| Week 11 | Lecture 19: Connection between Inference and Control | Inverse RL, maximum-entropy control, relation to POMDPs |
| Week 11 | Lecture 20: Inverse Reinforcement Learning | Core IRL algorithms: MaxEnt IRL, GAIL, etc. |
| Week 12 | Guest Lectures | Topic talks from academic/industry experts (e.g. RLHF, DPO, Statistical RL) |
| Week 13 | Lecture 21: RL with Sequence Models & Language Models | Sequential RL, seq2seq RL, a first look at LLM fine-tuning |
| Week 13 | Lecture 22: Meta-Learning and Transfer Learning | Meta-RL, cross-task generalization, prompt tuning, DPO & RLHF |
| Week 14 | Lecture 23: Challenges & Open Problems | Frontier RL challenges: long-horizon dependence, safety, fairness, utility functions |
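The table above is only a roadmap, but the likelihood-ratio trick at the heart of Lecture 5 (REINFORCE, plus a baseline for variance reduction) fits in a few lines. Here is a minimal numpy sketch on a hypothetical two-armed bandit; the arm means (0.2 and 0.8), learning rate, and step count are made-up illustration values, not anything from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = np.array([0.2, 0.8])  # hypothetical bandit: arm 1 pays more on average
theta = np.zeros(2)                # policy logits
alpha = 0.1                        # learning rate
baseline = 0.0                     # running-average reward baseline (variance reduction)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)               # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)       # observe a noisy reward
    grad_log = -probs                        # ∇_θ log π(a) for a softmax policy
    grad_log[a] += 1.0                       # ... equals e_a - probs
    theta += alpha * (r - baseline) * grad_log  # likelihood-ratio (REINFORCE) update
    baseline += 0.05 * (r - baseline)        # track mean reward to reduce variance

print(softmax(theta))  # the policy should now strongly prefer arm 1
```

The baseline subtraction does not change the gradient's expectation (the score function has zero mean) but shrinks its variance, which is exactly the variance-reduction point the lecture makes.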
Homework mapping:
Homework GitHub repo
- HW1 → Lecture 2 / 3
- HW2 → Lecture 5 / 6
- HW3 → Lecture 7–12
- HW4 → Lecture 11–18
- HW5 → Lecture 13–20