CS 285 Study Notes (1): Course Overview
Based on the official Fall 2023 course schedule, here is an outline of all CS 285 lectures with content summaries, mapped to weeks and topics, so you can quickly locate each lecture's key points, related homework, and video resources 🎯
Official course page
YouTube videos
Bilibili videos (with Chinese subtitles)
📅 CS 285 Fall 2023 Full Lecture Outline
| Week | Lecture & Topic | Summary |
|---|---|---|
| Week 1 | Lecture 1: Introduction & Course Overview | Course introduction, RL background, industry/research trends |
| Week 2 | Lecture 2: Supervised Learning of Behaviors (Imitation Learning) | Behavior cloning, DAgger, offline vs. online imitation learning (HW1) |
| Week 2 | Lecture 3: PyTorch Tutorial | PyTorch basics, a streamlined training pipeline |
| Week 3 | Lecture 4: Introduction to Reinforcement Learning | MDPs, policies, value function basics, Monte Carlo sampling |
| Week 4 | Lecture 5: Policy Gradients | REINFORCE, the likelihood-ratio derivation, variance reduction |
| Week 4 | Lecture 6: Actor–Critic Algorithms | Critic-based actor-critic, GAE, example code walkthrough |
| Week 5 | Lecture 7: Value Function Methods | TD(λ), bootstrapping, policy evaluation techniques |
| Week 5 | Lecture 8: Deep RL with Q-Functions | DQN, experience replay, target networks, training stabilization |
| Week 6 | Lecture 9: Advanced Policy Gradients | TRPO/PPO core algorithms, KL constraints, advantage estimation and implementation details |
| Week 6 | Lecture 10: Optimal Control & Planning | Control-theoretic planning methods (MPC), linear system control |
| Week 7 | Lecture 11: Model-Based Reinforcement Learning | Model learning and simulation, predictive model architectures, sample efficiency |
| Week 7 | Lecture 12: Model-Based Policy Learning | Policy learning under a learned model (including DDP, iLQR) |
| Week 8 | Lecture 13: Exploration I | Basic exploration strategies: ε-greedy, UCB, entropy bonus |
| Week 8 | Lecture 14: Exploration II | Count-based methods, curiosity-driven exploration, random network distillation |
| Week 9 | Lecture 15: Offline Reinforcement Learning I | Introduction to offline RL, batch training challenges, BMIST, etc. |
| Week 9 | Lecture 16: Offline Reinforcement Learning II | OOD generalization, constrained optimization, safety guarantees |
| Week 10 | Lecture 17: Reinforcement Learning Theory Basics | Convergence analysis, sample complexity, policy optimization geometry |
| Week 10 | Lecture 18: Variational Inference & Generative Models | VI fundamentals, the control-as-inference connection |
| Week 11 | Lecture 19: Connection between Inference and Control | Inverse RL, maximum-entropy control, relation to POMDPs |
| Week 11 | Lecture 20: Inverse Reinforcement Learning | Core IRL algorithms: MaxEnt IRL, GAIL, etc. |
| Week 12 | Guest Lectures | Topic talks from academic/industry experts (e.g. RLHF, DPO, Statistical RL) |
| Week 13 | Lecture 21: RL with Sequence Models & Language Models | Sequential RL, seq2seq RL, a first look at LLM fine-tuning |
| Week 13 | Lecture 22: Meta-Learning and Transfer Learning | Meta-RL, cross-task generalization, prompt tuning, DPO & RLHF |
| Week 14 | Lecture 23: Challenges & Open Problems | Frontier RL challenges: long-horizon dependence, safety, fairness, utility functions |
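The table above is only a roadmap, but the likelihood-ratio trick at the heart of Lecture 5 (REINFORCE, plus a baseline for variance reduction) fits in a few lines. Here is a minimal numpy sketch on a hypothetical two-armed bandit; the arm means (0.2 and 0.8), learning rate, and step count are made-up illustration values, not anything from the course:

```python
import numpy as np

rng = np.random.default_rng(0)

true_means = np.array([0.2, 0.8])  # hypothetical bandit: arm 1 pays more on average
theta = np.zeros(2)                # policy logits
alpha = 0.1                        # learning rate
baseline = 0.0                     # running-average reward baseline (variance reduction)

def softmax(x):
    z = np.exp(x - x.max())
    return z / z.sum()

for step in range(2000):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)               # sample an action from the policy
    r = rng.normal(true_means[a], 0.1)       # observe a noisy reward
    grad_log = -probs                        # ∇_θ log π(a) for a softmax policy
    grad_log[a] += 1.0                       # ... equals e_a - probs
    theta += alpha * (r - baseline) * grad_log  # likelihood-ratio (REINFORCE) update
    baseline += 0.05 * (r - baseline)        # track mean reward to reduce variance

print(softmax(theta))  # the policy should now strongly prefer arm 1
```

The baseline subtraction does not change the gradient's expectation (the score function has zero mean) but shrinks its variance, which is exactly the variance-reduction point the lecture makes.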
Homework mapping:
Homework GitHub repo
- HW1 → Lecture 2 / 3
- HW2 → Lecture 5 / 6
- HW3 → Lecture 7–12
- HW4 → Lecture 11–18
- HW5 → Lecture 13–20