
Derivation of the DDPM Optimization Objective


DDPM (Denoising Diffusion Probabilistic Models) derives its optimization objective from the variational lower bound (VLB), also known as the evidence lower bound (ELBO). The detailed derivation is as follows:


1. Problem Setup

  • Goal: learn a model $p_\theta(\mathbf{x}_0)$ that approximates the true data distribution $q(\mathbf{x}_0)$.
  • Forward process (diffusion)
    Fix a variance schedule $\beta_1, \dots, \beta_T$ and define the Markov chain:
    $$q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) = \prod_{t=1}^T q(\mathbf{x}_t \mid \mathbf{x}_{t-1}), \quad q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\big(\mathbf{x}_t;\, \sqrt{1 - \beta_t}\, \mathbf{x}_{t-1},\, \beta_t \mathbf{I}\big)$$
  • Reverse process (generation)
    Learn a parameterized Markov chain:
    $$p_\theta(\mathbf{x}_{0:T}) = p(\mathbf{x}_T) \prod_{t=1}^T p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t), \quad p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\big(\mathbf{x}_{t-1};\, \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\, \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\big)$$
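The forward chain above admits the closed-form marginal $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$ (used throughout this article). A minimal NumPy sketch of sampling it; the linear $\beta$ schedule, $T$, and the toy data are illustrative choices, not prescribed by the text:

```python
import numpy as np

# Sketch of the forward (diffusion) process via its closed form; the linear
# beta schedule, T, and the toy x0 below are illustrative assumptions.
T = 1000
betas = np.linspace(1e-4, 0.02, T)      # variance schedule beta_1..beta_T
alphas = 1.0 - betas                    # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)         # bar(alpha)_t = prod_{i<=t} alpha_i

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0) using
    x_t = sqrt(bar(alpha)_t) x_0 + sqrt(1 - bar(alpha)_t) eps."""
    eps = rng.standard_normal(x0.shape)
    ab = alpha_bars[t]
    return np.sqrt(ab) * x0 + np.sqrt(1.0 - ab) * eps, eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal(16)            # toy "data" vector
xT, _ = q_sample(x0, T - 1, rng)        # near t = T the signal is almost gone
print(alpha_bars[-1])                   # tiny, so x_T is close to N(0, I)
```

Because $\bar{\alpha}_T \approx 0$ under this schedule, $\mathbf{x}_T$ is nearly a standard Gaussian, which is what makes $p(\mathbf{x}_T) = \mathcal{N}(\mathbf{0}, \mathbf{I})$ a sensible prior for the reverse process.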

2. Optimization Objective: Maximizing the Log-Likelihood

The goal is to maximize $\log p_\theta(\mathbf{x}_0)$. Since this is intractable to compute directly, we instead maximize its variational lower bound:
$$\log p_\theta(\mathbf{x}_0) \geq \mathbb{E}_{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)} \left[ \log \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)} \right] \triangleq \text{VLB}$$


3. Decomposing the Variational Lower Bound

Expanding the VLB (note the prior $p(\mathbf{x}_T)$ carries no parameters):
$$\begin{aligned} \text{VLB} &= \mathbb{E}_{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)} \left[ \log \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)} \right] \\ &= \mathbb{E}_{q} \left[ \log p(\mathbf{x}_T) + \sum_{t=1}^T \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_t \mid \mathbf{x}_{t-1})} \right] \end{aligned}$$
Using the Markov property and Bayes' rule to rewrite $q(\mathbf{x}_t \mid \mathbf{x}_{t-1})$ in terms of the posterior $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)$ (the telescoping details are worked out in the appendix), this becomes:
$$\text{VLB} = \mathbb{E}_{q} \left[ \log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1) + \sum_{t=2}^T \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)} + \log \frac{p(\mathbf{x}_T)}{q(\mathbf{x}_T \mid \mathbf{x}_0)} \right]$$
which simplifies to the final form:
$$\boxed{\text{VLB} = \mathbb{E}_{q} \left[ \log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1) \right] - \sum_{t=2}^T \mathbb{E}_{q} \left[ D_\text{KL}\big( q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) \big) \right] - D_\text{KL}\big( q(\mathbf{x}_T \mid \mathbf{x}_0) \,\|\, p(\mathbf{x}_T) \big)}$$


4. Key Step: Simplifying the KL Terms

(a) Closed form of the posterior $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)$

By Bayes' rule:
$$q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_{t-1};\, \tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0),\, \tilde{\beta}_t \mathbf{I}\big)$$
where:
$$\tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0) = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} \mathbf{x}_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \mathbf{x}_t, \quad \tilde{\beta}_t = \frac{1 - \bar{\alpha}_{t-1}}{1 - \bar{\alpha}_t} \beta_t$$
(with $\alpha_t = 1 - \beta_t$ and $\bar{\alpha}_t = \prod_{i=1}^t \alpha_i$)
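This closed form can be sanity-checked numerically in one dimension: form the product $q(\mathbf{x}_t \mid \mathbf{x}_{t-1})\, q(\mathbf{x}_{t-1} \mid \mathbf{x}_0)$ on a grid of $\mathbf{x}_{t-1}$ values, normalize, and compare its mean and variance with $\tilde{\mu}_t$ and $\tilde{\beta}_t$. The schedule, step $t$, and scalar values below are arbitrary illustrative choices:

```python
import numpy as np

# Numerical check of the posterior closed form in 1-D: build the (unnormalized)
# posterior density on a grid and compare its moments with mu_tilde, beta_tilde.
T = 50
betas = np.linspace(1e-4, 0.2, T)
alphas, alpha_bars = 1 - betas, np.cumprod(1 - betas)

t = 30                                  # arbitrary step (0-indexed arrays)
x0, xt = 0.7, -0.3                      # arbitrary scalar x_0, x_t
b, a, ab, ab_prev = betas[t], alphas[t], alpha_bars[t], alpha_bars[t - 1]

grid = np.linspace(-5, 5, 20001)
# log of q(x_t | x_{t-1}) * q(x_{t-1} | x_0), dropping x_{t-1}-independent terms
logp = (-(xt - np.sqrt(a) * grid) ** 2 / (2 * b)
        - (grid - np.sqrt(ab_prev) * x0) ** 2 / (2 * (1 - ab_prev)))
p = np.exp(logp - logp.max())
p /= p.sum()
mean_num = (grid * p).sum()
var_num = ((grid - mean_num) ** 2 * p).sum()

mu_tilde = (np.sqrt(ab_prev) * b / (1 - ab) * x0
            + np.sqrt(a) * (1 - ab_prev) / (1 - ab) * xt)
beta_tilde = (1 - ab_prev) / (1 - ab) * b
print(mean_num - mu_tilde, var_num - beta_tilde)   # both should be ~0
```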

(b) Parameterizing the mean $\boldsymbol{\mu}_\theta(\mathbf{x}_t, t)$

$$p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) = \mathcal{N}\big(\mathbf{x}_{t-1};\, \boldsymbol{\mu}_\theta(\mathbf{x}_t, t),\, \boldsymbol{\Sigma}_\theta(\mathbf{x}_t, t)\big)$$
To match the posterior, choose:
$$\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \tilde{\boldsymbol{\mu}}_t \left( \mathbf{x}_t, \frac{\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon}_\theta}{\sqrt{\bar{\alpha}_t}} \right)$$
Substituting into the closed form gives:
$$\boldsymbol{\mu}_\theta = \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \right)$$
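The substitution can be verified numerically: when $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon}$, the two expressions for the mean must agree. A small check with an illustrative schedule and random vectors:

```python
import numpy as np

# Check that mu_tilde(x_t, x_0) equals (x_t - beta_t/sqrt(1-ab_t) * eps)/sqrt(alpha_t)
# when x_t is generated with the true noise eps. Schedule/values are illustrative.
T = 100
betas = np.linspace(1e-4, 0.05, T)
alphas, alpha_bars = 1 - betas, np.cumprod(1 - betas)

rng = np.random.default_rng(1)
t = 60                                   # arbitrary step (0-indexed arrays)
x0, eps = rng.standard_normal(8), rng.standard_normal(8)
b, a, ab, ab_prev = betas[t], alphas[t], alpha_bars[t], alpha_bars[t - 1]
xt = np.sqrt(ab) * x0 + np.sqrt(1 - ab) * eps

mu_tilde = (np.sqrt(ab_prev) * b / (1 - ab) * x0
            + np.sqrt(a) * (1 - ab_prev) / (1 - ab) * xt)
mu_from_eps = (xt - b / np.sqrt(1 - ab) * eps) / np.sqrt(a)
print(np.max(np.abs(mu_tilde - mu_from_eps)))   # ~0 (floating-point noise)
```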

(c) Closed form of the KL divergence

The KL divergence between two Gaussians is:
$$D_\text{KL}\big(\mathcal{N}(\boldsymbol{\mu}_1, \boldsymbol{\Sigma}_1) \,\|\, \mathcal{N}(\boldsymbol{\mu}_2, \boldsymbol{\Sigma}_2)\big) = \frac{1}{2} \left[ \log \frac{|\boldsymbol{\Sigma}_2|}{|\boldsymbol{\Sigma}_1|} - d + \operatorname{tr}(\boldsymbol{\Sigma}_2^{-1} \boldsymbol{\Sigma}_1) + (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1)^\top \boldsymbol{\Sigma}_2^{-1} (\boldsymbol{\mu}_2 - \boldsymbol{\mu}_1) \right]$$
Assuming $\boldsymbol{\Sigma}_\theta = \sigma_t^2 \mathbf{I}$ (commonly $\sigma_t^2 = \beta_t$ or $\tilde{\beta}_t$), this reduces to:
$$D_\text{KL} = \frac{1}{2\sigma_t^2} \| \tilde{\boldsymbol{\mu}}_t - \boldsymbol{\mu}_\theta \|^2 + C$$
Substituting the expressions for $\boldsymbol{\mu}_\theta$ and $\tilde{\boldsymbol{\mu}}_t$:
$$\tilde{\boldsymbol{\mu}}_t - \boldsymbol{\mu}_\theta = \frac{\beta_t}{\sqrt{\alpha_t} \sqrt{1 - \bar{\alpha}_t}} \big( \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) - \boldsymbol{\epsilon} \big)$$
where $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon}$ (the sign of the difference is irrelevant under the squared norm). Finally:
$$\boxed{D_\text{KL} \propto \mathbb{E}_{\mathbf{x}_0, \boldsymbol{\epsilon}} \left[ \| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \|^2 \right]}$$
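The Gaussian KL formula above can be cross-checked by Monte Carlo: estimate $\mathbb{E}_{x \sim \mathcal{N}_1}[\log \mathcal{N}_1(x) - \log \mathcal{N}_2(x)]$ by sampling and compare with the closed form specialized to isotropic covariances. The dimensions and scales are illustrative:

```python
import numpy as np

# Monte Carlo check of the Gaussian KL closed form for isotropic covariances:
# D_KL(N(mu1, s1^2 I) || N(mu2, s2^2 I)). All values below are illustrative.
rng = np.random.default_rng(2)
d = 4
mu1, mu2 = rng.standard_normal(d), rng.standard_normal(d)
s1, s2 = 0.8, 1.1

# closed form: 1/2 [log|S2|/|S1| - d + tr(S2^{-1} S1) + dmu^T S2^{-1} dmu]
kl_closed = 0.5 * (2 * d * np.log(s2 / s1) - d + d * s1**2 / s2**2
                   + np.sum((mu2 - mu1) ** 2) / s2**2)

# Monte Carlo estimate; the common -d/2 log(2*pi) terms cancel in the ratio
x = mu1 + s1 * rng.standard_normal((200_000, d))
log_n1 = -0.5 * np.sum((x - mu1) ** 2, axis=1) / s1**2 - d * np.log(s1)
log_n2 = -0.5 * np.sum((x - mu2) ** 2, axis=1) / s2**2 - d * np.log(s2)
kl_mc = np.mean(log_n1 - log_n2)
print(kl_closed, kl_mc)   # should agree to within Monte Carlo error
```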


5. Final Optimization Objective

Dropping constants and per-step weights, the simplified DDPM objective is:
$$\mathcal{L}_\text{simple}(\theta) = \mathbb{E}_{t, \mathbf{x}_0, \boldsymbol{\epsilon}} \left[ \| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \|^2 \right]$$
where:

  • $t \sim \text{Uniform}(1, T)$
  • $\mathbf{x}_0 \sim q(\mathbf{x}_0)$
  • $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$
  • $\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon}$
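The sampling recipe above maps directly to a training step. A sketch of computing $\mathcal{L}_\text{simple}$ on one batch; the "network" here is a stand-in (a fixed linear map) so the example stays self-contained, whereas in practice $\boldsymbol{\epsilon}_\theta$ is a time-conditioned U-Net:

```python
import numpy as np

# Sketch of one L_simple training step. eps_theta is a placeholder linear map
# (an illustrative assumption), standing in for a trained noise-prediction net.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1 - betas)

def eps_theta(xt, t, W):
    # hypothetical noise predictor; a real model would also condition on t
    return xt @ W

rng = np.random.default_rng(3)
W = 0.1 * rng.standard_normal((8, 8))
x0 = rng.standard_normal((32, 8))                 # a batch of "data"

t = rng.integers(0, T, size=32)                   # t ~ Uniform{1..T} (0-indexed)
eps = rng.standard_normal(x0.shape)               # eps ~ N(0, I)
ab = alpha_bars[t][:, None]
xt = np.sqrt(ab) * x0 + np.sqrt(1 - ab) * eps     # closed-form x_t

loss = np.mean((eps - eps_theta(xt, t, W)) ** 2)  # L_simple on this batch
print(loss)
```

In a real implementation this loss would be backpropagated through the network; everything else (schedule, sampling of $t$, $\boldsymbol{\epsilon}$, and $\mathbf{x}_t$) is exactly the recipe listed above.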

Key Takeaway

DDPM trains a network $\boldsymbol{\epsilon}_\theta$ to predict the noise added to a sample, minimizing the mean squared error of that prediction, and thereby learns to generate data. This objective is equivalent to matching the gradient of the log data density (the score), giving a deep connection to score-based generative models.

Appendix (Detailed Derivations)

Detailed Derivation: Final Simplified Form of the Variational Lower Bound (VLB)

The following derivation starts from the initial VLB expression and simplifies it step by step into the final form:

$$\boxed{\text{VLB} = \mathbb{E}_{q} \left[ \log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1) \right] - \sum_{t=2}^T \mathbb{E}_{q} \left[ D_\text{KL}\big( q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) \big) \right] - D_\text{KL}\big( q(\mathbf{x}_T \mid \mathbf{x}_0) \,\|\, p(\mathbf{x}_T) \big)}$$


Step 1: Initial VLB expression

From variational inference, the evidence lower bound (ELBO) is:
$$\log p_\theta(\mathbf{x}_0) \geq \underbrace{\mathbb{E}_{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)} \left[ \log \frac{p_\theta(\mathbf{x}_{0:T})}{q(\mathbf{x}_{1:T} \mid \mathbf{x}_0)} \right]}_{\text{VLB}}$$
Substituting the joint factorizations:
$$p_\theta(\mathbf{x}_{0:T}) = p(\mathbf{x}_T) \prod_{t=1}^T p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t), \quad q(\mathbf{x}_{1:T} \mid \mathbf{x}_0) = \prod_{t=1}^T q(\mathbf{x}_t \mid \mathbf{x}_{t-1})$$
gives:
$$\text{VLB} = \mathbb{E}_{q} \left[ \log \frac{p(\mathbf{x}_T) \prod_{t=1}^T p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{\prod_{t=1}^T q(\mathbf{x}_t \mid \mathbf{x}_{t-1})} \right]$$


Step 2: Expanding and regrouping the logarithm

Expanding the log:
$$\begin{aligned} \text{VLB} &= \mathbb{E}_{q} \left[ \log p(\mathbf{x}_T) + \sum_{t=1}^T \log p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) - \sum_{t=1}^T \log q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) \right] \\ &= \mathbb{E}_{q} \left[ \log p(\mathbf{x}_T) + \sum_{t=1}^T \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_t \mid \mathbf{x}_{t-1})} \right] \end{aligned}$$


Step 3: Introducing the key variable $\mathbf{x}_0$

Using the Markov property ($q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = q(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_0)$) and Bayes' rule, rewrite $q(\mathbf{x}_t \mid \mathbf{x}_{t-1})$:
$$q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \frac{q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \cdot q(\mathbf{x}_t \mid \mathbf{x}_0)}{q(\mathbf{x}_{t-1} \mid \mathbf{x}_0)}$$
Taking logs:
$$\log q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \log q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) + \log q(\mathbf{x}_t \mid \mathbf{x}_0) - \log q(\mathbf{x}_{t-1} \mid \mathbf{x}_0)$$


Step 4: Substituting and splitting the sums

Substituting this expression into the VLB:
$$\begin{aligned} \text{VLB} &= \mathbb{E}_{q} \Bigg[ \log p(\mathbf{x}_T) + \sum_{t=1}^T \log p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) \\ &\quad - \sum_{t=1}^T \Big( \log q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) + \log q(\mathbf{x}_t \mid \mathbf{x}_0) - \log q(\mathbf{x}_{t-1} \mid \mathbf{x}_0) \Big) \Bigg] \end{aligned}$$
Regrouping into separate sums:
$$\begin{aligned} \text{VLB} &= \mathbb{E}_{q} \Bigg[ \log p(\mathbf{x}_T) + \sum_{t=1}^T \log p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) - \sum_{t=1}^T \log q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \\ &\quad - \sum_{t=1}^T \log q(\mathbf{x}_t \mid \mathbf{x}_0) + \sum_{t=1}^T \log q(\mathbf{x}_{t-1} \mid \mathbf{x}_0) \Bigg] \end{aligned}$$


Step 5: Handling the telescoping sums

Define the intermediate quantity:
$$S = -\sum_{t=1}^T \log q(\mathbf{x}_t \mid \mathbf{x}_0) + \sum_{t=1}^T \log q(\mathbf{x}_{t-1} \mid \mathbf{x}_0)$$
Writing out both sums:
$$\begin{aligned} S &= \Big[ -\log q(\mathbf{x}_1 \mid \mathbf{x}_0) - \log q(\mathbf{x}_2 \mid \mathbf{x}_0) - \cdots - \log q(\mathbf{x}_T \mid \mathbf{x}_0) \Big] \\ &\quad + \Big[ \log q(\mathbf{x}_0 \mid \mathbf{x}_0) + \log q(\mathbf{x}_1 \mid \mathbf{x}_0) + \cdots + \log q(\mathbf{x}_{T-1} \mid \mathbf{x}_0) \Big] \end{aligned}$$
The matching terms ($\log q(\mathbf{x}_1 \mid \mathbf{x}_0)$, etc.) telescope away:
$$S = \log q(\mathbf{x}_0 \mid \mathbf{x}_0) - \log q(\mathbf{x}_T \mid \mathbf{x}_0)$$
Since $q(\mathbf{x}_0 \mid \mathbf{x}_0) = 1$ (a deterministic distribution), $\log q(\mathbf{x}_0 \mid \mathbf{x}_0) = 0$, so:
$$S = -\log q(\mathbf{x}_T \mid \mathbf{x}_0)$$


Step 6: Separating the $t=1$ and $t \geq 2$ terms

Regrouping the remaining terms:
$$\text{VLB} = \mathbb{E}_{q} \left[ \log p(\mathbf{x}_T) - \log q(\mathbf{x}_T \mid \mathbf{x}_0) + \sum_{t=1}^T \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)} \right]$$
Explicitly separating the $t=1$ term (using $q(\mathbf{x}_0 \mid \mathbf{x}_1, \mathbf{x}_0) = 1$):
$$\sum_{t=1}^T \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)} = \log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1) + \sum_{t=2}^T \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)}$$


Step 7: Converting to KL divergences

Collecting all terms:
$$\text{VLB} = \mathbb{E}_{q} \left[ \log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1) + \sum_{t=2}^T \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)} + \log \frac{p(\mathbf{x}_T)}{q(\mathbf{x}_T \mid \mathbf{x}_0)} \right]$$
By the definition of KL divergence:
$$\mathbb{E}_{q} \left[ \log \frac{p(\mathbf{x}_T)}{q(\mathbf{x}_T \mid \mathbf{x}_0)} \right] = -\mathbb{E}_{q} \left[ \log \frac{q(\mathbf{x}_T \mid \mathbf{x}_0)}{p(\mathbf{x}_T)} \right] = -D_{\text{KL}}\big( q(\mathbf{x}_T \mid \mathbf{x}_0) \,\|\, p(\mathbf{x}_T) \big)$$
and:
$$\mathbb{E}_{q} \left[ \log \frac{p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t)}{q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)} \right] = -\mathbb{E}_{q} \left[ D_{\text{KL}}\big( q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) \big) \right]$$


Step 8: Final simplified form

Substituting gives:
$$\boxed{\begin{aligned} \text{VLB} = &\; \mathbb{E}_{q(\mathbf{x}_1 \mid \mathbf{x}_0)} \Big[ \log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1) \Big] \\ &- \sum_{t=2}^T \mathbb{E}_{q(\mathbf{x}_t \mid \mathbf{x}_0)} \left[ D_{\text{KL}}\big( q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) \,\|\, p_\theta(\mathbf{x}_{t-1} \mid \mathbf{x}_t) \big) \right] \\ &- D_{\text{KL}}\big( q(\mathbf{x}_T \mid \mathbf{x}_0) \,\|\, p(\mathbf{x}_T) \big) \end{aligned}}$$


Key Remarks

  1. Simplifying the expectations

    • The first term depends only on $\mathbf{x}_1$: $\mathbb{E}_{q(\mathbf{x}_1 \mid \mathbf{x}_0)}[\cdot]$
    • The second term depends on $\mathbf{x}_t$: $\mathbb{E}_{q(\mathbf{x}_t \mid \mathbf{x}_0)}[\cdot]$ (the KL divergence needs only $\mathbf{x}_t$ and $\mathbf{x}_0$)
    • The third term is an analytically computable scalar
  2. Interpretation

    • Reconstruction term: $\mathbb{E}[\log p_\theta(\mathbf{x}_0 \mid \mathbf{x}_1)]$ measures the ability to generate the data
    • Denoising matching terms: the KL divergences force the generative process to match the reverse-time posterior of the diffusion process
    • Prior matching term: ensures the final distribution of $\mathbf{x}_T$ is close to the standard Gaussian prior
  3. Connection to noise prediction
    Via the closed form derived earlier:
    $$D_{\text{KL}}\big( q(\cdot) \,\|\, p_\theta(\cdot) \big) \propto \| \boldsymbol{\epsilon} - \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \|^2$$
    so the final objective reduces to the mean squared error of noise prediction.

Detailed Derivation: the Posterior $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)$ and the Parameterized Mean $\boldsymbol{\mu}_\theta(\mathbf{x}_t, t)$


1. Closed form of the posterior $q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0)$

By Bayes' rule and the Markov property of the diffusion process:
$$q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = \frac{q(\mathbf{x}_t \mid \mathbf{x}_{t-1}, \mathbf{x}_0) \cdot q(\mathbf{x}_{t-1} \mid \mathbf{x}_0)}{q(\mathbf{x}_t \mid \mathbf{x}_0)} = \frac{q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) \cdot q(\mathbf{x}_{t-1} \mid \mathbf{x}_0)}{q(\mathbf{x}_t \mid \mathbf{x}_0)}$$
where:

  • $q(\mathbf{x}_t \mid \mathbf{x}_{t-1}) = \mathcal{N}\big(\mathbf{x}_t;\, \sqrt{\alpha_t}\, \mathbf{x}_{t-1},\, \beta_t \mathbf{I}\big)$
  • $q(\mathbf{x}_{t-1} \mid \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_{t-1};\, \sqrt{\bar{\alpha}_{t-1}}\, \mathbf{x}_0,\, (1 - \bar{\alpha}_{t-1}) \mathbf{I}\big)$
  • $q(\mathbf{x}_t \mid \mathbf{x}_0) = \mathcal{N}\big(\mathbf{x}_t;\, \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0,\, (1 - \bar{\alpha}_t) \mathbf{I}\big)$

with the definitions:
$$\alpha_t = 1 - \beta_t, \quad \bar{\alpha}_t = \prod_{i=1}^t \alpha_i$$
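The marginal $q(\mathbf{x}_t \mid \mathbf{x}_0)$ listed above is itself a consequence of composing the per-step kernels. A quick simulation check with an illustrative schedule: run the chain $\mathbf{x}_t = \sqrt{\alpha_t}\,\mathbf{x}_{t-1} + \sqrt{\beta_t}\,\mathbf{z}_t$ for many independent trajectories and compare the empirical mean and variance at $t = T$ with the closed form:

```python
import numpy as np

# Simulation check that composing q(x_t|x_{t-1}) = N(sqrt(alpha_t) x_{t-1}, beta_t)
# yields the marginal q(x_T|x_0) = N(sqrt(ab_T) x_0, 1 - ab_T). Scalar case,
# illustrative schedule and x_0.
T = 200
betas = np.linspace(1e-4, 0.02, T)
alphas, alpha_bars = 1 - betas, np.cumprod(1 - betas)

rng = np.random.default_rng(4)
n, x0 = 100_000, 1.5
x = np.full(n, x0)
for t in range(T):                       # run the chain step by step
    x = np.sqrt(alphas[t]) * x + np.sqrt(betas[t]) * rng.standard_normal(n)

print(x.mean(), np.sqrt(alpha_bars[-1]) * x0)   # empirical vs closed-form mean
print(x.var(), 1 - alpha_bars[-1])              # empirical vs closed-form variance
```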

Deriving the mean $\tilde{\boldsymbol{\mu}}_t$
Matching the exponents of the Gaussians (ignoring terms that do not depend on $\mathbf{x}_{t-1}$):
$$-\frac{1}{2} \left[ \frac{\|\mathbf{x}_t - \sqrt{\alpha_t}\, \mathbf{x}_{t-1}\|^2}{\beta_t} + \frac{\|\mathbf{x}_{t-1} - \sqrt{\bar{\alpha}_{t-1}}\, \mathbf{x}_0\|^2}{1 - \bar{\alpha}_{t-1}} \right] + C$$
Extracting the quadratic and linear coefficients in $\mathbf{x}_{t-1}$:

  1. Quadratic coefficient:
    $$A = \frac{\alpha_t}{\beta_t} + \frac{1}{1 - \bar{\alpha}_{t-1}} = \frac{\alpha_t(1 - \bar{\alpha}_{t-1}) + \beta_t}{\beta_t (1 - \bar{\alpha}_{t-1})} = \frac{1 - \bar{\alpha}_t}{\beta_t (1 - \bar{\alpha}_{t-1})}$$
    (using $\alpha_t \bar{\alpha}_{t-1} = \bar{\alpha}_t$ and $\beta_t = 1 - \alpha_t$)
  2. Linear coefficient:
    $$\mathbf{b} = \frac{\sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t + \frac{\sqrt{\bar{\alpha}_{t-1}}}{1 - \bar{\alpha}_{t-1}} \mathbf{x}_0$$
    The Gaussian mean satisfies $\tilde{\boldsymbol{\mu}}_t = A^{-1} \mathbf{b}$, which gives:
    $$\tilde{\boldsymbol{\mu}}_t = \frac{\beta_t (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \left( \frac{\sqrt{\alpha_t}}{\beta_t} \mathbf{x}_t + \frac{\sqrt{\bar{\alpha}_{t-1}}}{1 - \bar{\alpha}_{t-1}} \mathbf{x}_0 \right) = \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \mathbf{x}_t + \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} \mathbf{x}_0$$

Deriving the variance $\tilde{\beta}_t$
The covariance is $A^{-1} \mathbf{I}$:
$$\tilde{\beta}_t = \frac{\beta_t (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t}$$

Final closed form
$$\boxed{q(\mathbf{x}_{t-1} \mid \mathbf{x}_t, \mathbf{x}_0) = \mathcal{N}\left( \mathbf{x}_{t-1};\, \underbrace{\frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} \mathbf{x}_0 + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \mathbf{x}_t}_{\tilde{\boldsymbol{\mu}}_t(\mathbf{x}_t, \mathbf{x}_0)},\, \underbrace{\frac{\beta_t (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \mathbf{I}}_{\tilde{\beta}_t \mathbf{I}} \right)}$$


2. Deriving the parameterized mean $\boldsymbol{\mu}_\theta(\mathbf{x}_t, t)$

From the forward process:
$$\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\, \mathbf{x}_0 + \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$$
Solving for $\mathbf{x}_0$:
$$\mathbf{x}_0 = \frac{\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon}}{\sqrt{\bar{\alpha}_t}}$$
Substituting into the posterior mean $\tilde{\boldsymbol{\mu}}_t$:
$$\tilde{\boldsymbol{\mu}}_t = \frac{\sqrt{\bar{\alpha}_{t-1}}\, \beta_t}{1 - \bar{\alpha}_t} \left( \frac{\mathbf{x}_t - \sqrt{1 - \bar{\alpha}_t}\, \boldsymbol{\epsilon}}{\sqrt{\bar{\alpha}_t}} \right) + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \mathbf{x}_t$$
Collecting terms:
$$\tilde{\boldsymbol{\mu}}_t = \frac{\beta_t \sqrt{\bar{\alpha}_{t-1}}}{\sqrt{\bar{\alpha}_t}\, (1 - \bar{\alpha}_t)} \mathbf{x}_t - \frac{\beta_t \sqrt{\bar{\alpha}_{t-1}}\, \sqrt{1 - \bar{\alpha}_t}}{\sqrt{\bar{\alpha}_t}\, (1 - \bar{\alpha}_t)} \boldsymbol{\epsilon} + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \mathbf{x}_t$$
Simplify using $\bar{\alpha}_t = \alpha_t \bar{\alpha}_{t-1}$, i.e. $\sqrt{\bar{\alpha}_t} = \sqrt{\alpha_t} \sqrt{\bar{\alpha}_{t-1}}$:

  1. Coefficient of $\mathbf{x}_t$:
    $$\frac{\beta_t}{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_t)} + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} = \frac{1}{\sqrt{\alpha_t}} \quad \text{(simplification detailed in the remarks)}$$
  2. Coefficient of $\boldsymbol{\epsilon}$:
    $$-\frac{\beta_t}{\sqrt{\alpha_t}\, \sqrt{1 - \bar{\alpha}_t}}$$
    Hence:
    $$\tilde{\boldsymbol{\mu}}_t = \frac{1}{\sqrt{\alpha_t}} \mathbf{x}_t - \frac{\beta_t}{\sqrt{\alpha_t}\, \sqrt{1 - \bar{\alpha}_t}} \boldsymbol{\epsilon} = \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \boldsymbol{\epsilon} \right)$$
    Parameterization strategy
    Use a neural network $\boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t)$ to predict the noise $\boldsymbol{\epsilon}$:
    $$\boxed{\boldsymbol{\mu}_\theta(\mathbf{x}_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( \mathbf{x}_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}} \boldsymbol{\epsilon}_\theta(\mathbf{x}_t, t) \right)}$$

Remarks

  1. Simplifying the coefficient of $\mathbf{x}_t$:
    $$\begin{aligned} &\frac{\beta_t}{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_t)} + \frac{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_{t-1})}{1 - \bar{\alpha}_t} \\ =&\; \frac{\beta_t + \alpha_t (1 - \bar{\alpha}_{t-1})}{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_t)} \quad \text{(common denominator)} \\ =&\; \frac{(1 - \alpha_t) + \alpha_t - \bar{\alpha}_t}{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_t)} \quad \text{(using } \beta_t = 1 - \alpha_t \text{ and } \alpha_t \bar{\alpha}_{t-1} = \bar{\alpha}_t\text{)} \\ =&\; \frac{1 - \bar{\alpha}_t}{\sqrt{\alpha_t}\, (1 - \bar{\alpha}_t)} = \frac{1}{\sqrt{\alpha_t}} \end{aligned}$$

  2. Interpretation

    • The posterior mean $\tilde{\boldsymbol{\mu}}_t$ is a linear combination of $\mathbf{x}_0$ and $\mathbf{x}_t$, and depends explicitly on the initial data $\mathbf{x}_0$.
    • The parameterized mean $\boldsymbol{\mu}_\theta$ uses the noise-prediction network $\boldsymbol{\epsilon}_\theta$ to remove the explicit dependence on $\mathbf{x}_0$, making the reverse process computable.
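With $\boldsymbol{\mu}_\theta$ in hand, ancestral sampling runs the reverse chain from $\mathbf{x}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$. A minimal sketch; the noise predictor is a stand-in that returns zeros (chosen only so the loop is runnable), and $\sigma_t^2 = \beta_t$ is one of the two common choices mentioned above:

```python
import numpy as np

# Sketch of DDPM ancestral sampling using the parameterized mean derived above.
# eps_theta is a placeholder (predicts zero noise); a trained network would
# replace it. Schedule and dimensions are illustrative.
T = 100
betas = np.linspace(1e-4, 0.05, T)
alphas, alpha_bars = 1 - betas, np.cumprod(1 - betas)

def eps_theta(xt, t):
    return np.zeros_like(xt)   # placeholder network output

rng = np.random.default_rng(5)
x = rng.standard_normal(8)                     # x_T ~ N(0, I)
for t in range(T - 1, -1, -1):                 # t = T..1 (0-indexed arrays)
    mu = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps_theta(x, t)) / np.sqrt(alphas[t])
    sigma = np.sqrt(betas[t]) if t > 0 else 0.0   # sigma_t^2 = beta_t; no noise at the last step
    x = mu + sigma * rng.standard_normal(8)
print(x.shape)                                 # the generated sample x_0
```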
