
Deep Learning Building Blocks: GRU (Gated Recurrent Unit)

news 2025/9/20 9:36:48

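The Gated Recurrent Unit (GRU) is a type of recurrent neural network proposed by Cho et al. in 2014. It simplifies the structure of the LSTM and so has fewer parameters, yet on many sequence modeling tasks it performs comparably to the LSTM.
Related article: Deep Learning Building Blocks: LSTM (Long Short-Term Memory)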

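The GRU simplifies the LSTM, and reduces its parameter count, in two main ways:

Fewer gates
- LSTM: 3 gates (forget, input, output) + 1 candidate cell state = 4 sets of parameters
- GRU: 2 gates (reset, update) + 1 candidate hidden state = 3 sets of parameters
No separate cell state
- LSTM: maintains a separate cell state $C_t$ alongside the hidden state $h_t$
- GRU: keeps only the hidden state $h_t$; the separate cell state is removed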

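Simplified diagram (the arrows list each unit's components, not a strict order of computation)
LSTM: [input] → (forget gate → input gate → output gate → candidate state) → [output]
GRU:  [input] → (reset gate → update gate → candidate state) → [output]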
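The reduction described above can be quantified with a minimal sketch that counts the parameters of one single-direction layer. It assumes PyTorch's layout, in which each gate block has one input-to-hidden matrix, one hidden-to-hidden matrix and two bias vectors. The helper name `rnn_param_count` is made up for illustration.

```python
def rnn_param_count(num_blocks: int, input_size: int, hidden_size: int,
                    two_biases: bool = True) -> int:
    """Parameter count of one single-direction recurrent layer (hypothetical helper).

    num_blocks: 3 for a GRU (reset, update, candidate), 4 for an LSTM.
    two_biases: True mimics PyTorch, which keeps both bias_ih and bias_hh.
    """
    # Per block, the input and hidden weights together form a
    # (hidden_size, input_size + hidden_size) matrix
    weights = num_blocks * hidden_size * (input_size + hidden_size)
    biases = num_blocks * hidden_size * (2 if two_biases else 1)
    return weights + biases


# Sizes of the first GRU layer in the code example of Part II
gru = rnn_param_count(3, input_size=257, hidden_size=100)
lstm = rnn_param_count(4, input_size=257, hidden_size=100)
print(gru, lstm, f"GRU saves {1 - gru / lstm:.0%}")  # 107700 143600 GRU saves 25%
```

Both counts scale linearly with the number of blocks. With matching sizes, a GRU layer therefore has exactly 3/4 of an LSTM layer's parameters.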

I. GRU Overview

1.1 Structure

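Input layer: sequence data, a tensor of shape (batch_size, seq_len, input_size) when batch_first=True (the same as for RNNs and LSTMs)
GRU layer:
- Core components:
  - Hidden state: $h_t$, shape (batch_size, hidden_size). It is the output at the current time step and is also passed on to the next time step
  - Gating mechanism: controls the flow of information and consists of:
    - Reset gate: decides how past information is combined with the current input
    - Update gate: decides how much old information is kept and how much new information is added
- Learnable parameters (textbook form):
  - Weight matrices: $W_z$, $W_r$, $W$ (one each for the update gate, the reset gate and the candidate hidden state), each of shape (hidden_size, input_size + hidden_size)
  - Biases: $b_z$, $b_r$, $b$, each of shape (hidden_size,)
- Learnable parameters (PyTorch implementation): for computational efficiency, PyTorch stacks these parameters into larger matrices and keeps separate input-side and hidden-side biases (the suffix l0 means layer 0; layer k uses lk)
  - Input-to-hidden weights: weight_ih_l0, shape (3 * hidden_size, input_size)
  - Hidden-to-hidden weights: weight_hh_l0, shape (3 * hidden_size, hidden_size)
  - Input-to-hidden bias: bias_ih_l0, shape (3 * hidden_size,)
  - Hidden-to-hidden bias: bias_hh_l0, shape (3 * hidden_size,)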
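Activation functions:
- Gates (update gate, reset gate): sigmoid $\sigma$, output range (0, 1). This acts like a soft "switch"
- Candidate hidden state: tanh, output range (-1, 1). This produces the new candidate values
What the gates do
- Reset gate: controls how strongly past information influences the current candidate state
  - Close to 1: most past information is kept
  - Close to 0: most past information is discarded
- Update gate: balances old and new information. With the update rule $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$ used in 1.4:
  - Close to 1: the new hidden state comes mostly from the candidate hidden state
  - Close to 0: most of the old hidden state is kept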
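The link between the separate textbook matrices $W_r$, $W_z$, $W$ and PyTorch's stacked weight_ih_l0 / weight_hh_l0 can be sketched in NumPy. The sketch makes two assumptions:
- the concatenation order is $[h_{t-1}, x_t]$, as used throughout this article;
- the block order is reset, update, candidate, as given in the PyTorch `nn.GRU` documentation.

The random matrices are placeholders, not trained weights.

```python
import numpy as np

H, I = 3, 2  # hidden_size, input_size (same sizes as the numerical example in 1.4)
rng = np.random.default_rng(0)

# Textbook matrices, each (H, H + I), acting on the concatenation [h_{t-1}, x_t]
W_r, W_z, W = (rng.normal(size=(H, H + I)) for _ in range(3))

# Split each matrix column-wise: the first H columns act on h_{t-1}, the rest on x_t
hidden_parts = [M[:, :H] for M in (W_r, W_z, W)]  # U_r, U_z, U
input_parts = [M[:, H:] for M in (W_r, W_z, W)]   # W_r^(x), W_z^(x), W^(x)

# Stack in the block order documented for nn.GRU: reset, update, candidate
weight_ih = np.vstack(input_parts)   # shape (3H, I), like weight_ih_l0
weight_hh = np.vstack(hidden_parts)  # shape (3H, H), like weight_hh_l0
print(weight_ih.shape, weight_hh.shape)  # (9, 2) (9, 3)

# For the two gates, one stacked product reproduces the per-gate products
h, x = rng.normal(size=H), rng.normal(size=I)
stacked = weight_ih @ x + weight_hh @ h
hx = np.concatenate([h, x])
print(np.allclose(stacked[:H], W_r @ hx), np.allclose(stacked[H:2 * H], W_z @ hx))  # True True
```

For the candidate block the correspondence is only structural. PyTorch multiplies $h_{t-1}$ by its hidden weights first and applies $r_t$ afterwards, so its candidate is not the same as $W \cdot [r_t \odot h_{t-1}, x_t]$.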

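1.2 Parameters

input_size: number of input features per time step. For an audio spectrogram this is usually the frequency dimension (e.g. the number of mel bands). For text it is usually the word-embedding dimension. It determines the size of the input layer.
hidden_size: dimension of the hidden state $h_t$, which determines the GRU's memory capacity and representational power. A larger hidden_size can store more information and is more expressive, but it increases computation and the risk of overfitting. A smaller hidden_size is cheaper to compute but may limit expressiveness.
num_layers: number of stacked GRU layers (default 1). More layers increase the model's capacity for abstraction and let it learn deeper feature hierarchies. Too many layers make training harder, can cause vanishing or exploding gradients, and increase the risk of overfitting.
bias: whether to use bias terms; default True.
batch_first: dimension order of the input and output; default False. True: (batch_size, seq_len, input_size/hidden_size). False: (seq_len, batch_size, input_size/hidden_size). It does not affect the hidden-state tensors h_0 and h_n.
dropout: dropout probability applied to the outputs of each GRU layer except the last one, to reduce overfitting. The default is 0, and it only has an effect when num_layers > 1.
bidirectional: whether to use a bidirectional GRU; default False.
- True: the sequence is processed both forward and backward, which captures richer context. This suits tasks that need global context (e.g. text classification, named entity recognition).
- False: the sequence is processed forward only. This suits causal tasks (e.g. time-series forecasting, real-time speech recognition).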

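1.3 Input and Output Shapes

Input: (batch_size, seq_len, input_size) (when batch_first=True)
Output sequence: (batch_size, seq_len, hidden_size * num_directions) (when batch_first=True), where num_directions is 2 for a bidirectional GRU and 1 otherwise
Final hidden state h_n: (num_layers * num_directions, batch_size, hidden_size), regardless of batch_first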
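The shape rules above, including the bidirectional and batch_first cases, fit in a small pure-Python helper that can sanity-check a model configuration before it is run. The name `gru_shapes` is hypothetical. The rules follow the `nn.GRU` documentation as summarized in this section.

```python
def gru_shapes(batch_size, seq_len, input_size, hidden_size,
               num_layers=1, bidirectional=False, batch_first=False):
    """Return (input_shape, output_shape, h_n_shape) for a GRU (hypothetical helper)."""
    num_directions = 2 if bidirectional else 1
    if batch_first:
        x_shape = (batch_size, seq_len, input_size)
        out_shape = (batch_size, seq_len, num_directions * hidden_size)
    else:
        x_shape = (seq_len, batch_size, input_size)
        out_shape = (seq_len, batch_size, num_directions * hidden_size)
    # h_n ignores batch_first: always (num_layers * num_directions, batch, hidden)
    h_n_shape = (num_layers * num_directions, batch_size, hidden_size)
    return x_shape, out_shape, h_n_shape


# A 2-layer bidirectional GRU on a (1, 188, 257) spectrogram, batch_first=True
print(gru_shapes(1, 188, 257, 100, num_layers=2, bidirectional=True, batch_first=True))
# ((1, 188, 257), (1, 188, 200), (4, 1, 100))
```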

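1.4 Computation

Compute the reset gate:
$r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$
- $r_t$: output of the reset gate, which decides how past information is combined with the current input; values in (0, 1)
- $W_r$: weight matrix of the reset gate, shape (hidden_size, input_size + hidden_size)
- $b_r$: bias of the reset gate, shape (hidden_size,)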
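Compute the update gate:
$z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$
- $z_t$: output of the update gate, which decides how much old information is kept; values in (0, 1)
- $W_z$: weight matrix of the update gate, shape (hidden_size, input_size + hidden_size)
- $b_z$: bias of the update gate, shape (hidden_size,)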
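Compute the candidate hidden state:
$\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t] + b)$
- $\tilde{h}_t$: candidate hidden state, carrying potential new information; values in (-1, 1)
- $W$: weight matrix of the candidate hidden state, shape (hidden_size, input_size + hidden_size)
- $b$: bias of the candidate hidden state, shape (hidden_size,)
- $\odot$: element-wise multiplication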
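Update the hidden state:
$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$
- $h_t$: hidden state at the current time step. It is the output and is passed to the next time step
- $(1 - z_t) \odot h_{t-1}$: the part of the old information that is kept
- $z_t \odot \tilde{h}_t$: the part of the new information that is added

Note: some formulations, including PyTorch's nn.GRU, swap the roles of $z_t$ and $1 - z_t$, i.e. $h_t = z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t$. This article uses the form above throughout.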
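The four equations translate almost line for line into NumPy. The sketch below is a plain reference implementation of the formulation in this section: the concatenation order is $[h_{t-1}, x_t]$, and $z_t$ weights the candidate. It is not PyTorch's implementation, and the weights are random placeholders.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def gru_step(x_t, h_prev, W_r, b_r, W_z, b_z, W, b):
    """One GRU time step following the equations in 1.4."""
    concat = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    r_t = sigmoid(W_r @ concat + b_r)               # reset gate
    z_t = sigmoid(W_z @ concat + b_z)               # update gate
    reset_concat = np.concatenate([r_t * h_prev, x_t])
    h_tilde = np.tanh(W @ reset_concat + b)         # candidate hidden state
    return (1.0 - z_t) * h_prev + z_t * h_tilde     # new hidden state


def gru_forward(xs, h0, params):
    """Unroll gru_step over xs of shape (seq_len, input_size)."""
    h, outputs = h0, []
    for x_t in xs:
        h = gru_step(x_t, h, *params)
        outputs.append(h)
    return np.stack(outputs)                        # (seq_len, hidden_size)


rng = np.random.default_rng(0)
input_size, hidden_size, seq_len = 2, 3, 4
params = []
for _ in range(3):  # appends (W_r, b_r), (W_z, b_z), (W, b) in that order
    params.append(rng.normal(scale=0.5, size=(hidden_size, hidden_size + input_size)))
    params.append(np.zeros(hidden_size))
xs = rng.normal(size=(seq_len, input_size))
hs = gru_forward(xs, np.zeros(hidden_size), params)
print(hs.shape)  # (4, 3)
```

Because each $h_t$ blends the previous state and a tanh output, a sequence that starts from $h_0 = 0$ keeps every hidden value inside (-1, 1).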

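GRU computation walkthrough with a numerical example
Assumed parameters:
• input_size = 2 (each time step has 2 input features)
• hidden_size = 3 (the hidden state has 3 dimensions)
Initial state:
• h_prev = [0, 0, 0]    # hidden state from the previous time step
• x_t = [0.5, -0.3]     # input at the current time step
Weights and biases (each weight row acts on the concatenation [h_prev, x_t], so it has 3 + 2 = 5 entries):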
```python
# Reset gate parameters
W_r = [[0.1, 0.2, 0.3, 0.4, 0.5],
       [0.6, 0.7, 0.8, 0.9, 1.0],
       [1.1, 1.2, 1.3, 1.4, 1.5]]
b_r = [0.1, 0.2, 0.3]
# Update gate parameters
W_z = [[-0.1, -0.2, -0.3, -0.4, -0.5],
       [-0.6, -0.7, -0.8, -0.9, -1.0],
       [-1.1, -1.2, -1.3, -1.4, -1.5]]
b_z = [0.1, 0.2, 0.3]
# Candidate hidden state parameters
W = [[0.2, 0.3, 0.4, 0.5, 0.6],
     [0.7, 0.8, 0.9, 1.0, 1.1],
     [1.2, 1.3, 1.4, 1.5, 1.6]]
b = [0.1, 0.2, 0.3]
```

1. Compute the reset gate (Reset Gate)
Purpose: decide how past information is combined with the current input
Computation:
```
# Concatenate the inputs: [h_prev, x_t] = [0, 0, 0, 0.5, -0.3]
concat = [0, 0, 0, 0.5, -0.3]
# Reset gate, one row of W_r per hidden unit:
r_t = [
    σ(0.1*0 + 0.2*0 + 0.3*0 + 0.4*0.5 + 0.5*(-0.3) + 0.1) = σ(0.0 + 0.0 + 0.0 + 0.2 - 0.15 + 0.1) = σ(0.15) ≈ 0.537,
    σ(0.6*0 + 0.7*0 + 0.8*0 + 0.9*0.5 + 1.0*(-0.3) + 0.2) = σ(0.0 + 0.0 + 0.0 + 0.45 - 0.3 + 0.2) = σ(0.35) ≈ 0.587,
    σ(1.1*0 + 1.2*0 + 1.3*0 + 1.4*0.5 + 1.5*(-0.3) + 0.3) = σ(0.0 + 0.0 + 0.0 + 0.7 - 0.45 + 0.3) = σ(0.55) ≈ 0.634
]
```
Result: r_t ≈ [0.537, 0.587, 0.634]

2. Compute the update gate (Update Gate)
Purpose: decide how much old information to keep and how much new information to add
Computation:
```
# Use the same concatenated vector [0, 0, 0, 0.5, -0.3]
z_t = [
    σ(-0.1*0 - 0.2*0 - 0.3*0 - 0.4*0.5 - 0.5*(-0.3) + 0.1) = σ(0.0 + 0.0 + 0.0 - 0.2 + 0.15 + 0.1) = σ(0.05) ≈ 0.512,
    σ(-0.6*0 - 0.7*0 - 0.8*0 - 0.9*0.5 - 1.0*(-0.3) + 0.2) = σ(0.0 + 0.0 + 0.0 - 0.45 + 0.3 + 0.2) = σ(0.05) ≈ 0.512,
    σ(-1.1*0 - 1.2*0 - 1.3*0 - 1.4*0.5 - 1.5*(-0.3) + 0.3) = σ(0.0 + 0.0 + 0.0 - 0.7 + 0.45 + 0.3) = σ(0.05) ≈ 0.512
]
```
Result: z_t ≈ [0.512, 0.512, 0.512]

3. Compute the candidate hidden state (Candidate Hidden State)
Purpose: produce the potential new information
Computation:
```
# First compute r_t ⊙ h_prev = [0.537*0, 0.587*0, 0.634*0] = [0, 0, 0]
# Then concatenate [r_t ⊙ h_prev, x_t] = [0, 0, 0, 0.5, -0.3]
# Candidate hidden state:
h̃_t = [
    tanh(0.2*0 + 0.3*0 + 0.4*0 + 0.5*0.5 + 0.6*(-0.3) + 0.1) = tanh(0.0 + 0.0 + 0.0 + 0.25 - 0.18 + 0.1) = tanh(0.17) ≈ 0.168,
    tanh(0.7*0 + 0.8*0 + 0.9*0 + 1.0*0.5 + 1.1*(-0.3) + 0.2) = tanh(0.0 + 0.0 + 0.0 + 0.5 - 0.33 + 0.2) = tanh(0.37) ≈ 0.354,
    tanh(1.2*0 + 1.3*0 + 1.4*0 + 1.5*0.5 + 1.6*(-0.3) + 0.3) = tanh(0.0 + 0.0 + 0.0 + 0.75 - 0.48 + 0.3) = tanh(0.57) ≈ 0.515
]
```
Result: h̃_t ≈ [0.168, 0.354, 0.515]

4. Update the hidden state (Update Hidden State)
Purpose: produce the output of the current time step
Computation:
```
# (1 - z_t) = [1-0.512, 1-0.512, 1-0.512] = [0.488, 0.488, 0.488]
# (1 - z_t) ⊙ h_prev = [0.488*0, 0.488*0, 0.488*0] = [0, 0, 0]
# z_t ⊙ h̃_t = [0.512*0.168, 0.512*0.354, 0.512*0.515] ≈ [0.086, 0.181, 0.264]
h_t = [0 + 0.086 ≈ 0.086, 0 + 0.181 ≈ 0.181, 0 + 0.264 ≈ 0.264]
```
Result: h_t ≈ [0.086, 0.181, 0.264]

Final result
After one time step we obtain:
• New hidden state: h_t ≈ [0.086, 0.181, 0.264]
This hidden state is passed to the next time step, where the next input in the sequence is processed.
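The hand calculation can be checked with a few lines of NumPy that plug the same weights into the equations from 1.4. The printed values should agree with the worked example to about three decimal places.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


# Values from the worked example
h_prev = np.array([0.0, 0.0, 0.0])
x_t = np.array([0.5, -0.3])
W_r = np.array([[0.1, 0.2, 0.3, 0.4, 0.5],
                [0.6, 0.7, 0.8, 0.9, 1.0],
                [1.1, 1.2, 1.3, 1.4, 1.5]])
W_z = np.array([[-0.1, -0.2, -0.3, -0.4, -0.5],
                [-0.6, -0.7, -0.8, -0.9, -1.0],
                [-1.1, -1.2, -1.3, -1.4, -1.5]])
W = np.array([[0.2, 0.3, 0.4, 0.5, 0.6],
              [0.7, 0.8, 0.9, 1.0, 1.1],
              [1.2, 1.3, 1.4, 1.5, 1.6]])
b_r = b_z = b = np.array([0.1, 0.2, 0.3])  # the example uses the same bias for all three

concat = np.concatenate([h_prev, x_t])                           # [h_prev, x_t]
r_t = sigmoid(W_r @ concat + b_r)                                # step 1: reset gate
z_t = sigmoid(W_z @ concat + b_z)                                # step 2: update gate
h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]) + b)   # step 3: candidate
h_t = (1 - z_t) * h_prev + z_t * h_tilde                         # step 4: new state

for name, value in [("r_t", r_t), ("z_t", z_t), ("h_tilde", h_tilde), ("h_t", h_t)]:
    print(name, "=", np.round(value, 3))
# the last line printed is: h_t = [0.086 0.181 0.264]
```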

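1.5 Visualizing the Computation

The following Matplotlib script animates the four steps of time step 0 and saves the result as gru_time_step_0.gif.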

```python
import matplotlib as mpl
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation
from matplotlib.lines import Line2D
from matplotlib.patches import Circle, FancyArrowPatch

# Global style
mpl.rcParams['font.size'] = 12
mpl.rcParams['font.family'] = 'DejaVu Sans'

# Canvas
fig, ax = plt.subplots(figsize=(14, 8))
ax.set_xlim(0, 14)
ax.set_ylim(0, 8)
ax.axis('off')
plt.title('GRU Computation Process - Time Step 0', fontsize=16, pad=20)

# Colors
input_color = '#FFD700'      # gold: input
hidden_color = '#1E90FF'     # dodger blue: hidden state
gate_color = '#FF4500'       # orange red: gates
candidate_color = '#32CD32'  # lime green: candidate hidden state
active_color = '#FF1493'     # deep pink: currently active element
arrow_color = '#8B0000'      # dark red: active connection

# Node positions
x_pos = 5
y_positions = {'input': 6, 'gates': 5, 'candidate': 4, 'hidden': 3}


def add_node(xy, radius, color, label, alpha=0.7, text_alpha=1.0, fontsize=10):
    """Draw a circular node with a centered label and return the circle patch."""
    node = Circle(xy, radius, facecolor=color, edgecolor='black', alpha=alpha)
    ax.add_patch(node)
    ax.text(xy[0], xy[1], label, ha='center', va='center', fontsize=fontsize, alpha=text_alpha)
    return node


# Nodes of time step 0 (h_init is the initial state h_{-1}) and their idle colors
nodes = {
    'h_init': add_node((1, 4), 0.3, 'lightgray', '$h_{-1}$', alpha=1.0),
    'input': add_node((x_pos, y_positions['input']), 0.3, input_color, '$x_0$'),
    'hidden': add_node((x_pos, y_positions['hidden']), 0.3, hidden_color, '$h_0$'),
    'candidate': add_node((x_pos, y_positions['candidate']), 0.3, candidate_color, r'$\tilde{h}_0$'),
    'reset_gate': add_node((x_pos - 0.8, y_positions['gates']), 0.2, gate_color, '$r_t$', fontsize=8),
    'update_gate': add_node((x_pos + 0.8, y_positions['gates']), 0.2, gate_color, '$z_t$', fontsize=8),
}
base_colors = {'h_init': 'lightgray', 'input': input_color, 'hidden': hidden_color,
               'candidate': candidate_color, 'reset_gate': gate_color, 'update_gate': gate_color}

# Time steps 1 and 2: drawn faded, without the computation details
for t in range(1, 3):
    x_pos_t = 8 + (t - 1) * 3
    add_node((x_pos_t, y_positions['input']), 0.3, input_color, f'$x_{t}$', alpha=0.3, text_alpha=0.3)
    add_node((x_pos_t, y_positions['hidden']), 0.3, hidden_color, f'$h_{t}$', alpha=0.3, text_alpha=0.3)
    ax.text(x_pos_t, y_positions['input'] + 0.8, f'Time Step {t}', ha='center', fontsize=10, alpha=0.3)

# Connections (index: meaning), drawn gray and faded until highlighted
yi, yg, yc, yh = (y_positions[k] for k in ('input', 'gates', 'candidate', 'hidden'))
arrow_specs = [
    ((1.3, 4), (x_pos - 0.3, 4)),                          # 0: h_{-1} -> h_0
    ((x_pos, yi - 0.3), (x_pos - 0.8, yg + 0.2)),          # 1: x_0 -> r_t
    ((x_pos, yi - 0.3), (x_pos + 0.8, yg + 0.2)),          # 2: x_0 -> z_t
    ((x_pos - 0.8, yh + 0.3), (x_pos - 0.8, yg - 0.2)),    # 3: h_{t-1} -> r_t
    ((x_pos + 0.8, yh + 0.3), (x_pos + 0.8, yg - 0.2)),    # 4: h_{t-1} -> z_t
    ((x_pos - 0.8, yg - 0.2), (x_pos - 0.3, yc + 0.3)),    # 5: r_t -> candidate
    ((x_pos, yi - 0.3), (x_pos, yc + 0.3)),                # 6: x_0 -> candidate
    ((x_pos, yc - 0.3), (x_pos + 0.8, yg + 0.2)),          # 7: candidate -> z_t
    ((x_pos + 0.8, yg - 0.2), (x_pos + 0.3, yh + 0.3)),    # 8: z_t -> h_t
    ((x_pos + 0.3, yh), (8.7, 4)),                         # 9: h_t -> next time step
]
arrows = []
for start, end in arrow_specs:
    arrow = FancyArrowPatch(start, end, arrowstyle='->', color='gray', alpha=0.3, mutation_scale=15)
    ax.add_patch(arrow)
    arrows.append(arrow)

# Formula box and legend
formula_text = ax.text(7, 1, '', fontsize=14, ha='center', bbox=dict(facecolor='white', alpha=0.8))
legend_elements = [
    Circle((0, 0), radius=0.3, facecolor=input_color, edgecolor='black', label='Input ($x_t$)'),
    Circle((0, 0), radius=0.3, facecolor=hidden_color, edgecolor='black', label='Hidden State ($h_t$)'),
    Circle((0, 0), radius=0.3, facecolor=candidate_color, edgecolor='black', label=r'Candidate State ($\tilde{h}_t$)'),
    Circle((0, 0), radius=0.2, facecolor=gate_color, edgecolor='black', label='Gates'),
    Line2D([0], [0], color=arrow_color, label='Active Connection'),
]
ax.legend(handles=legend_elements, loc='upper center', bbox_to_anchor=(0.5, -0.05), ncol=3)

# Per frame: formula, highlighted nodes, highlighted arrow indices
frames = [
    ('Initialization: $h_{-1} = 0$', ['h_init'], []),
    (r'1. Reset Gate: $r_t = \sigma(W_r \cdot [h_{t-1}, x_t] + b_r)$',
     ['input', 'h_init', 'reset_gate'], [0, 1, 3]),
    (r'2. Update Gate: $z_t = \sigma(W_z \cdot [h_{t-1}, x_t] + b_z)$',
     ['input', 'h_init', 'update_gate'], [0, 2, 4]),
    (r'3. Candidate State: $\tilde{h}_t = \tanh(W \cdot [r_t \odot h_{t-1}, x_t] + b)$',
     ['input', 'h_init', 'reset_gate', 'candidate'], [0, 1, 3, 5, 6]),
    (r'4. Update Hidden State: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$',
     ['h_init', 'update_gate', 'candidate', 'hidden'], [0, 4, 7, 8, 9]),
]


def update(frame):
    # Reset every node and connection to its idle style
    for key, node in nodes.items():
        node.set_facecolor(base_colors[key])
        node.set_alpha(0.7)
    for arrow in arrows:
        arrow.set_alpha(0.3)
        arrow.set_color('gray')
    # Highlight the nodes and connections used in this step
    text, active_nodes, active_arrows = frames[frame]
    formula_text.set_text(text)
    for key in active_nodes:
        nodes[key].set_facecolor(active_color)
        nodes[key].set_alpha(1.0)
    for i in active_arrows:
        arrows[i].set_alpha(1.0)
        arrows[i].set_color(arrow_color)
    return list(nodes.values()) + arrows + [formula_text]


# Create and save the animation
animation = FuncAnimation(fig, update, frames=len(frames), interval=1500, blit=True)
plt.tight_layout()
animation.save('gru_time_step_0.gif', writer='pillow', fps=1, dpi=100)
plt.show()
```


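II. Code Example

The following script processes a segment of an audio spectrogram with two stacked GRU layers. It prints each layer's output shape and parameter shapes, and visualizes the resulting feature maps.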

```python
import librosa
import matplotlib.pyplot as plt
import numpy as np
import torch
import torch.nn as nn


# GRU model: two stacked GRU layers (input_size -> 100 -> 64)
class GRUModel(nn.Module):
    def __init__(self, input_size):
        super().__init__()
        self.gru1 = nn.GRU(input_size, 100, batch_first=True)
        self.gru2 = nn.GRU(100, 64, batch_first=True)

    def forward(self, x):
        out1, _ = self.gru1(x)     # output of GRU layer 1
        out2, _ = self.gru2(out1)  # output of GRU layer 2
        return out1, out2


# Load and preprocess the audio file
file_path = 'test.wav'  # replace with the path to your audio file
waveform, sample_rate = librosa.load(file_path, sr=16000, mono=True)

# Take 3 seconds of audio (from 1.5 s to 4.5 s)
start_sample = int(1.5 * sample_rate)
end_sample = int(4.5 * sample_rate)
audio_segment = waveform[start_sample:end_sample]

# Convert to a dB spectrogram of shape (n_fft // 2 + 1 = 257 frequency bins, time frames)
n_fft = 512
hop_length = 256
spectrogram = librosa.stft(audio_segment, n_fft=n_fft, hop_length=hop_length)
spectrogram_db = librosa.amplitude_to_db(np.abs(spectrogram))
spectrogram_tensor = torch.tensor(spectrogram_db, dtype=torch.float32).unsqueeze(0)
spectrogram_tensor = spectrogram_tensor.permute(0, 2, 1)  # (batch, seq_len, input_size)
print(f"Spectrogram tensor shape: {spectrogram_tensor.shape}")

# Create the model: each time frame is one step with 257 features
input_size = spectrogram_tensor.shape[2]
model = GRUModel(input_size)

# Forward pass (no gradients are needed to inspect the outputs)
with torch.no_grad():
    gru_output1, gru_output2 = model(spectrogram_tensor)

# Print the output shapes
print(f"GRU Layer 1 output shape: {gru_output1.shape}")
print(f"GRU Layer 2 output shape: {gru_output2.shape}")

# Print the parameter shapes of each layer
for name, gru in [("GRU Layer 1", model.gru1), ("GRU Layer 2", model.gru2)]:
    print(f"\n{name} parameters:")
    print(f"Input to hidden weights shape: {gru.weight_ih_l0.shape}")
    print(f"Hidden to hidden weights shape: {gru.weight_hh_l0.shape}")
    print(f"Input to hidden bias shape: {gru.bias_ih_l0.shape}")
    print(f"Hidden to hidden bias shape: {gru.bias_hh_l0.shape}")

# Visualize the original spectrogram
plt.figure(figsize=(12, 8))
plt.subplot(3, 1, 1)
plt.imshow(spectrogram_db, aspect='auto', origin='lower', cmap='inferno')
plt.title("Original Spectrogram")
plt.xlabel("Time Frames")
plt.ylabel("Frequency Bins")
plt.colorbar(format='%+2.0f dB')

# Visualize the GRU feature maps (transposed to hidden_size x time)
# Output of GRU layer 1
plt.subplot(3, 1, 2)
plt.imshow(gru_output1[0].numpy().T, aspect='auto', origin='lower', cmap='viridis')
plt.title("GRU Layer 1 Output Feature Map")
plt.xlabel("Time Steps")
plt.ylabel("Hidden State Dimensions")
plt.colorbar(label='Activation Value')

# Output of GRU layer 2
plt.subplot(3, 1, 3)
plt.imshow(gru_output2[0].numpy().T, aspect='auto', origin='lower', cmap='plasma')
plt.title("GRU Layer 2 Output Feature Map")
plt.xlabel("Time Steps")
plt.ylabel("Hidden State Dimensions")
plt.colorbar(label='Activation Value')

plt.tight_layout()
plt.savefig('gru_spectrogram_features.png', dpi=300)
plt.show()
```

```
Spectrogram tensor shape: torch.Size([1, 188, 257])
GRU Layer 1 output shape: torch.Size([1, 188, 100])
GRU Layer 2 output shape: torch.Size([1, 188, 64])

GRU Layer 1 parameters:
Input to hidden weights shape: torch.Size([300, 257])
Hidden to hidden weights shape: torch.Size([300, 100])
Input to hidden bias shape: torch.Size([300])
Hidden to hidden bias shape: torch.Size([300])

GRU Layer 2 parameters:
Input to hidden weights shape: torch.Size([192, 100])
Hidden to hidden weights shape: torch.Size([192, 64])
Input to hidden bias shape: torch.Size([192])
Hidden to hidden bias shape: torch.Size([192])
```


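PyTorch GRU Parameters in Detail

1. weight_ih_l0

Meaning: input-to-hidden weights
Shape: (3 * hidden_size, input_size)
Contents: the input-side weights of all three components, stacked in the order given in the PyTorch nn.GRU documentation:
- Block 0: input weights of the reset gate $W_r^{(x)}$
- Block 1: input weights of the update gate $W_z^{(x)}$
- Block 2: input weights of the candidate hidden state $W^{(x)}$
Relation to the notation in 1.4:
$W_z = [U_z, W_z^{(x)}]$

Here $W_z$ is the full update-gate weight matrix from the equations. $U_z$ multiplies $h_{t-1}$ and $W_z^{(x)}$ multiplies $x_t$, matching the concatenation order $[h_{t-1}, x_t]$.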

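2. weight_hh_l0

Meaning: hidden-to-hidden weights
Shape: (3 * hidden_size, hidden_size)
Contents: the hidden-side weights of all three components, stacked in the same order:
- Block 0: hidden weights of the reset gate $U_r$
- Block 1: hidden weights of the update gate $U_z$
- Block 2: hidden weights of the candidate hidden state $U$
Relation: $U_z$ is the hidden-state (left) block of the full update-gate matrix $W_z = [U_z, W_z^{(x)}]$.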

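3. bias_ih_l0

Meaning: input-to-hidden bias
Shape: (3 * hidden_size,)
Contents: the input-side biases of all three components, stacked in the same order:
- Block 0: input bias of the reset gate $b_r^{(x)}$
- Block 1: input bias of the update gate $b_z^{(x)}$
- Block 2: input bias of the candidate hidden state $b^{(x)}$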

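4. bias_hh_l0

Meaning: hidden-to-hidden bias
Shape: (3 * hidden_size,)
Contents: the hidden-side biases of all three components, stacked in the same order:
- Block 0: hidden bias of the reset gate $b_r^{(h)}$
- Block 1: hidden bias of the update gate $b_z^{(h)}$
- Block 2: hidden bias of the candidate hidden state $b^{(h)}$
In practice:
- For the reset and update gates the two biases simply add up (element-wise), e.g. $b_z = b_z^{(x)} + b_z^{(h)}$. Together they play the role of the single bias in the textbook equations
- For the candidate hidden state they do not merge. PyTorch adds $b^{(h)}$ to $U h_{t-1}$ before multiplying by $r_t$, so only $b^{(x)}$ acts as a plain additive bias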
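For reference, the PyTorch nn.GRU documentation writes its computation as shown below. There, $n_t$ plays the role of $\tilde{h}_t$. In this article's notation, $W_{ir} = W_r^{(x)}$, $W_{hr} = U_r$, $b_{ir} = b_r^{(x)}$ and $b_{hr} = b_r^{(h)}$, and likewise for the $z$ and $n$ blocks.

```latex
\begin{aligned}
r_t &= \sigma\left(W_{ir} x_t + b_{ir} + W_{hr} h_{t-1} + b_{hr}\right) \\
z_t &= \sigma\left(W_{iz} x_t + b_{iz} + W_{hz} h_{t-1} + b_{hz}\right) \\
n_t &= \tanh\left(W_{in} x_t + b_{in} + r_t \odot \left(W_{hn} h_{t-1} + b_{hn}\right)\right) \\
h_t &= (1 - z_t) \odot n_t + z_t \odot h_{t-1}
\end{aligned}
```

These equations differ from those in 1.4 in two ways:
- the reset gate multiplies $W_{hn} h_{t-1} + b_{hn}$ rather than $h_{t-1}$ itself;
- $z_t$ weights the previous state instead of the candidate.

Weights trained under one formulation therefore cannot simply be plugged into the other.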

| Property | GRU | LSTM |
|---|---|---|
| Number of gates/components | 3 (reset gate, update gate, candidate state) | 4 (input gate, forget gate, output gate, candidate state) |
| weight_ih_l0 shape | (3 × hidden_size, input_size) | (4 × hidden_size, input_size) |
| weight_hh_l0 shape | (3 × hidden_size, hidden_size) | (4 × hidden_size, hidden_size) |
| bias_ih_l0 shape | (3 × hidden_size,) | (4 × hidden_size,) |
| bias_hh_l0 shape | (3 × hidden_size,) | (4 × hidden_size,) |
| Total parameters | 25% fewer than an LSTM with the same input_size and hidden_size | 4/3 of a GRU with the same sizes |
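The stacked layout and PyTorch's equations can be combined into a NumPy sketch of one time step. It works directly on arrays shaped like weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0, with block order reset, update, candidate. The sketch follows the formulas in the nn.GRU documentation but is not PyTorch's own code. The function name `torch_style_gru_step` and the random parameters are illustrative assumptions. The sizes match the second GRU layer in Part II (input_size=100, hidden_size=64).

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def torch_style_gru_step(x_t, h_prev, weight_ih, weight_hh, bias_ih, bias_hh):
    """One GRU step from PyTorch-shaped parameters (hypothetical helper).

    weight_ih: (3H, I), weight_hh: (3H, H), bias_ih / bias_hh: (3H,)
    Row blocks are ordered reset, update, candidate ("new").
    """
    i_r, i_z, i_n = np.split(weight_ih @ x_t + bias_ih, 3)     # input-side terms
    h_r, h_z, h_n = np.split(weight_hh @ h_prev + bias_hh, 3)  # hidden-side terms
    r = sigmoid(i_r + h_r)              # reset gate: the two biases simply add
    z = sigmoid(i_z + h_z)              # update gate: the two biases simply add
    n = np.tanh(i_n + r * h_n)          # candidate: r scales (W_hn h + b_hn)
    return (1.0 - z) * n + z * h_prev   # PyTorch convention: z weights the old state


rng = np.random.default_rng(0)
H, I = 64, 100
weight_ih = rng.normal(scale=0.1, size=(3 * H, I))
weight_hh = rng.normal(scale=0.1, size=(3 * H, H))
bias_ih = rng.normal(scale=0.1, size=3 * H)
bias_hh = rng.normal(scale=0.1, size=3 * H)

h = np.zeros(H)
for x_t in rng.normal(size=(5, I)):  # a short random sequence of 5 steps
    h = torch_style_gru_step(x_t, h, weight_ih, weight_hh, bias_ih, bias_hh)
print(h.shape)  # (64,)
```

If the same arrays are copied into an nn.GRU's weight_ih_l0, weight_hh_l0, bias_ih_l0 and bias_hh_l0, the outputs should match up to floating-point error. That comparison is a useful check but is not shown here.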