
A Worked Example of Computing the Attention Output Matrix

news 2025/7/2 8:46:10

$A(Q, K, V) = \text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) V$

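This formula is the core of the attention mechanism (specifically, scaled dot-product attention). It turns attention scores into the attention output, where:

$Q$ is the query matrix, whose rows are the query vectors.
$K$ is the key matrix, whose rows are the key vectors.
$V$ is the value matrix, whose rows are the value vectors.
$d_k$ is the dimension of the key vectors (and of the query vectors).
$A(Q, K, V)$ is the output matrix: one row per query, each a weighted combination of the rows of $V$.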

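The formula breaks down into the following steps:

Compute the dot products of the query and key vectors: $QK^T$.
Scale the dot products: $\frac{QK^T}{\sqrt{d_k}}$.
Normalize each row with softmax to obtain the attention weights: $\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)$.
Multiply the normalized weights by the value matrix $V$ to obtain the final output $A(Q, K, V)$.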
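The four steps can be sketched in plain Python as below. This is a minimal illustration, not a reference implementation: matrices are assumed to be lists of rows (one query, key, or value vector per row), and the helper names `matmul`, `transpose`, `softmax`, and `attention` are hypothetical. Real code would normally use a tensor library.

```python
import math


def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def softmax(row):
    """Softmax over one row; subtracting the max avoids overflow in exp."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.

    Assumes q is n x d_k, k is m x d_k, and v is m x d_v (one vector per row).
    """
    d_k = len(k[0])
    scores = matmul(q, transpose(k))                                 # step 1: Q K^T
    scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]   # step 2: scale
    weights = [softmax(row) for row in scaled]                       # step 3: row-wise softmax
    return matmul(weights, v)                                        # step 4: weights times V
```

Because every row of `weights` sums to 1, each output row is a weighted average of the rows of `V`.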

The following concrete example works through these steps.

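Example setup

Assume two queries and two key-value pairs, with $d_k = 2$:
$Q = \begin{bmatrix} 1 & 2 \\ 3 & 4 \end{bmatrix}, \quad K = \begin{bmatrix} 5 & 7 \\ 6 & 8 \end{bmatrix} \ \left(\text{so } K^T = \begin{bmatrix} 5 & 6 \\ 7 & 8 \end{bmatrix}\right), \quad V = \begin{bmatrix} 9 & 10 \\ 11 & 12 \end{bmatrix}$

Computation steps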

Compute $QK^T$:
$QK^T = \begin{bmatrix} 1 \cdot 5 + 2 \cdot 7 & 1 \cdot 6 + 2 \cdot 8 \\ 3 \cdot 5 + 4 \cdot 7 & 3 \cdot 6 + 4 \cdot 8 \end{bmatrix} = \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix}$
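The dot products can be checked with a few lines of Python. The key matrix here is inferred from the products shown: its rows are assumed to be the key vectors (5, 7) and (6, 8), so that $K^T$ has rows (5, 6) and (7, 8). The helper names are hypothetical.

```python
def matmul(a, b):
    """Multiply matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


Q = [[1, 2], [3, 4]]
K = [[5, 7], [6, 8]]  # assumed: each row is one key vector

scores = matmul(Q, transpose(K))  # entry (i, j) is the dot product of query i and key j
print(scores)  # [[19, 22], [43, 50]]
```

Entry $(i, j)$ pairs query $i$ with key $j$, which is why the key matrix is transposed before multiplying.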
Scale the dot products:
$\frac{QK^T}{\sqrt{d_k}} = \frac{1}{\sqrt{2}} \begin{bmatrix} 19 & 22 \\ 43 & 50 \end{bmatrix} \approx \begin{bmatrix} 13.435 & 15.556 \\ 30.406 & 35.355 \end{bmatrix}$
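The scaling step in Python, using the scores above and $d_k = 2$. Dividing by $\sqrt{d_k}$ is the usual way to keep dot products from growing with the vector dimension, which would otherwise push the softmax into a saturated region with very small gradients.

```python
import math

scores = [[19, 22], [43, 50]]  # Q K^T from the previous step
d_k = 2                        # dimension of each key vector

scaled = [[s / math.sqrt(d_k) for s in row] for row in scores]
for row in scaled:
    print([round(x, 3) for x in row])
# [13.435, 15.556]
# [30.406, 35.355]
```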
Apply softmax for normalization:
Softmax is applied to each row, i.e. over the keys for one query, so that each query's weights sum to 1.
For the first row:
$\text{softmax}\left(\begin{bmatrix} 13.435 & 15.556 \end{bmatrix}\right) = \begin{bmatrix} \frac{e^{13.435}}{e^{13.435} + e^{15.556}} & \frac{e^{15.556}}{e^{13.435} + e^{15.556}} \end{bmatrix} \approx \begin{bmatrix} 0.107 & 0.893 \end{bmatrix}$
For the second row:
$\text{softmax}\left(\begin{bmatrix} 30.406 & 35.355 \end{bmatrix}\right) = \begin{bmatrix} \frac{e^{30.406}}{e^{30.406} + e^{35.355}} & \frac{e^{35.355}}{e^{30.406} + e^{35.355}} \end{bmatrix} \approx \begin{bmatrix} 0.007 & 0.993 \end{bmatrix}$
Only the gap between the two scores in a row matters; for example, the first weight of row 1 equals $\frac{1}{1 + e^{(22 - 19)/\sqrt{2}}} \approx 0.107$.
The normalized weight matrix is therefore:
$\text{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right) \approx \begin{bmatrix} 0.107 & 0.893 \\ 0.007 & 0.993 \end{bmatrix}$
Multiply the normalized weights by the value matrix $V$:
$A(Q, K, V) \approx \begin{bmatrix} 0.107 & 0.893 \\ 0.007 & 0.993 \end{bmatrix} \begin{bmatrix} 9 & 10 \\ 11 & 12 \end{bmatrix} \approx \begin{bmatrix} 10.786 & 11.786 \\ 10.986 & 11.986 \end{bmatrix}$
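A short Python check of the last two steps. It applies softmax row by row, that is, over the keys for each query, which is the standard convention for this formula. The `softmax` helper is hypothetical; it subtracts the row maximum before exponentiating, which leaves the result unchanged but avoids overflow for large scores.

```python
import math


def softmax(row):
    """Softmax over one row of scores (max subtracted for numerical stability)."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


scaled = [[19 / math.sqrt(2), 22 / math.sqrt(2)],
          [43 / math.sqrt(2), 50 / math.sqrt(2)]]
V = [[9, 10], [11, 12]]

# Step 3: one softmax per query (row), so each row of weights sums to 1.
weights = [softmax(row) for row in scaled]

# Step 4: each output row is the weighted sum of the rows of V.
output = [[sum(w * v_row[j] for w, v_row in zip(w_row, V)) for j in range(len(V[0]))]
          for w_row in weights]

print([[round(x, 4) for x in row] for row in weights])  # [[0.107, 0.893], [0.007, 0.993]]
print([[round(x, 3) for x in row] for row in output])   # [[10.786, 11.786], [10.986, 11.986]]
```

Applying softmax down the columns instead would normalize across queries, so the weights for a given query would no longer form a probability distribution over the keys.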

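Interpretation

The steps above produce the attention output $A(Q, K, V)$.
The result reflects the basic principle of attention: the more similar a query is to a key, the more that key's value vector contributes to the output.
In this example, both queries score higher against the second key than against the first (after scaling, 15.556 > 13.435 and 35.355 > 30.406), so the second row of $V$, $[11, 12]$, dominates both output rows. Softmax depends on the differences between scores within a row, not on their absolute size: the gap is about 2.12 for the first query and about 4.95 for the second, so the second query's weights are more concentrated on the second key.
This example shows, step by step, how the formula turns query-key similarity into a weighted combination of value vectors.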
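The claim that softmax responds to score differences rather than their absolute size can be checked directly. The first and third rows below are the scaled scores from this example; the shifted row (every score raised by 100) is an illustrative assumption.

```python
import math


def softmax(row):
    """Softmax over one row of scores (max subtracted for numerical stability)."""
    top = max(row)
    exps = [math.exp(x - top) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


first = softmax([13.435, 15.556])      # gap of about 2.12
shifted = softmax([113.435, 115.556])  # same gap, every score raised by 100
second = softmax([30.406, 35.355])     # gap of about 4.95

# Adding a constant to every score in a row leaves the weights unchanged.
print(all(abs(a - b) < 1e-9 for a, b in zip(first, shifted)))  # True
# A wider gap concentrates more weight on the larger score.
print(second[1] > first[1])  # True
```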