How is embedding implemented in PyTorch?
Overview of the embedding layer (nn.Embedding)
Args: num_embeddings (size of the dictionary of embeddings, i.e. the vocabulary size), embedding_dim (the size of each embedding vector), plus the optional padding_idx, max_norm, norm_type, scale_grad_by_freq and sparse.
Attributes: weight, the learnable embedding matrix of shape (num_embeddings, embedding_dim), initialized from N(0, 1).
Shape: the input is an IntTensor or LongTensor of arbitrary shape (*) containing the indices to look up; the output has shape (*, H), where H = embedding_dim.
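To make the Args/Attributes/Shape items above concrete, here is a minimal sketch; the vocabulary size 1000 and dimension 64 are arbitrary choices for illustration, not values from the docstring:

import torch
import torch.nn as nn

# Hypothetical sizes, chosen only for illustration.
emb = nn.Embedding(num_embeddings=1000, embedding_dim=64)

print(emb.weight.shape)  # torch.Size([1000, 64]): the learnable matrix described under Attributes
print(round(emb.weight.mean().item(), 2), round(emb.weight.std().item(), 2))  # roughly 0.0 and 1.0 (N(0, 1) init)

tokens = torch.randint(0, 1000, (2, 7))  # indices of arbitrary shape (*)
print(emb(tokens).shape)  # torch.Size([2, 7, 64]): output shape (*, H) with H = embedding_dim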
Note (the docstring's .. note::): when max_norm is not None, Embedding's forward method renormalizes the looked-up rows of weight in place. Since tensors needed for gradient computation cannot be modified in place, any differentiable operation on Embedding.weight performed before the forward call must work on a clone of the weight, as the snippet below shows:
import torch
import torch.nn as nn

n, d, m = 3, 5, 7
embedding = nn.Embedding(n, d, max_norm=True)
W = torch.randn((m, d), requires_grad=True)
idx = torch.tensor([1, 2])
a = embedding.weight.clone() @ W.t()  # weight must be cloned here, otherwise this operation is not differentiable
b = embedding(idx) @ W.t()  # the forward pass modifies weight in place (because max_norm is set)
out = (a.unsqueeze(0) + b.unsqueeze(1))
loss = out.sigmoid().prod()
loss.backward()
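A quick way to observe the in-place renormalization described in the note; this is a hedged sketch, and the sizes and max_norm=1.0 are arbitrary choices rather than values from the docstring:

import torch
import torch.nn as nn

emb = nn.Embedding(10, 4, max_norm=1.0)
idx = torch.tensor([0, 1])

print(emb.weight[idx].norm(dim=1))  # freshly initialized rows; their norms will usually exceed 1.0
emb(idx)                            # the forward pass renormalizes the looked-up rows in place
print(emb.weight[idx].norm(dim=1))  # the same rows now have norm at most max_norm

Only the rows that were actually looked up are touched; the rest of the matrix is left unchanged.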
Examples::
>>> # an Embedding module containing 10 embedding vectors of dimension 3
>>> embedding = nn.Embedding(10, 3)
>>> # a batch of 2 samples with 4 indices each
>>> input = torch.LongTensor([[1, 2, 4, 5], [4, 3, 2, 9]])
>>> # xdoctest: +IGNORE_WANT("non-deterministic")  (directive telling the doctest runner to ignore the non-deterministic output)
>>> embedding(input)
tensor([[[-0.0251, -1.6902,  0.7172],
         [-0.6431,  0.0748,  0.6969],
         [ 1.4970,  1.3448, -0.9685],
         [-0.3677, -2.7265, -0.1685]],

        [[ 1.4970,  1.3448, -0.9685],
         [ 0.4362, -0.4004,  0.9400],
         [-0.6431,  0.0748,  0.6969],
         [ 0.9124, -2.3616,  1.1151]]])
>>> # example with padding_idx
>>> embedding = nn.Embedding(10, 3, padding_idx=0)
>>> input = torch.LongTensor([[0, 2, 0, 5]])
>>> embedding(input)
tensor([[[ 0.0000,  0.0000,  0.0000],   # row for padding_idx=0 is all zeros
         [ 0.1535, -2.0309,  0.9315],
         [ 0.0000,  0.0000,  0.0000],   # padding_idx=0 appears again, still all zeros
         [-0.1655,  0.9897,  0.0635]]])
>>> # example of changing the "pad vector"
>>> padding_idx = 0
>>> embedding = nn.Embedding(3, 3, padding_idx=padding_idx)
>>> embedding.weight  # initially, the row at padding_idx=0 is all zeros
Parameter containing:
tensor([[ 0.0000,  0.0000,  0.0000],
        [-0.7895, -0.7089, -0.0364],
        [ 0.6778,  0.5803,  0.2678]], requires_grad=True)
>>> with torch.no_grad():  # disable gradient tracking so the in-place edit does not interfere with autograd
...     embedding.weight[padding_idx] = torch.ones(3)  # replace the padding vector with all ones
>>> embedding.weight  # after the edit, the row at padding_idx=0 is all ones
Parameter containing:
tensor([[ 1.0000,  1.0000,  1.0000],
        [-0.7895, -0.7089, -0.0364],
        [ 0.6778,  0.5803,  0.2678]], requires_grad=True)
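One more padding_idx property that the examples above do not show explicitly: the row at padding_idx also receives a zero gradient, so it stays fixed during training. A minimal sketch (the sizes and indices are arbitrary choices for illustration):

import torch
import torch.nn as nn

emb = nn.Embedding(5, 3, padding_idx=0)
out = emb(torch.tensor([0, 2]))  # look up the padding index and a regular index
out.sum().backward()

print(emb.weight.grad[0])  # all zeros: the padding row is never updated
print(emb.weight.grad[2])  # all ones: the regular row gets a normal gradient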
In short, a word's embedding vector is simply one row of the embedding matrix; the matrix is randomly initialized and learnable.
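To connect this back to the opening question, a hedged sketch of what the lookup amounts to: nn.Embedding's forward is a row lookup into its weight matrix (dispatched through torch.nn.functional.embedding), which in the plain setting gives the same result as indexing the weight directly. The sizes below are arbitrary:

import torch
import torch.nn as nn
import torch.nn.functional as F

emb = nn.Embedding(10, 3)
idx = torch.tensor([[1, 2, 4, 5], [4, 3, 2, 9]])

a = emb(idx)                      # module forward
b = F.embedding(idx, emb.weight)  # functional form the module dispatches to
c = emb.weight[idx]               # plain row indexing of the learnable matrix

print(torch.equal(a, b), torch.equal(a, c))  # True True

Because weight is a Parameter, these rows are updated by the optimizer like any other weight, which is what "randomly initialized and learnable" means above.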