EECS 498 Deep Learning for Computer Vision Winter 2022 A2

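# Inside the loop over samples i and classes j != y[i], when the margin is positive:
dW[:, j] += X[i]
dW[:, y[i]] -= X[i]
# After the loop. Note: this is the average gradient, because the loss was averaged earlier.
dW /= num_train
dW += 2 * reg * W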
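To show where the lines above sit, here is a minimal sketch of a complete loop-based version. The function name `svm_loss_naive_sketch`, the shapes (W is (D, C), X is (N, D), y is (N,)), the margin of 1, and the `reg * sum(W * W)` regularization term are assumptions for illustration; they are chosen to match the `2 * reg * W` gradient term and are not taken from the assignment's starter code.

```python
import torch


def svm_loss_naive_sketch(W, X, y, reg):
    """Loop-based multiclass SVM loss and gradient (illustrative sketch).

    Assumed shapes: W (D, C), X (N, D), y (N,) with integer class labels.
    """
    dW = torch.zeros_like(W)
    num_classes = W.shape[1]
    num_train = X.shape[0]
    loss = 0.0
    for i in range(num_train):
        scores = W.t().mv(X[i])              # (C,) class scores for sample i
        correct_class_score = scores[y[i]]
        for j in range(num_classes):
            if j == y[i]:
                continue
            margin = scores[j] - correct_class_score + 1  # assumed margin of 1
            if margin > 0:
                loss += margin
                # A violated margin pushes class j's column up by X[i]
                # and the correct class's column down by X[i].
                dW[:, j] += X[i]
                dW[:, y[i]] -= X[i]
    # The loss is averaged over samples, so the gradient is averaged too.
    loss = loss / num_train + reg * torch.sum(W * W)
    dW /= num_train
    dW += 2 * reg * W
    return loss, dW
```

The division by `num_train` has to be applied to both `loss` and `dW`; otherwise the returned gradient would belong to the summed loss rather than the averaged one.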
num_train = X.shape[0]
scores = X.mm(W)
score_y = scores[torch.arange(num_train), y].view(-1, 1)
margins = torch.relu(scores - score_y + 1)
margins[torch.arange(num_train), y] = 0
loss = torch.sum(margins) / num_train

The part marked in blue above is actually incorrect.


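The code below transposes X so that each sample becomes a column vector. Multiplying X.t() by the coefficient matrix then sums the samples, weighted per class, into a matrix with the same shape as W.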

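# binary aliases margins (no copy), so margins is overwritten; the loss must already be computed.
binary = margins
binary[margins > 0] = 1
# Each violated margin contributes +X[i] to its own class and -X[i] to the correct class.
row_sum = torch.sum(binary, dim=1)
binary[torch.arange(num_train), y] = -row_sum
dW = X.t().mm(binary) / num_train
dW += 2 * reg * W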
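The vectorized loss and gradient can be put together and checked against autograd, as in the sketch below. The function name `svm_loss_vectorized_sketch`, the `reg * sum(W * W)` loss term, and the small random test data are assumptions for illustration. The coefficient matrix is built as a new tensor here instead of aliasing `margins`, which is a small deviation from the notes.

```python
import torch


def svm_loss_vectorized_sketch(W, X, y, reg):
    """Vectorized multiclass SVM loss and gradient (illustrative sketch).

    Assumed shapes: W (D, C), X (N, D), y (N,) with integer class labels.
    """
    num_train = X.shape[0]
    rows = torch.arange(num_train)
    scores = X.mm(W)                            # (N, C)
    score_y = scores[rows, y].view(-1, 1)       # (N, 1) correct-class scores
    margins = torch.relu(scores - score_y + 1)
    margins[rows, y] = 0                        # correct class adds no loss
    loss = margins.sum() / num_train + reg * torch.sum(W * W)

    # 1 where a margin is violated, 0 elsewhere; a fresh tensor, not an alias.
    binary = (margins > 0).to(X.dtype)
    # The correct class collects minus the number of violations in its row.
    binary[rows, y] = -binary.sum(dim=1)
    dW = X.t().mm(binary) / num_train + 2 * reg * W
    return loss, dW


torch.manual_seed(0)
W = torch.randn(5, 3, dtype=torch.float64)
X = torch.randn(4, 5, dtype=torch.float64)
y = torch.tensor([0, 2, 1, 1])
loss, dW = svm_loss_vectorized_sketch(W, X, y, reg=0.1)

# Autograd reference without in-place edits: the j == y term of the hinge
# is always exactly 1 with zero gradient, so subtract 1 per sample instead.
W_ref = W.clone().requires_grad_(True)
s = X.mm(W_ref)
s_y = s[torch.arange(4), y].view(-1, 1)
ref = torch.relu(s - s_y + 1).sum() / 4 - 1 + 0.1 * (W_ref * W_ref).sum()
ref.backward()
grad_matches = torch.allclose(dW, W_ref.grad)
print(grad_matches)
```

The reference loss avoids writing into the output of `torch.relu`, since autograd may reject a backward pass through a tensor that was modified in place.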

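This one is quite simple. Don't let it confuse you: start from the formula above, break it into groups, and apply the chain rule to each group.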
