
Implementing Convolution by Hand with Matrix Multiplication

news 2025/9/24 1:31:49

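Convolution is a fundamental operation in signal processing and image processing, and it is widely used in deep learning, especially in convolutional neural networks (CNNs). The core idea is to scan the input signal or image with a kernel (also called a filter) and extract local features. In signal processing, convolution can be viewed as a measure of how two functions or signals overlap as one is shifted across the other. In image processing, convolution is the core operation of image filtering: edge detection, blurring and sharpening are all implemented as convolutions.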

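1. Mathematical definition of convolution

One-dimensional discrete convolution

Given two discrete signals f and g, their convolution (f∗g) is defined as: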

$(f * g)[n] = \sum_{m=-\infty}^{\infty} f[m] \cdot g[n - m]$
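For finite sequences the infinite sum reduces to the indices where both signals are defined. The following is a minimal pure-Python sketch of that definition. The function name `conv1d_full` is hypothetical, and it assumes both signals are zero outside the listed samples.

```python
def conv1d_full(f, g):
    """Full 1-D discrete convolution of two finite sequences."""
    out = [0.0] * (len(f) + len(g) - 1)
    for n in range(len(out)):
        for m in range(len(f)):
            k = n - m  # index into g, i.e. g[n - m]
            if 0 <= k < len(g):
                out[n] += f[m] * g[k]
    return out


print(conv1d_full([1, 2, 3], [0, 1, 0.5]))  # [0.0, 1.0, 2.5, 4.0, 1.5]
```

The output has len(f) + len(g) - 1 samples because the two sequences overlap at that many shifts.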

Two-dimensional discrete convolution

For a two-dimensional signal such as an image, convolution is defined as:

$(f * g)[m, n] = \sum_{k_1=-\infty}^{\infty} \sum_{k_2=-\infty}^{\infty} f[k_1, k_2] \cdot g[m - k_1, n - k_2]$

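2. An intuitive view of convolution

Convolution can be understood as three steps:

Sliding window: the kernel slides across the input signal or image.
Dot product: at each position, the kernel is multiplied element-wise with the local region of the input and the products are summed.
Feature extraction: the resulting values capture local features of the input.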
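Note that the sliding dot product described above does not flip the kernel, so strictly speaking it is cross-correlation. It matches the definition in section 1 only after rotating the kernel by 180°. Deep learning frameworks generally use the unflipped form and still call it convolution. The pure-Python sketch below illustrates the relationship; all function names are hypothetical, and it assumes stride 1 and no padding.

```python
def slide_dot(x, k):
    """Slide k over x (stride 1, no padding) and take dot products."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def rotate180(k):
    """Reverse the rows and the columns of the kernel."""
    return [row[::-1] for row in k[::-1]]


def conv2d_by_definition(x, k):
    """Fully overlapping part of the textbook 2-D convolution sum."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[m + kh - 1 - a][n + kw - 1 - b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for n in range(out_w)]
            for m in range(out_h)]


x = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
k = [[1, 0], [0, 2]]  # deliberately asymmetric

print(slide_dot(x, k))             # [[11, 14], [20, 23]]
print(conv2d_by_definition(x, k))  # [[7, 10], [16, 19]]
print(slide_dot(x, rotate180(k)))  # [[7, 10], [16, 19]]
```

For a kernel that is symmetric under a 180° rotation the two forms give identical results.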

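3. Parameters of a convolution

In deep learning, a convolution usually involves the following parameters:

Input: the input signal or image, with shape (batch_size, channels, height, width).
Kernel: the filter, with shape (out_channels, in_channels, kernel_height, kernel_width).
Stride: the step by which the kernel moves; it affects the output size.
Padding: values (for example 0) added around the border of the input; it also affects the output size.
Output: the result of the convolution, with shape (batch_size, out_channels, output_height, output_width).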

4. Output size of a convolution

The output size of a convolution can be computed with the following formula:

$\text{output\_height} = \left\lfloor \frac{\text{input\_height} - \text{kernel\_height}+2*\text{padding}}{\text{stride}} \right\rfloor + 1$

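where:

input_height: height of the input signal or image.
kernel_height: height of the kernel.
padding: amount of padding added to each side.
stride: step size.

The output width is obtained from the same formula with input_width and kernel_width.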
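As a quick check of the formula, here is a small sketch in Python. The helper name `conv_output_size` is hypothetical; it applies the formula to one spatial dimension at a time and assumes the same padding on both sides.

```python
def conv_output_size(input_size, kernel_size, padding=0, stride=1):
    """floor((input - kernel + 2 * padding) / stride) + 1 for one dimension."""
    return (input_size - kernel_size + 2 * padding) // stride + 1


print(conv_output_size(5, 3))                       # 3
print(conv_output_size(5, 3, padding=1))            # 5
print(conv_output_size(5, 3, padding=1, stride=2))  # 3
print(conv_output_size(7, 3, stride=2))             # 3
```

With a 3×3 kernel, padding=1 and stride=1 the output keeps the size of the input, which is why that combination is so common.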

5. Computing a convolution

1. Single input channel, single kernel

The pixel values of the input image are:

$\begin{bmatrix} 1 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 1 & 0 \\ 0 & 0 & 1 & 1 & 1 \\ 0 & 0 & 1 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \end{bmatrix}$

The kernel is:

$\begin{bmatrix} 1 & 0 &1 \\ 0 & 1 & 0 \\ 1 & 0 & 1 \\ \end{bmatrix}$

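Compute the sum of the element-wise products of the first sub-region (the top-left 3×3 window) and the kernel:

Cov_feature[0,0] = 1×1 + 1×0 + 1×1 + 0×0 + 1×1 + 1×0 + 0×1 + 0×0 + 1×1 = 4

Then slide the window one column to the right and do the same for the second sub-region:

Cov_feature[0,1] = 1×1 + 1×0 + 0×1 + 1×0 + 1×1 + 1×0 + 0×1 + 1×0 + 1×1 = 3

and so on for the remaining positions.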
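The remaining entries of the feature map follow the same pattern. The pure-Python sketch below repeats the calculation for every window position, assuming stride 1 and no padding; the variable and function names are hypothetical.

```python
image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]


def feature_map(x, k):
    """Sum of element-wise products for every window position (stride 1)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


for row in feature_map(image, kernel):
    print(row)
# [4, 3, 4]
# [2, 4, 3]
# [2, 3, 4]
```

The first two entries of the first row, 4 and 3, agree with Cov_feature[0,0] and Cov_feature[0,1] computed by hand.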

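2. Multiple input channels, single kernel

When the input has several channels, the kernel has one 2-D slice per input channel. Each slice produces a feature map from its own channel, and these maps are added position by position to give the final feature map.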
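The per-channel sum can be sketched as follows in pure Python. The two-channel input and kernel values are made up for illustration, the function names are hypothetical, and stride 1 with no padding is assumed.

```python
def feature_map(x, k):
    """Single-channel sliding dot product (stride 1, no padding)."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(x[i + a][j + b] * k[a][b]
                 for a in range(kh) for b in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def multi_channel_feature_map(channels, kernel_slices):
    """Add the per-channel feature maps position by position."""
    maps = [feature_map(x, k) for x, k in zip(channels, kernel_slices)]
    return [[sum(m[i][j] for m in maps) for j in range(len(maps[0][0]))]
            for i in range(len(maps[0]))]


channels = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],  # channel 0
    [[1, 0, 1], [0, 1, 0], [1, 0, 1]],  # channel 1
]
kernel_slices = [
    [[1, 0], [0, 1]],  # slice applied to channel 0
    [[0, 1], [1, 0]],  # slice applied to channel 1
]

print(multi_channel_feature_map(channels, kernel_slices))  # [[6, 10], [14, 14]]
```

Channel 0 alone gives [[6, 8], [12, 14]] and channel 1 alone gives [[0, 2], [2, 0]]; the result is their element-wise sum.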

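3. Multiple kernels

Each kernel produces its own feature map; stacking these maps along the channel dimension gives an output with out_channels channels.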

6. Implementing convolution in code

1. A simple convolution (without batch_size or channels):

import torch


def matrix_muti_for_cov(x, kernel, stride=1):
    # x.shape -> (h, w), kernel.shape -> (kh, kw)

    output_h = (x.shape[0] - kernel.shape[0]) // stride + 1   # output height
    output_w = (x.shape[1] - kernel.shape[1]) // stride + 1   # output width
    output = torch.zeros(output_h, output_w)  # initialise an (output_h, output_w) matrix

    for i in range(0, x.shape[0] - kernel.shape[0] + 1, stride):  # slide along the height

        for j in range(0, x.shape[1] - kernel.shape[1] + 1, stride):  # slide along the width

            area = x[i:i + kernel.shape[0], j:j + kernel.shape[1]]  # region covered by the kernel
            # i and j are input coordinates; divide by stride to get the output index
            output[i // stride, j // stride] = torch.sum(area * kernel)
    return output

Call the function to compute the convolution:


input =torch.randn(5,5)
kernel =torch.randn(3,3)  
output =matrix_muti_for_cov(input,kernel)
print(output)

Output:

tensor([[-2.0837, -1.1043, 3.2571],
[-1.1638, 0.7576, 3.2776],
[ 0.3669, 0.4015, 0.9808]])

Check the result against torch.nn.functional.conv2d(input, kernel).

conv2d requires:

input.shape = (batch_size, in_channels, height, width)

kernel.shape = (out_channels, in_channels, kernel_height, kernel_width)

import torch.nn.functional as F

input = input.reshape((1, 1, input.shape[0], input.shape[1]))
kernel = kernel.reshape((1, 1, kernel.shape[0], kernel.shape[1]))
cov_out = F.conv2d(input, kernel)
print(cov_out.squeeze(0).squeeze(0))

Output:

tensor([[-2.0837, -1.1043, 3.2571],
[-1.1638, 0.7576, 3.2776],
[ 0.3669, 0.4015, 0.9808]])

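cov_out.squeeze(0).squeeze(0) removes the batch_size and channels dimensions so that the result has the same shape as output above.

The code above can be upgraded so that the convolution is computed with a single matrix multiplication: each window is flattened into one row of a matrix, and that matrix is multiplied by the flattened kernel.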

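def matrix_muti_for_cov(x, kernel, stride=1, padding=0):
    # x.shape -> (h, w), kernel.shape -> (kh, kw)
    if padding > 0:
        # zero-pad the input on all four sides
        padded = x.new_zeros(x.shape[0] + 2 * padding, x.shape[1] + 2 * padding)
        padded[padding:padding + x.shape[0], padding:padding + x.shape[1]] = x
        x = padded
    output_h = (x.shape[0] - kernel.shape[0]) // stride + 1
    output_w = (x.shape[1] - kernel.shape[1]) // stride + 1
    # one row per window position, one column per kernel element
    area_matrix = x.new_zeros(output_h * output_w, kernel.numel())
    kernel_matrix = kernel.reshape(kernel.numel(), 1)
    row = 0  # index of the current window in row-major order
    for i in range(0, x.shape[0] - kernel.shape[0] + 1, stride):
        for j in range(0, x.shape[1] - kernel.shape[1] + 1, stride):

            area = x[i:i + kernel.shape[0], j:j + kernel.shape[1]]
            area_matrix[row] = torch.flatten(area)
            row += 1
    output_matrix = area_matrix @ kernel_matrix  # shape (output_h * output_w, 1)
    output = output_matrix.reshape(output_h, output_w)
    return output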
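To make the matrix-multiplication view concrete without any dependencies, here is a pure-Python sketch applied to the 5×5 image and 3×3 kernel from section 5. This flattening step is commonly called im2col. The function names are hypothetical, and padding is left out to keep the example short.

```python
def im2col(x, kh, kw, stride=1):
    """Flatten every kh x kw window of x into one row (row-major order)."""
    out_h = (len(x) - kh) // stride + 1
    out_w = (len(x[0]) - kw) // stride + 1
    rows = []
    for i in range(out_h):
        for j in range(out_w):
            top, left = i * stride, j * stride
            rows.append([x[top + a][left + b]
                         for a in range(kh) for b in range(kw)])
    return rows, out_h, out_w


def conv_as_matmul(x, k, stride=1):
    """Multiply the patch matrix by the flattened kernel, then reshape."""
    kh, kw = len(k), len(k[0])
    patches, out_h, out_w = im2col(x, kh, kw, stride)
    k_flat = [v for row in k for v in row]
    flat = [sum(p * w for p, w in zip(patch, k_flat)) for patch in patches]
    return [flat[r * out_w:(r + 1) * out_w] for r in range(out_h)]


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [[1, 0, 1], [0, 1, 0], [1, 0, 1]]

print(conv_as_matmul(image, kernel))            # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
print(conv_as_matmul(image, kernel, stride=2))  # [[4, 4], [2, 4]]
```

The patch matrix here has 9 rows (one per output position) and 9 columns (one per kernel element), so the whole convolution is one 9×9 by 9×1 product.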

2. A simple but complete convolution (with batch_size, channels, stride and padding):

import math

def matrix_muti_for_cov2(input, kernel, stride=1, padding=1):

    # input.shape ---> [batch_size, channels, height, width]
    batch, channel, x_h, x_w = input.shape

    # kernel.shape ---> [out_channels, in_channels, kernel_height, kernel_width]
    channel_out, channels_in, kernel_h, kernel_w = kernel.shape

    # math.floor() rounds down: it returns the largest integer less than or equal to its argument
    output_h = math.floor((x_h + 2 * padding - kernel_h) / stride) + 1
    output_w = math.floor((x_w + 2 * padding - kernel_w) / stride) + 1

    output = torch.zeros(batch, channel_out, output_h, output_w)  # initialise the output
    input_padded = torch.zeros(batch, channel, x_h + 2 * padding, x_w + 2 * padding)  # zero padding
    input_padded[:, :, padding:x_h + padding, padding:x_w + padding] = input  # copy input into the centre
    for b in range(batch):   # loop over the batch dimension

        for c_out in range(channel_out):  # loop over output channels

            for i in range(output_h):  # loop over output rows

                for j in range(output_w):  # loop over output columns

                    # top-left corner of the window in the padded input
                    top, left = i * stride, j * stride
                    area = input_padded[b, :, top:top + kernel_h, left:left + kernel_w]
                    output[b, c_out, i, j] = torch.sum(area * kernel[c_out])

    return output

Call the function and test the result:

cov_out =matrix_muti_for_cov2(input,kernel)
# print(cov_out)
cov_out2 =F.conv2d(input,kernel,padding=1)
# print(cov_out2)
if torch.allclose(cov_out, cov_out2, rtol=1e-05, atol=1e-08):
    print("The two convolution results are approximately equal.")
else:
    print("The two convolution results are not equal.")
    print("Maximum absolute error:", torch.max(torch.abs(cov_out - cov_out2)))

The output is "The two convolution results are approximately equal."
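The nested loops can also be replaced by one batched matrix multiplication. The sketch below uses torch.nn.functional.unfold, which extracts the sliding windows as columns, and compares the result with F.conv2d. The function name `conv2d_via_unfold` is hypothetical, and the sketch assumes no dilation, no groups and no bias.

```python
import torch
import torch.nn.functional as F


def conv2d_via_unfold(x, weight, stride=1, padding=0):
    """2-D convolution as a matrix product over unfolded windows."""
    n, c_in, h, w = x.shape
    c_out, _, kh, kw = weight.shape
    out_h = (h + 2 * padding - kh) // stride + 1
    out_w = (w + 2 * padding - kw) // stride + 1
    # cols: (n, c_in * kh * kw, out_h * out_w), one column per window
    cols = F.unfold(x, kernel_size=(kh, kw), padding=padding, stride=stride)
    # w_mat: (c_out, c_in * kh * kw), one row per kernel
    w_mat = weight.reshape(c_out, -1)
    out = w_mat @ cols  # broadcasts over the batch: (n, c_out, out_h * out_w)
    return out.reshape(n, c_out, out_h, out_w)


torch.manual_seed(0)
x = torch.randn(2, 3, 7, 7)
weight = torch.randn(4, 3, 3, 3)
mine = conv2d_via_unfold(x, weight, stride=2, padding=1)
ref = F.conv2d(x, weight, stride=2, padding=1)
print(mine.shape)
print(torch.allclose(mine, ref, atol=1e-5))
```

Flattening the weight with reshape(c_out, -1) works because unfold orders each column by channel first and then by kernel row and column, which is the same order as the weight tensor's trailing dimensions.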
