
Convolutional Neural Networks (CNN), Part 4: VGG

news 2025/9/8 12:57:12

Convolutional Neural Networks (CNN), Part 3: AlexNet (CSDN blog)

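Abstract: This post introduces the structure and implementation of the VGG convolutional neural network. VGG was proposed at the University of Oxford and uses a modular design built from padded 3×3 convolutional layers, ReLU activations, and max-pooling layers. The post describes the VGG11 architecture in detail (5 convolutional blocks plus 3 fully connected layers) and provides a PyTorch implementation, including a custom VGG block and the full network. To reduce the computational cost, a simplified network with the channel counts reduced by a factor of 4 is trained on Fashion-MNIST. Finally, the post compares this with VGG16, which has 16 weight layers (13 convolutional layers in 5 blocks plus 3 fully connected layers) and reaches its depth by stacking small convolution kernels. Although VGG has a large number of parameters, it performs very well in transfer learning.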

0. Preface

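VGG is a deep neural network proposed by the Visual Geometry Group (VGG) at the University of Oxford. VGG groups convolutional layers into reusable blocks and then builds the convolutional network out of those blocks.

The VGG model took second place in the 2014 ILSVRC competition; first place went to GoogLeNet. However, VGG outperforms GoogLeNet on several transfer learning tasks, and it is the model of choice for extracting CNN features from images. Its drawback is the parameter count, about 140M, which requires more storage. Even so, the model is well worth studying. Compared with AlexNet, VGG builds its network out of blocks.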

1. VGG

1.1 VGG block

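A single VGG block consists of three main parts:

👉 a convolutional layer with padding to preserve the resolution;

👉 a nonlinear activation function, such as ReLU;

👉 a pooling layer, such as a max-pooling layer.

Here we use convolutional layers with a 3x3 kernel and padding of 1 (which keeps the height and width unchanged), followed by a max-pooling layer with a 2x2 window and a stride of 2 (which halves the resolution after each block). We implement a VGG block with the vgg_block function.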

import torch
from torch import nn
from d2l import torch as d2l

# num_convs: number of convolutional layers in the block
# in_channels: number of input channels
# out_channels: number of output channels
def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        # Convolutional layer: 3x3 kernel, padding 1
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())  # ReLU activation
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # max-pooling layer
    return nn.Sequential(*layers)
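As a quick check on the resolution claims above, the standard output-size formula for a convolution or pooling window, floor((n + 2p - k) / s) + 1, can be evaluated by hand. The following is a minimal sketch in plain Python; the helper name out_size is hypothetical and not part of the original code.

```python
def out_size(n, kernel, padding=0, stride=1):
    """Spatial size after a conv/pool layer: floor((n + 2p - k) / s) + 1."""
    return (n + 2 * padding - kernel) // stride + 1

n = 224
# 3x3 convolution with padding 1 and stride 1: the size is unchanged
after_conv = out_size(n, kernel=3, padding=1, stride=1)
# 2x2 max pooling with stride 2: the size is halved
after_pool = out_size(after_conv, kernel=2, stride=2)
print(after_conv, after_pool)  # 224 112
```

Because the convolutions never change the height or width, any number of them can be stacked inside a block, and only the pooling layer at the end reduces the resolution.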

1.2 VGG network

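Using the VGG block above, we build VGG11. It has 5 convolutional blocks followed by 3 fully connected layers: the first 2 blocks contain one convolutional layer each, and the last 3 blocks contain two each. Hence 11 = 8 convolutional layers (2x1 + 3x2) + 3 fully connected layers.

The number of output channels goes 64 -> 128 -> 256 -> 512 -> 512.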
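The layer count can be double-checked with a few lines of Python. This sketch writes the five blocks as (num_convs, out_channels) pairs, which is the same convention the implementation uses; the variable names other than conv_arch are illustrative.

```python
# (num_convs, out_channels) for each of the five VGG11 blocks
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

num_conv_layers = sum(num_convs for num_convs, _ in conv_arch)
num_fc_layers = 3  # the three fully connected layers at the end
channels = [out_channels for _, out_channels in conv_arch]

print(num_conv_layers, num_conv_layers + num_fc_layers)  # 8 11
print(" -> ".join(str(c) for c in channels))
```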

The following code implements the VGG11 network:

# (num_convs, out_channels) for each block
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def vgg(conv_arch):
    conv_blks = []
    in_channels = 1  # single input channel (Fashion-MNIST images are grayscale)
    # Convolutional part
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels
    return nn.Sequential(
        *conv_blks, nn.Flatten(),
        # Fully connected part
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))

# Instantiate the network
net = vgg(conv_arch)
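To get a feel for why VGG-style networks need so much storage, the learnable parameters of this network can be counted by hand: a convolution has k*k*c_in*c_out weights plus c_out biases, and a linear layer has n_in*n_out weights plus n_out biases. The sketch below assumes the same configuration as the code above (1 input channel, a 7x7 feature map before flattening, 10 classes); the helper name count_params is hypothetical.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def count_params(conv_arch, in_channels=1, num_classes=10, final_hw=7):
    """Return (conv_params, fc_params) for the VGG layout described above."""
    conv_total = 0
    for num_convs, out_channels in conv_arch:
        for _ in range(num_convs):
            # 3x3 kernel weights plus one bias per output channel
            conv_total += 3 * 3 * in_channels * out_channels + out_channels
            in_channels = out_channels
    fc_sizes = [in_channels * final_hw * final_hw, 4096, 4096, num_classes]
    fc_total = 0
    for n_in, n_out in zip(fc_sizes[:-1], fc_sizes[1:]):
        fc_total += n_in * n_out + n_out  # weights plus biases
    return conv_total, fc_total

conv_total, fc_total = count_params(conv_arch)
print(f"conv: {conv_total:,}  fc: {fc_total:,}  total: {conv_total + fc_total:,}")
```

Under these assumptions, the fully connected layers hold the vast majority of the parameters; the first linear layer alone has 25088 x 4096 weights.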

The following code prints the output shape of each layer:

X = torch.randn(size=(1, 1, 224, 224))
for blk in net:
    X = blk(X)
    print(blk.__class__.__name__, 'output shape:\t', X.shape)

Output:

        Sequential output shape:   torch.Size([1, 64, 112, 112])
        Sequential output shape:   torch.Size([1, 128, 56, 56])
        Sequential output shape:   torch.Size([1, 256, 28, 28])
        Sequential output shape:   torch.Size([1, 512, 14, 14])
        Sequential output shape:   torch.Size([1, 512, 7, 7])
        Flatten output shape:      torch.Size([1, 25088])
        Linear output shape:       torch.Size([1, 4096])
        ReLU output shape:         torch.Size([1, 4096])
        Dropout output shape:      torch.Size([1, 4096])
        Linear output shape:       torch.Size([1, 4096])
        ReLU output shape:         torch.Size([1, 4096])
        Dropout output shape:      torch.Size([1, 4096])
        Linear output shape:       torch.Size([1, 10])
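These shapes can also be derived without running the network: each block sets the channel count to its out_channels and halves the height and width, so five blocks take 224 down to 7. A small sketch in plain Python, where trace_shapes is a hypothetical helper name:

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

def trace_shapes(conv_arch, size=224):
    """Return (channels, height, width) after each VGG block."""
    shapes = []
    for _, out_channels in conv_arch:
        # The padded 3x3 convs keep the size; the 2x2 stride-2 pool halves it
        size //= 2
        shapes.append((out_channels, size, size))
    return shapes

shapes = trace_shapes(conv_arch)
for shape in shapes:
    print(shape)

channels, height, width = shapes[-1]
flatten_size = channels * height * width
print(flatten_size)  # 25088, the input width of the first nn.Linear
```

This is where the out_channels * 7 * 7 in the first fully connected layer comes from, and it also means the network as written expects 224x224 inputs.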

2. Training

Because VGG11 is computationally expensive, we build a network with fewer channels, which is sufficient for the Fashion-MNIST dataset.

ratio=4
small_conv_arch=[(pair[0],pair[1]//ratio) for pair in conv_arch]
net=vgg(small_conv_arch)
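To make the effect of ratio concrete, the reduced architecture can be printed directly. This is a small standalone sketch that repeats the definitions above so it runs on its own; first_fc_inputs is an illustrative name.

```python
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))
ratio = 4
small_conv_arch = [(num_convs, out_channels // ratio)
                   for num_convs, out_channels in conv_arch]
print(small_conv_arch)
# [(1, 16), (1, 32), (2, 64), (2, 128), (2, 128)]

# Input width of the first fully connected layer (the feature map is still 7x7)
first_fc_inputs = small_conv_arch[-1][1] * 7 * 7
print(first_fc_inputs)  # 6272
```

Since a convolution's weight count scales with c_in * c_out, dividing every channel count by 4 shrinks the weights of most convolutional layers by roughly a factor of 16; the first layer, whose input stays at 1 channel, shrinks by a factor of 4.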

Load the dataset and train:

lr,num_epochs,batch_size=0.05,10,128
train_iter,test_iter=d2l.load_data_fashion_mnist(batch_size, resize=224)
d2l.train_ch6(net,train_iter,test_iter,num_epochs,lr,d2l.try_gpu())
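d2l.train_ch6 hides the training loop itself. As a rough illustration of what a single optimization step looks like in plain PyTorch, here is a sketch that assumes mini-batch SGD with cross-entropy loss (which matches our understanding of train_ch6, though the d2l source is the authority). Random tensors stand in for a batch of resized Fashion-MNIST images so that the snippet runs without downloading data, and the in_channels and num_classes arguments of vgg are added here for illustration.

```python
import torch
from torch import nn

def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

def vgg(conv_arch, in_channels=1, num_classes=10):
    blocks = []
    for num_convs, out_channels in conv_arch:
        blocks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels
    return nn.Sequential(
        *blocks, nn.Flatten(),
        nn.Linear(in_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, num_classes))

torch.manual_seed(0)
# Reduced-channel architecture (ratio = 4)
net = vgg([(1, 16), (1, 32), (2, 64), (2, 128), (2, 128)])
optimizer = torch.optim.SGD(net.parameters(), lr=0.05)  # assumed optimizer
loss_fn = nn.CrossEntropyLoss()                         # assumed loss

# Stand-in mini-batch: 2 grayscale 224x224 images with random labels.
# Real training would iterate over train_iter instead.
X = torch.randn(2, 1, 224, 224)
y = torch.randint(0, 10, (2,))

net.train()                # enable dropout
optimizer.zero_grad()      # clear gradients from any previous step
logits = net(X)            # forward pass, shape (2, 10)
loss = loss_fn(logits, y)
loss.backward()            # backpropagate
optimizer.step()           # update the weights
print(tuple(logits.shape), loss.item())
```

A full training run repeats this step over every mini-batch for num_epochs epochs and evaluates on test_iter with net.eval() so that dropout is disabled.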