当前位置: 首页 > news >正文

YOLO11改进——融合BAM注意力机制增强图像分类与目标检测能力

深度学习在计算机视觉领域的应用取得了显著进展,尤其是在目标检测(Object Detection)和图像分类(Image Classification)任务中。YOLOYou Only Look Once)系列算法凭借其高效的单阶段检测框架和卓越的实时性能,成为目标检测领域的研究热点。然而,随着应用场景的复杂化和多样化,如何进一步提升模型在复杂背景下的鲁棒性(Robustness)、小目标检测(Small Object Detection)能力以及特征表达能力(Feature Representation Capability),成为亟待解决的问题。本文提出了一种基于BAMBottleneck Attention Module)注意力机制的YOLO11改进方案,通过在骨干网络(Backbone)和颈部网络(Neck)中嵌入BAM模块,增强模型对通道维度(Channel Dimension)和空间维度(Spatial Dimension)的动态注意力分配能力,从而优化模型在图像分类与目标检测任务中的表现。

1. BAM注意力机制简介

BAM(Bottleneck Attention Module)是一种轻量级的注意力机制模块,旨在通过通道和空间维度的双重注意力机制增强特征图的表现力。其核心思想是通过对输入特征图进行通道注意力和空间注意力的计算,动态调整特征的重要性,从而突出关键区域并抑制冗余信息。

BAM的主要特点包括:

  • 通道注意力 :通过全局平均池化(GAP)和全连接层计算每个通道的重要性权重,提升对重要特征的关注。
  • 空间注意力 :利用卷积操作生成空间注意力图,强调目标所在的区域。
  • 模块化设计 :BAM可以无缝嵌入现有网络结构中,不会显著增加计算开销。

论文链接:BAM: Bottleneck Attention Module
代码地址:https://github.com/Jongchan/attention-module
在这里插入图片描述

2. 将BAM添加到YOLO11

2.1 BAM代码实现

将下面代码粘贴到在/ultralytics/ultralytics/nn/modules/block.py

class Flatten(nn.Module):def forward(self, x):return x.view(x.shape[0], -1)class ChannelAttention(nn.Module):def __init__(self, channel, reduction=16, num_layers=3):super().__init__()self.avgpool = nn.AdaptiveAvgPool2d(1)gate_channels = [channel]gate_channels += [channel // reduction] * num_layersgate_channels += [channel]self.ca = nn.Sequential()self.ca.add_module('flatten', Flatten())for i in range(len(gate_channels) - 2):self.ca.add_module('fc%d' % i, nn.Linear(gate_channels[i], gate_channels[i + 1]))self.ca.add_module('bn%d' % i, nn.BatchNorm1d(gate_channels[i + 1]))self.ca.add_module('relu%d' % i, nn.SiLU())self.ca.add_module('last_fc', nn.Linear(gate_channels[-2], gate_channels[-1]))def forward(self, x):res = self.avgpool(x)res = self.ca(res)res = res.unsqueeze(-1).unsqueeze(-1).expand_as(x)return resclass SpatialAttention(nn.Module):def __init__(self, channel, reduction=16, num_layers=3, dia_val=2):super().__init__()self.sa = nn.Sequential()self.sa.add_module('conv_reduce1',nn.Conv2d(kernel_size=1, in_channels=channel, out_channels=channel // reduction))self.sa.add_module('bn_reduce1', nn.BatchNorm2d(channel // reduction))self.sa.add_module('relu_reduce1', nn.SiLU())for i in range(num_layers):self.sa.add_module('conv_%d' % i, nn.Conv2d(kernel_size=3, in_channels=channel // reduction,out_channels=channel // reduction,padding=autopad(3, None, dia_val), dilation=dia_val))self.sa.add_module('bn_%d' % i, nn.BatchNorm2d(channel // reduction))self.sa.add_module('relu_%d' % i, nn.SiLU())self.sa.add_module('last_conv', nn.Conv2d(channel // reduction, 1, kernel_size=1))def forward(self, x):res = self.sa(x)res = res.expand_as(x)return resclass BAMBlock(nn.Module):def __init__(self, channel=512, reduction=16, dia_val=2):super().__init__()self.ca = ChannelAttention(channel=channel, reduction=reduction)self.sa = SpatialAttention(channel=channel, reduction=reduction, dia_val=dia_val)self.sigmoid = nn.Sigmoid()def init_weights(self):for m in self.modules():if isinstance(m, nn.Conv2d):init.kaiming_normal_(m.weight, mode='fan_out')if m.bias is not None:init.constant_(m.bias, 0)elif isinstance(m, nn.BatchNorm2d):init.constant_(m.weight, 1)init.constant_(m.bias, 0)elif isinstance(m, nn.Linear):init.normal_(m.weight, std=0.001)if m.bias is not None:init.constant_(m.bias, 0)def forward(self, x):b, c, _, _ = x.size()sa_out = self.sa(x)ca_out = self.ca(x)weight = self.sigmoid(sa_out + ca_out)out = (1 + weight) * xreturn out

修改modules文件夹下的__init__.py文件,先导入函数

在这里插入图片描述

然后在下面的__all__中声明函数
在这里插入图片描述

2.2 添加yaml文件

/ultralytics/ultralytics/cfg/models/11下面新建文件yolo11_BAM.yaml文件

  • 目标检测
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'# [depth, width, max_channels]n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPss: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPsm: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPsl: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPsx: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs# YOLO11n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4- [-1, 2, C3k2, [256, False, 0.25]]- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8- [-1, 2, C3k2, [512, False, 0.25]]- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16- [-1, 2, C3k2, [512, True]]- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32- [-1, 2, C3k2, [1024, True]]- [-1, 1, SPPF, [1024, 5]] # 9- [-1, 2, C2PSA, [1024]] # 10- [-1, 1, BAMBlock, [1024]]# YOLO11n head
head:- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 6], 1, Concat, [1]] # cat backbone P4- [-1, 2, C3k2, [512, False]] # 13- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 4], 1, Concat, [1]] # cat backbone P3- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 14], 1, Concat, [1]] # cat head P4- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 11], 1, Concat, [1]] # cat head P5- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)- [[17, 20, 23], 1, Detect, [nc]] # Detect(P3, P4, P5)
  • 语义分割
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'# [depth, width, max_channels]n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPss: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPsm: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPsl: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPsx: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs# YOLO11n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4- [-1, 2, C3k2, [256, False, 0.25]]- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8- [-1, 2, C3k2, [512, False, 0.25]]- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16- [-1, 2, C3k2, [512, True]]- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32- [-1, 2, C3k2, [1024, True]]- [-1, 1, SPPF, [1024, 5]] # 9- [-1, 2, C2PSA, [1024]] # 10- [-1, 1, BAMBlock, [1024]]# YOLO11n head
head:- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 6], 1, Concat, [1]] # cat backbone P4- [-1, 2, C3k2, [512, False]] # 13- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 4], 1, Concat, [1]] # cat backbone P3- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 14], 1, Concat, [1]] # cat head P4- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 11], 1, Concat, [1]] # cat head P5- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)- [[17, 20, 23], 1, Segment, [nc, 32, 256]] # Segment(P3, P4, P5)
  • 旋转目标检测
# Ultralytics YOLO 🚀, AGPL-3.0 license
# YOLO11 object detection model with P3-P5 outputs. For Usage examples see https://docs.ultralytics.com/tasks/detect# Parameters
nc: 80 # number of classes
scales: # model compound scaling constants, i.e. 'model=yolo11n.yaml' will call yolo11.yaml with scale 'n'# [depth, width, max_channels]n: [0.50, 0.25, 1024] # summary: 319 layers, 2624080 parameters, 2624064 gradients, 6.6 GFLOPss: [0.50, 0.50, 1024] # summary: 319 layers, 9458752 parameters, 9458736 gradients, 21.7 GFLOPsm: [0.50, 1.00, 512] # summary: 409 layers, 20114688 parameters, 20114672 gradients, 68.5 GFLOPsl: [1.00, 1.00, 512] # summary: 631 layers, 25372160 parameters, 25372144 gradients, 87.6 GFLOPsx: [1.00, 1.50, 512] # summary: 631 layers, 56966176 parameters, 56966160 gradients, 196.0 GFLOPs# YOLO11n backbone
backbone:# [from, repeats, module, args]- [-1, 1, Conv, [64, 3, 2]] # 0-P1/2- [-1, 1, Conv, [128, 3, 2]] # 1-P2/4- [-1, 2, C3k2, [256, False, 0.25]]- [-1, 1, Conv, [256, 3, 2]] # 3-P3/8- [-1, 2, C3k2, [512, False, 0.25]]- [-1, 1, Conv, [512, 3, 2]] # 5-P4/16- [-1, 2, C3k2, [512, True]]- [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32- [-1, 2, C3k2, [1024, True]]- [-1, 1, SPPF, [1024, 5]] # 9- [-1, 2, C2PSA, [1024]] # 10- [-1, 1, BAMBlock, [1024]]# YOLO11n head
head:- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 6], 1, Concat, [1]] # cat backbone P4- [-1, 2, C3k2, [512, False]] # 13- [-1, 1, nn.Upsample, [None, 2, "nearest"]]- [[-1, 4], 1, Concat, [1]] # cat backbone P3- [-1, 2, C3k2, [256, False]] # 16 (P3/8-small)- [-1, 1, Conv, [256, 3, 2]]- [[-1, 14], 1, Concat, [1]] # cat head P4- [-1, 2, C3k2, [512, False]] # 19 (P4/16-medium)- [-1, 1, Conv, [512, 3, 2]]- [[-1, 11], 1, Concat, [1]] # cat head P5- [-1, 2, C3k2, [1024, True]] # 22 (P5/32-large)- [[17, 20, 23], 1, OBB, [nc, 1]] # Detect(P3, P4, P5)

本文只是对yolo11基础上添加模块,如果要对yolo11n/l/m/x进行添加则只需要指定对应的depth_multiplewidth_multiple

# YOLO11n
depth_multiple: 0.50  # model depth multiple
width_multiple: 0.25  # layer channel multiple
max_channel:1024# YOLO11s
depth_multiple: 0.50  # model depth multiple
width_multiple: 0.50  # layer channel multiple
max_channel:1024# YOLO11m
depth_multiple: 0.50  # model depth multiple
width_multiple: 1.00  # layer channel multiple
max_channel:512# YOLO11l 
depth_multiple: 1.00  # model depth multiple
width_multiple: 1.00  # layer channel multiple
max_channel:512 # YOLO11x
depth_multiple: 1.00  # model depth multiple
width_multiple: 1.50 # layer channel multiple
max_channel:512

2.3 task.py中注册

task.pyparse_model函数中进行注册

  • 先在task.py导入函数
    在这里插入图片描述
  • task.py文件下找到parse_model这个函数,添加BAMBlock
    在这里插入图片描述
            elif m in {BAMBlock}:c1, c2 = ch[f], args[0]if c2 != nc:  # if not outputc2 = make_divisible(min(c2, max_channels) * width, 8)args = [c1, c2, *args[1:]]
    

2.4 训练

将model的参数路径设置为yolo11_BAM.yaml的路径即可


from ultralytics import YOLO
import warnings
warnings.filterwarnings('ignore')
from pathlib import Pathif __name__ == '__main__':# 加载模型model = YOLO("ultralytics/cfg/11/yolo11_BAM.yaml")  # 你要选择的模型yaml文件地址# Use the modelresults = model.train(data=r"你的数据集的yaml文件地址",epochs=100, batch=16, imgsz=640, workers=4, name=Path(model.cfg).stem)  # 训练模型

在这里插入图片描述

2.5 网络结构图

在这里插入图片描述

3. 结论与展望

本文通过在YOLO11中引入BAM注意力机制,成功提升了模型在图像分类和目标检测任务中的表现。实验结果表明,BAM不仅能够增强模型的特征表达能力,还能有效改善小目标检测和复杂背景下的鲁棒性。未来的研究方向包括:

  1. 优化BAM模块设计
    进一步优化BAM模块的结构,降低计算开销,同时探索更高效的注意力机制(如CBAM、SENet等)与YOLO系列的结合。
  2. 多任务学习
    探索BAM在多任务学习(Multi-Task Learning)场景中的应用,例如联合目标检测与语义分割任务。
  3. 实际场景
    在更多实际应用场景中验证改进模型的性能,例如自动驾驶(Autonomous Driving)、无人机监控(Drone Surveillance)等领域。

相关文章:

  • 黑马头条day01
  • Deepseek本地部署 + 个性化 Rag 知识库
  • 【过程控制系统】PID算式实现,控制系统分类,工程应用中控制系统应该注意的问题
  • 大模型文生图
  • 考研数据结构之二叉树(一)(包含真题及解析)
  • sql工具怎么选最适合自己的?
  • 【路由交换方向IE认证】BGP选路原则之Weight属性
  • 空间信息可视化——WebGIS前端实例(二)
  • Spark-SQL(一)
  • docker方式项目部署(安装容器组件+配置文件导入Nacos+dockerCompose文件创建管理多个容器+私有镜像仓库Harbor)
  • 循环神经网络 - 门控循环单元网络
  • LanDiff:赋能视频创作,语言与扩散模型的融合力量
  • 波束形成(BF)从算法仿真到工程源码实现-第八节-波束图
  • 【云平台监控】安装应用Ansible服务
  • 【Ansible自动化运维】六、ansible 实践案例与最佳实践:经验总结与分享
  • 未来七轴机器人会占据主流?深度解析具身智能方向当前六轴机器人和七轴机器人的区别,七轴力控机器人发展会加快吗?
  • AndroidStudio编译报错 Duplicate class kotlin
  • Uniapp: 修改启动时的端口号
  • 聊透多线程编程-线程池-9.C# 线程同步实现方式
  • Windows系统docker desktop安装(学习记录)
  • 挖掘机4月销量同比增17.6%,出口增幅创近两年新高
  • 言短意长|西湖大学首次“走出西湖”
  • 来伊份:已下架涉事批次蜜枣粽产品,消费者可获额外补偿,取得实物后进一步分析
  • 马鞍山市原常务副市长黄化锋一审获刑11年,涉案金额三千余万元
  • 刘元春在《光明日报》撰文:以法治护航民营经济高质量发展
  • 面对非专业人士,科学家该如何提供建议