
Effective Receptive Field (ERF) Visualization Tool

news 2025/7/15 19:39:34

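Contents

- Examples from top venues
- Receptive field (RF) and effective receptive field (ERF)
  - 1. Receptive field
  - 2. Effective receptive field (ERF)
  - 3. Why does the ERF matter more than the receptive field?
- How the ERF visualization tool computes its result
  - Notation
  - Step 1: Compute pixel-level contribution scores ( $P$ )
  - Step 2: Aggregate the contribution scores ( $A$ )
  - Step 3: Normalize (for cross-model comparison)
  - Final result
- get_model_erf.py code
  - Usage:
  - Output explained
  - Example: layers of the yolo11l.py pretrained weights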

Examples from top venues

ERF visualizations of this kind appear in papers at several top computer-vision venues:

CVPR2022 RepLKNet
ECCV2024 MambaIR
CVPR2025 MobileMamba
ECCV2024 Wavelet Convolutions for Large Receptive Fields


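Receptive field (RF) and effective receptive field (ERF)

1. Receptive field

Definition:
In a convolutional neural network, each neuron in a layer's feature map corresponds to a region of the input image; that region is the neuron's receptive field. Put simply, the receptive field is the size of the input region that can directly influence the neuron.

Computation:
The receptive field size is determined by the network architecture and can be computed with a recursive formula: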
$RF_l = RF_{l-1} + (k_l - 1) \times \prod_{i=1}^{l-1} s_i$
where:

$ RF_l $ is the receptive field size at layer $ l $ ;
$ k_l $ is the kernel size of layer $ l $ ;
$ s_i $ is the stride of layer $ i $ .
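As a quick check of the recursion, here is a minimal sketch in plain Python. The function name `receptive_field` and the starting value $RF_0 = 1$ (a single input pixel) are assumptions made for illustration; dilation and padding are ignored.

```python
def receptive_field(layers):
    """Theoretical receptive field after each layer.

    `layers` is a list of (kernel_size, stride) pairs, ordered from input to output.
    Assumes RF_0 = 1 and ignores dilation.
    """
    rf = 1      # a single input pixel
    jump = 1    # product of the strides of all previous layers
    sizes = []
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
        sizes.append(rf)
    return sizes


# Three 3x3 convolutions; the second one has stride 2.
print(receptive_field([(3, 1), (3, 2), (3, 1)]))  # [3, 5, 9]
```

The third layer adds 4 pixels rather than 2 because the stride-2 layer before it doubles the spacing between neighbouring positions.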

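Characteristics:

Theoretical: the receptive field is a fixed geometric region, fully determined by the architecture (kernel sizes, strides, number of layers).
Uniformity assumption: the classical receptive field treats every pixel in the region as contributing equally to the output, which does not hold in practice.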

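2. Effective receptive field (ERF)

Definition:
The effective receptive field is the part of the input image that actually has a significant influence on the model's output features.
Unlike the theoretical receptive field, the ERF accounts for how much each pixel contributes (measured through gradients or activations), and it usually covers only part of the theoretical receptive field.

Key differences:

Theoretical receptive field: assumes every pixel in the region contributes equally; it is a square or rectangle.
Effective receptive field: the actual contributions are approximately Gaussian (strong at the center, weak at the edges), usually concentrated near the center of the theoretical receptive field, and the shape can be irregular.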
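The center-heavy shape can be reproduced with a toy model. The sketch below is only an illustration under strong assumptions (a 1-D linear network whose layers are all uniform averaging kernels, no nonlinearity); `stacked_box_response` is a hypothetical helper, not part of the tool described in this article.

```python
import numpy as np


def stacked_box_response(kernel_size=3, num_layers=10):
    """Impulse response of `num_layers` stacked 1-D averaging convolutions."""
    kernel = np.ones(kernel_size) / kernel_size
    response = np.array([1.0])
    for _ in range(num_layers):
        # 'full' convolution widens the support by kernel_size - 1 per layer
        response = np.convolve(response, kernel)
    return response


resp = stacked_box_response()
center = len(resp) // 2
print(len(resp))                            # 21, the theoretical receptive field width
print(resp[center] / resp[0])               # center weight vs. edge weight, about 8953
print(resp[center - 3:center + 4].sum())    # share held by the central 7 taps, about 0.82
```

Every one of the 21 input positions lies inside the theoretical receptive field, yet the central third carries more than 80% of the total weight.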

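How it is computed:

Gradient method: compute the gradient of the output with respect to the input (as in a saliency map); a larger gradient means a larger contribution.
Variance method: perturb the input with noise and analyze the effect on the output; regions where the output varies more contribute more.

Applications:

Model interpretation: understand which regions each layer attends to and explain model behavior (for example, anchor-box design in object detection).
Network design: tune the receptive field size to match object scales (for example, multi-scale feature fusion).
Data augmentation: adjust cropping or scaling strategies according to the ERF so that key regions are preserved.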

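3. Why does the ERF matter more than the receptive field?

Sparsity: experiments suggest that the ERF of a deep network often covers only roughly 20%-50% of its theoretical receptive field, so most of the border region contributes very little to the output.
Non-uniformity: contributions within the ERF are uneven; the center contributes far more than the edges, which contradicts the uniform assumption behind the theoretical receptive field.
Practical guidance: knowing the ERF helps choose kernel sizes, strides and similar parameters more precisely when designing a model, and avoids redundant computation.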

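How the ERF visualization tool computes its result

See Appendix B, "Visualizing the ERF", of the CVPR 2022 RepLKNet paper.

The tool visualizes how much each input pixel contributes to the center point of the output feature map, which shows directly how far the model aggregates information and which regions it focuses on.

Overview: the pipeline is "quantify influence with derivatives → aggregate and log-compress → normalize", and the result shows which input regions the model attends to.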

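Notation

$I (n \times 3 \times h \times w)$ : the input images
- $n$ : batch size (number of samples),
- $3$ : number of input channels (e.g. RGB),
- $h$ , $w$ : image height and width.
$M (n \times c \times h' \times w')$ : the final feature map output by the model
- $c$ : number of feature-map channels,
- $h'$ , $w'$ : feature-map height and width.
Quantity of interest: the center point of every channel of the output feature map $M$ , i.e. $M_{:,:, h'/2, w'/2}$ (the value at the center position for all samples and all channels).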

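Step 1: Compute pixel-level contribution scores ( $P$ )

The goal is to measure how strongly each pixel of the input $I$ influences the center point of the output feature map, using derivatives: $P = \max \left( \frac{\partial \left( \sum_{i}^{n} \sum_{j}^{c} M_{i,j,h'/2,w'/2} \right)}{\partial I}, 0 \right)$

Breaking the formula down:

Inner sums $\sum_{i}^{n} \sum_{j}^{c} M_{i,j,h'/2,w'/2}$ : add up the center-point values over all samples ( $i$ ) and all channels ( $j$ ), giving a single scalar (the overall response of the output center point).
Derivative $\frac{\partial (\cdot)}{\partial I}$ : take the partial derivative of that scalar with respect to every pixel of the input $I$ . The larger the derivative, the more strongly that input pixel influences the output center point.
$\max(\cdot, 0)$ : keep only positive contributions (negative ones may be noise or inhibition and are ignored here), yielding the pixel-level contribution score matrix $P$ , which has the same shape as $I$ ( $n \times 3 \times h \times w$ ).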
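Step 1 maps directly onto automatic differentiation. The following is a minimal PyTorch sketch that uses a two-layer toy network as a stand-in model; the network and the helper name `contribution_scores` are assumptions for illustration. The get_model_erf.py script in this article differs slightly in that it also applies a ReLU to the center activations before summing them.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in model (an assumption for illustration, not the YOLO network).
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.Conv2d(8, 8, kernel_size=3, padding=1),
).eval()


def contribution_scores(model, images):
    """Return P = max(d(sum of center outputs) / dI, 0), same shape as `images`."""
    images = images.clone().requires_grad_(True)
    feature_map = model(images)                      # M: (n, c, h', w')
    _, _, fh, fw = feature_map.shape
    center_sum = feature_map[:, :, fh // 2, fw // 2].sum()
    (grad,) = torch.autograd.grad(center_sum, images)
    return grad.clamp(min=0)                         # keep positive contributions only


images = torch.rand(2, 3, 32, 32)                    # n = 2 random "images"
P = contribution_scores(model, images)
print(P.shape)                                       # torch.Size([2, 3, 32, 32])
```

Because two stacked 3x3 convolutions have a 5x5 theoretical receptive field, `P` is exactly zero outside the 5x5 window around pixel (16, 16); inside that window the values vary from pixel to pixel.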

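Step 2: Aggregate the contribution scores ( $A$ )

For visualization, the scores of all samples and channels must be merged into a single matrix: $A = \log_{10} \left( \sum_{i}^{n} \sum_{j}^{3} P_{i,j,:,:} + 1 \right)$

Breaking the formula down:

Sum $\sum_{i}^{n} \sum_{j}^{3} P_{i,j,:,:}$ : add up the contribution scores $P$ over all samples ( $i$ ) and the 3 input channels ( $j$ ), giving one combined contribution matrix of size $h \times w$ , the same as the input image.
$+ 1$ : keeps the argument of the logarithm at 1 or above, so the logarithm is always defined and the result is never negative.
$\log_{10}(\cdot)$ : take the logarithm to compress the value range (contribution scores can differ by orders of magnitude) and make the visualization clearer (for example, a gap between 1000 and 10 becomes a gap between roughly 3 and 1).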
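The aggregation is a sum over two axes followed by a logarithm. Here is a NumPy sketch with made-up scores; `aggregate_contributions` is a hypothetical helper name. Note that get_model_erf.py sums each image's gradient over the batch and channel axes and then averages the per-image maps with `AverageMeter` before taking the logarithm, rather than summing over all images at once.

```python
import numpy as np


def aggregate_contributions(P):
    """A = log10(sum of P over samples and input channels, plus 1); shape (h, w)."""
    P = np.asarray(P, dtype=np.float64)
    return np.log10(P.sum(axis=(0, 1)) + 1.0)


# Made-up scores: n = 2 samples, 3 channels, 4x4 pixels, zero except two pixels.
P = np.zeros((2, 3, 4, 4))
P[:, :, 2, 2] = 999.0 / 6   # adds up to 999 over the 6 (sample, channel) slices
P[:, :, 0, 0] = 9.0 / 6     # adds up to 9
A = aggregate_contributions(P)
print(A.shape)                      # (4, 4)
print(A[2, 2], A[0, 0], A[1, 1])    # approximately 3.0, approximately 1.0, and 0.0
```

A raw ratio of about 100 between the two pixels shrinks to a ratio of about 3 after the logarithm, and pixels with no contribution map to exactly 0.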

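Step 3: Normalize (for cross-model comparison)

So that ERF visualizations of different models can be compared directly, the matrix $A$ is normalized to the range [0, 1]:

Operation: divide each model's $A$ by its own maximum, i.e. $A_{\text{norm}} = A / \max(A)$ .
Purpose: remove differences in output scale between models and highlight the distribution of relative contributions.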
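A short NumPy sketch of the normalization follows. The helper name `normalize_erf` and the `eps` guard against an all-zero map are assumptions added for illustration; they are not taken from the script.

```python
import numpy as np


def normalize_erf(A, eps=1e-12):
    """Scale a non-negative map A to [0, 1] by dividing by its maximum."""
    A = np.asarray(A, dtype=np.float64)
    peak = A.max()
    if peak < eps:                 # all-zero map: nothing to scale
        return np.zeros_like(A)
    return A / peak


A_small = np.array([[0.0, 1.0], [2.0, 4.0]])
A_large = 100.0 * A_small          # same pattern, different output scale
print(np.allclose(normalize_erf(A_small), normalize_erf(A_large)))  # True
```

Two maps that differ only by a constant factor become identical after normalization, which is what makes heatmaps from different models comparable.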

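Final result

After these steps, the normalized matrix $A$ can be rendered as a heatmap:

The brighter a region, the more that input position contributes to the center point of the model's output (that is, it belongs to the model's effective receptive field).
Comparing heatmaps across models shows the difference directly: large-kernel CNNs such as RepLKNet usually have a larger ERF, while small-kernel CNNs or Transformers may show a more concentrated ERF or a different pattern.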

get_model_erf.py code

get_model_erf.py plots the effective receptive field of a model.

import os
import warnings

import cv2
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns
import torch
from timm.utils import AverageMeter
from tqdm import tqdm
from ultralytics.nn.tasks import attempt_load_weights

# Suppress warnings and configure fonts
warnings.filterwarnings('ignore')
warnings.simplefilter('ignore')
np.random.seed(0)
plt.rcParams["font.family"] = "Times New Roman"
plt.rcParams['axes.unicode_minus'] = False


def get_activation(feat, backbone_idx=-1):
    """Build a forward hook that appends a layer's output to `feat`."""
    def hook(model, inputs, outputs):
        if backbone_idx != -1:
            # The layer returns a list of feature maps: left-pad a copy to
            # length 5 (so the layer's real output is not mutated), then pick one.
            outputs = list(outputs)
            for _ in range(5 - len(outputs)):
                outputs.insert(0, None)
            feat.append(outputs[backbone_idx])
        else:
            feat.append(outputs)
    return hook


def letterbox(im, new_shape=(640, 640), color=(114, 114, 114), auto=True,
              scaleFill=False, scaleup=True, stride=32):
    """Resize and pad `im` to `new_shape` while keeping the aspect ratio."""
    shape = im.shape[:2]  # current shape (height, width)
    if isinstance(new_shape, int):
        new_shape = (new_shape, new_shape)
    r = min(new_shape[0] / shape[0], new_shape[1] / shape[1])
    if not scaleup:
        r = min(r, 1.0)
    new_unpad = int(round(shape[1] * r)), int(round(shape[0] * r))
    dw, dh = new_shape[1] - new_unpad[0], new_shape[0] - new_unpad[1]
    if auto:
        dw, dh = np.mod(dw, stride), np.mod(dh, stride)
    elif scaleFill:
        dw, dh = 0.0, 0.0
        new_unpad = (new_shape[1], new_shape[0])
        r = new_shape[1] / shape[1], new_shape[0] / shape[0]
    dw /= 2
    dh /= 2
    if shape[::-1] != new_unpad:
        im = cv2.resize(im, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    im = cv2.copyMakeBorder(im, top, bottom, left, right,
                            cv2.BORDER_CONSTANT, value=color)
    return im, r, (dw, dh)


def get_rectangle(data, thresh):
    """Find the smallest centered square whose sum exceeds `thresh` of the total.

    Returns (side_length, area_ratio), or None if no such square is found.
    """
    h, w = data.shape
    all_sum = np.sum(data)
    for i in range(1, h // 2):
        selected_area = data[h // 2 - i:h // 2 + 1 + i, w // 2 - i:w // 2 + 1 + i]
        area_sum = np.sum(selected_area)
        if area_sum / all_sum > thresh:
            return i * 2 + 1, (i * 2 + 1) / h * (i * 2 + 1) / w
    return None


class YOLOERFVisualizer:
    def __init__(self, weight, device, layers, dataset, num_images, save_path=None):
        # Print the run configuration
        print("=" * 50)
        print("Starting ERF Visualization Tool for YOLO")
        print(f"Model weights: {os.path.basename(weight)}")
        print(f"Device: {device}")
        print(f"Layers to visualize: {layers}")
        print(f"Dataset path: {dataset}")
        print(f"Number of images to process: {num_images}")
        print(f"Save path: {save_path if save_path else 'Not specified (only display)'}")
        print("=" * 50 + "\n")

        self.device = torch.device(device)
        self.layers = layers
        self.dataset = dataset
        self.num_images = num_images
        self.save_path = save_path
        self.results = {layer: AverageMeter() for layer in layers}

        # Load the model
        print("Loading model...")
        self.model = attempt_load_weights(weight, self.device)
        self.model.info()
        for p in self.model.parameters():
            p.requires_grad_(True)
        self.model.eval()
        # lr=0 optimizer: only used to clear gradients, never to update weights
        self.optimizer = torch.optim.SGD(self.model.parameters(), lr=0, weight_decay=0)
        self.optimizer.zero_grad()
        print("Model loaded successfully!\n")

    def register_hooks(self, layer):
        feat = []
        hooks = []
        if '-' in str(layer):
            # 'a-b': layer a returns a list of feature maps; take entry b
            layer_first, layer_second = layer.split('-')
            hook = self.model.model[int(layer_first)].register_forward_hook(
                get_activation(feat, backbone_idx=int(layer_second)))
        else:
            hook = self.model.model[int(layer)].register_forward_hook(get_activation(feat))
        hooks.append(hook)
        return feat, hooks

    def get_input_grad(self, samples, layer):
        feat, hooks = self.register_hooks(layer)
        try:
            _ = self.model(samples)
        finally:
            # Always remove the hooks, even if the forward pass fails
            for h in hooks:
                h.remove()
        outputs = feat[-1]

        # Gradient of the (ReLU-ed) center activations with respect to the input
        out_size = outputs.size()
        central_point = torch.nn.functional.relu(
            outputs[:, :, out_size[2] // 2, out_size[3] // 2]).sum()
        grad = torch.autograd.grad(central_point, samples)[0]
        grad = torch.nn.functional.relu(grad)  # keep positive contributions only
        aggregated = grad.sum((0, 1))          # sum over batch and input channels
        return aggregated.cpu().numpy()

    def process(self):
        # Collect candidate image paths
        image_paths = [p for p in os.listdir(self.dataset)
                       if os.path.isfile(os.path.join(self.dataset, p))]
        valid_images = len(image_paths)
        if valid_images == 0:
            print("Error: No images found in the dataset folder!")
            return
        print(f"Found {valid_images} images in dataset. "
              f"Processing up to {self.num_images}...\n")

        # Process images
        pbar = tqdm(total=min(self.num_images, valid_images),
                    desc="Processing images", unit="image")
        processed = 0
        for image_path in image_paths:
            if processed >= self.num_images:
                break
            img_path = os.path.join(self.dataset, image_path)

            # Read the image
            img = cv2.imread(img_path)
            if img is None:
                print(f"Skipping invalid image: {image_path}")
                continue

            # Preprocess: letterbox to 640x640, BGR -> RGB, scale to [0, 1]
            img = letterbox(img, auto=False)[0]
            img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
            img = np.float32(img) / 255.0
            samples = torch.from_numpy(np.transpose(img, (2, 0, 1))).unsqueeze(0).to(self.device)
            samples.requires_grad = True
            self.optimizer.zero_grad()

            # Compute the contribution map for each layer
            for layer in self.layers:
                try:
                    contribution = self.get_input_grad(samples, layer)
                    if not np.isnan(np.sum(contribution)):
                        self.results[layer].update(contribution)
                    else:
                        print(f"Warning: NaN detected in layer {layer} for image {image_path}")
                except Exception as e:
                    print(f"Error processing layer {layer} for image {image_path}: {str(e)}")

            processed += 1
            pbar.update(1)

        pbar.close()
        print(f"\nSuccessfully processed {processed} images!")

        # Plot the results
        self.plot_results()

    def plot_results(self):
        print("\nGenerating visualization...")
        num_layers = len(self.layers)
        fig, axes = plt.subplots(1, num_layers, figsize=(5 * num_layers, 5), dpi=100)
        if num_layers == 1:
            axes = [axes]

        for idx, (layer, ax) in enumerate(zip(self.layers, axes)):
            meter = self.results[layer]
            if meter.count == 0:  # AverageMeter.avg starts at 0, not None
                print(f"Warning: No valid data for layer {layer}")
                continue
            data = np.log10(meter.avg + 1)
            data = data / np.max(data)

            # Draw the heatmap
            sns.heatmap(data, ax=ax,
                        cmap='jet',  # colormap
                        xticklabels=False, yticklabels=False,
                        cbar=True if idx == num_layers - 1 else False)
            ax.set_title(f"Layer {layer}", fontsize=12)

            # Print the size of the high-contribution region
            print(f"\nStatistics for Layer {layer}:")
            for thresh in [0.2, 0.3, 0.5, 0.99]:
                rect = get_rectangle(data, thresh)
                if rect is None:
                    print(f"  Threshold {thresh:<4}: not reached by any centered square")
                    continue
                side_len, ratio = rect
                print(f"  Threshold {thresh:<4}: Side length = {side_len:<3}, Area ratio = {ratio:.4f}")

        # Adjust the layout
        plt.tight_layout()

        # Save and/or show
        if self.save_path:
            plt.savefig(self.save_path)
            print(f"\nVisualization saved to: {self.save_path}")
        plt.show()
        print("\nVisualization completed!")


def get_params():
    return {
        'weight': 'yolo11l.pt',  # path to the model weights
        'device': 'cuda:0',  # device (cuda or cpu)
        'layers': ['1', '2', '3', '4', '5'],  # layers to visualize
        # 'layers': ['6', '7', '8', '9', '10'],
        # 'layers': ['1', '2', '3', '4', '5', '6', '7', '8', '9', '10'],
        'dataset': 'C:/Users/Virgil/Desktop/SOD_Detection/testA_2000images',  # image folder
        'num_images': 100,  # number of images to process
        'save_path': 'layer_erf.png'  # output path (optional; None only displays the figure)
    }


if __name__ == '__main__':
    cfg = get_params()
    visualizer = YOLOERFVisualizer(**cfg)
    visualizer.process()

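Usage:

In get_params(), set the model weights, the device, the list of layers to plot, the dataset image folder, the number of images, and the path for the saved figure (if no path is given, the figure is not saved).

Change the cmap argument of sns.heatmap() (for example cmap='RdYlGn') to control the color gradient. Built-in colormaps include:

'viridis' (recommended; the Matplotlib default, perceptually friendly)
'plasma' / 'inferno' / 'magma' (heat-map-like)
'coolwarm' (blue-white-red, suited to signed values)
'jet' (traditional rainbow; common but not recommended)
'YlOrRd' (yellow-orange-red)
'Blues' / 'Greens' / 'Reds' (single-hue gradients)

Example output: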

C:\Development\Py\Anaconda\envs\yolo_env\python.exe C:\Users\Virgil\Desktop\yolov11-test\get_model_erf.py 
==================================================
Starting ERF Visualization Tool for YOLO
Model weights: yolo11l.pt
Device: cuda:0
Layers to visualize: ['13', '16', '19', '22']
Dataset path: C:/Users/Virgil/Desktop/SOD_Detection/testA_2000images
Number of images to process: 100
Save path: layer_erf_6-10.png
==================================================

Loading model...
YOLO11l summary: 357 layers, 25,372,160 parameters, 0 gradients, 87.6 GFLOPs
Model loaded successfully!

Found 2000 images in dataset. Processing up to 100...

Processing images: 100%|██████████| 100/100 [00:34<00:00,  2.91image/s]

Successfully processed 100 images!

Generating visualization...

Statistics for Layer 13:
  Threshold 0.2 : Side length = 97 , Area ratio = 0.0230
  Threshold 0.3 : Side length = 127, Area ratio = 0.0394
  Threshold 0.5 : Side length = 189, Area ratio = 0.0872
  Threshold 0.99: Side length = 599, Area ratio = 0.8760

Statistics for Layer 16:
  Threshold 0.2 : Side length = 91 , Area ratio = 0.0202
  Threshold 0.3 : Side length = 121, Area ratio = 0.0357
  Threshold 0.5 : Side length = 179, Area ratio = 0.0782
  Threshold 0.99: Side length = 593, Area ratio = 0.8585

Statistics for Layer 19:
  Threshold 0.2 : Side length = 125, Area ratio = 0.0381
  Threshold 0.3 : Side length = 161, Area ratio = 0.0633
  Threshold 0.5 : Side length = 235, Area ratio = 0.1348
  Threshold 0.99: Side length = 615, Area ratio = 0.9234

Statistics for Layer 22:
  Threshold 0.2 : Side length = 155, Area ratio = 0.0587
  Threshold 0.3 : Side length = 199, Area ratio = 0.0967
  Threshold 0.5 : Side length = 277, Area ratio = 0.1873
  Threshold 0.99: Side length = 619, Area ratio = 0.9355

Visualization saved to: layer_erf_6-10.png

Visualization completed!

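Output explained

Statistics for Layer 22:
  Threshold 0.2 : Side length = 155, Area ratio = 0.0587
  Threshold 0.3 : Side length = 199, Area ratio = 0.0967
  Threshold 0.5 : Side length = 277, Area ratio = 0.1873
  Threshold 0.99: Side length = 619, Area ratio = 0.9355

These values quantify the effective receptive field (ERF) of the model's layer 22:

Threshold:
The cumulative share of the total contribution. For example:
- Threshold 0.2 means "the region whose contributions add up to 20% of the total";
- Threshold 0.99 means "the region whose contributions add up to 99% of the total".
Side length:
The side length, in pixels, of the smallest square centered on the image whose contribution exceeds the threshold. The script takes these sums over the log-scaled, normalized heatmap rather than the raw gradients. For example:
- at threshold 0.2 the side length is 155, so a 155×155 square at the center of the input holds 20% of this layer's total contribution;
- at threshold 0.99 the side length is 619, so a 619×619 square is needed to cover 99% of the total contribution.
Area ratio:
The area of that square divided by the total area of the input image, as a fraction between 0 and 1. For example:
- at threshold 0.99 the area ratio is 0.9355, so the 619×619 square covers 93.55% of the input image; this layer has to look at almost the whole image to collect 99% of the contribution.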
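The area ratios in the log can be checked by hand. The sketch below assumes a 640×640 input, which is what `letterbox(img, auto=False)` produces with its default `new_shape`; `area_ratio` is a hypothetical helper that repeats the expression used in `get_rectangle`.

```python
def area_ratio(side_len, height=640, width=640):
    """Fraction of the input covered by a side_len x side_len square."""
    return (side_len / height) * (side_len / width)


# Side lengths reported for layer 22 in the sample log
for thresh, side in [(0.2, 155), (0.3, 199), (0.5, 277), (0.99, 619)]:
    print(f"Threshold {thresh}: Area ratio = {area_ratio(side):.4f}")
# Threshold 0.2: Area ratio = 0.0587
# Threshold 0.3: Area ratio = 0.0967
# Threshold 0.5: Area ratio = 0.1873
# Threshold 0.99: Area ratio = 0.9355
```

The values match the log, which confirms that the area ratio is simply the square's area divided by 640×640 = 409,600 pixels.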