
YOLOv8: Generating a PyTorch Model from a YAML Configuration File

news 2025/9/13 5:21:03


Notes:

First published: 2024-01-21
References:
- https://github.com/ultralytics/ultralytics/tree/fcc4496b127bdafc5b137a686392b07d3461e04f
- https://github.com/ultralytics/ultralytics/blob/fcc4496b127bdafc5b137a686392b07d3461e04f/ultralytics/cfg/models/v8/yolov8-pose.yaml

Overview

Here is the yolov8-pose.yaml file:

```yaml
# Parameters
nc: 1 # number of classes
kpt_shape: [17, 3] # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible)
scales: # model compound scaling constants, i.e. 'model=yolov8n-pose.yaml' will call yolov8-pose.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
  s: [0.33, 0.50, 1024]
  m: [0.67, 0.75, 768]
  l: [1.00, 1.00, 512]
  x: [1.00, 1.25, 512]

# YOLOv8.0n backbone
backbone:
  # [from, repeats, module, args]
  - [-1, 1, Conv, [64, 3, 2]] # 0-P1/2
  - [-1, 1, Conv, [128, 3, 2]] # 1-P2/4
  - [-1, 3, C2f, [128, True]]
  - [-1, 1, Conv, [256, 3, 2]] # 3-P3/8
  - [-1, 6, C2f, [256, True]]
  - [-1, 1, Conv, [512, 3, 2]] # 5-P4/16
  - [-1, 6, C2f, [512, True]]
  - [-1, 1, Conv, [1024, 3, 2]] # 7-P5/32
  - [-1, 3, C2f, [1024, True]]
  - [-1, 1, SPPF, [1024, 5]] # 9

# YOLOv8.0n head
head:
  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 6], 1, Concat, [1]] # cat backbone P4
  - [-1, 3, C2f, [512]] # 12

  - [-1, 1, nn.Upsample, [None, 2, "nearest"]]
  - [[-1, 4], 1, Concat, [1]] # cat backbone P3
  - [-1, 3, C2f, [256]] # 15 (P3/8-small)

  - [-1, 1, Conv, [256, 3, 2]]
  - [[-1, 12], 1, Concat, [1]] # cat head P4
  - [-1, 3, C2f, [512]] # 18 (P4/16-medium)

  - [-1, 1, Conv, [512, 3, 2]]
  - [[-1, 9], 1, Concat, [1]] # cat head P5
  - [-1, 3, C2f, [1024]] # 21 (P5/32-large)

  - [[15, 18, 21], 1, Pose, [nc, kpt_shape]] # Pose(P3, P4, P5)
```

YOLOv8 builds its PyTorch models from YAML configuration files; the conversion is implemented by the parse_model function.

Specifically, the __init__ method of DetectionModel, the parent class of PoseModel, contains the following line:

```python
self.model, self.save = parse_model(deepcopy(self.yaml), ch=ch, verbose=verbose)
```

Unlike a typical PyTorch model, the model returned by parse_model does not come with a hand-written forward method such as:

```python
def forward(self, x):
    x = self.flatten(x)
    logits = self.linear_relu_stack(x)
    return logits
```

To understand its forward logic, we need to look at the _predict_once method of BaseModel.

nc, kpt_shape, scales

Here is the part that defines nc, kpt_shape and scales:

```yaml
# Parameters
nc: 1 # number of classes
kpt_shape: [17, 3] # number of keypoints, number of dims (2 for x,y or 3 for x,y,visible)
scales: # model compound scaling constants, i.e. 'model=yolov8n-pose.yaml' will call yolov8-pose.yaml with scale 'n'
  # [depth, width, max_channels]
  n: [0.33, 0.25, 1024]
  s: [0.33, 0.50, 1024]
  m: [0.67, 0.75, 768]
  l: [1.00, 1.00, 512]
  x: [1.00, 1.25, 512]
```

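Where:

- nc is the number of classes. It defaults to 1, typically person.
- kpt_shape gives the number of keypoints and the number of dimensions per keypoint:
  - The default is 17 keypoints.
  - The default number of dimensions is 3: 2 means only the x and y coordinates, while 3 adds a third value indicating whether the keypoint is visible.
- scales lists the available model sizes; each entry is a [depth, width, max_channels] triple.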
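To make kpt_shape concrete: with the default [17, 3], each detected person comes with 17 × 3 = 51 keypoint values. The sketch below regroups such a flat list into one (x, y, visible) tuple per keypoint. The helper split_keypoints is hypothetical, and the keypoint-by-keypoint layout [x0, y0, v0, x1, ...] is an assumption made for illustration, not a statement about the Ultralytics output format.

```python
def split_keypoints(flat, kpt_shape=(17, 3)):
    """Regroup a flat list of keypoint values into one tuple per keypoint.

    Hypothetical helper. Assumption: values are stored keypoint by keypoint,
    i.e. [x0, y0, v0, x1, y1, v1, ...] when kpt_shape[1] == 3.
    """
    num_kpts, dims = kpt_shape
    if len(flat) != num_kpts * dims:
        raise ValueError(f"expected {num_kpts * dims} values, got {len(flat)}")
    return [tuple(flat[k * dims:(k + 1) * dims]) for k in range(num_kpts)]


# 17 keypoints x 3 dims = 51 numbers per person
flat = [float(v) for v in range(51)]
kpts = split_keypoints(flat)
print(len(kpts), kpts[0], kpts[-1])

# With kpt_shape [17, 2] there is no visibility value: 34 numbers per person
print(split_keypoints(flat[:34], kpt_shape=(17, 2))[1])
```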

Specifically, when yaml_model_load reads the YAML configuration file, it calls guess_model_scale to determine the model's scale:

```python
d["scale"] = guess_model_scale(path)
```

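That is, the scale is taken from the file name in the model YAML path, and the matching parameters are then picked from scales. By default, scales offers the options n, s, m, l and x.

Note that the scale can only be detected when the model YAML file name matches the regular expression `yolov\d+([nslmx])`, e.g. yolov8n-pose.yaml.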
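A minimal sketch of the name-based lookup, using the regular expression quoted above. guess_scale and base_config_name are hypothetical helpers written for this example. base_config_name is an assumed approximation of how a scaled name such as yolov8n-pose.yaml is mapped back to yolov8-pose.yaml (as the comment in the scales block describes); check it against yaml_model_load at the referenced commit before relying on it.

```python
import re
from pathlib import Path


def guess_scale(model_path: str) -> str:
    """Return the scale letter encoded in a model file name, or "" if there is none.

    Sketch modeled on guess_model_scale: search the file stem for yolov<digits><scale>.
    """
    match = re.search(r"yolov\d+([nslmx])", Path(model_path).stem)
    return match.group(1) if match else ""


def base_config_name(model_path: str) -> str:
    """Drop the scale letter from the name, e.g. yolov8n-pose.yaml -> yolov8-pose.yaml.

    Assumption: approximates the path unification done before the YAML file is looked up.
    """
    return re.sub(r"(\d+)([nslmx])(.+)?$", r"\1\3", model_path)


for name in ["yolov8n-pose.yaml", "yolov8x-pose.yaml", "yolov8-pose.yaml"]:
    print(f"{name}: config={base_config_name(name)}, scale={guess_scale(name)!r}")
```

A name without a scale letter, such as yolov8-pose.yaml, yields an empty scale.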

backbone and head

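backbone and head use the same YAML style: each layer is a list of length 4, [from, repeats, module, args], where:

- from is where the input comes from (i.e. the argument passed as x to the module's forward). Most entries are -1, meaning the output of the previous layer; a list such as [-1, 6] feeds several outputs to one module (here Concat), and a non-negative number is the index of an earlier layer.
- repeats is the number of times the module is repeated.
- module is the module class to build, e.g. Conv, C2f, Concat or Pose.
- args are the arguments used to construct the module; for most modules (e.g. Conv, C2f, SPPF) the first one is the number of output channels.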
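The sketch below resolves the from entries of the head into absolute layer indices, which is where the index comments in the YAML (# 12, # 15, # 18, # 21) come from. It only treats the layer list as data (module names are plain strings, nothing is built), and absolute_inputs is a hypothetical helper; mapping a negative j to i + j mirrors Python's negative indexing into the list of earlier outputs.

```python
# Head of yolov8-pose.yaml as (from, repeats, module, args); module names are plain strings here.
# The backbone has 10 layers (indices 0-9), so the first head layer has index 10.
HEAD = [
    [-1, 1, "nn.Upsample", [None, 2, "nearest"]],
    [[-1, 6], 1, "Concat", [1]],
    [-1, 3, "C2f", [512]],
    [-1, 1, "nn.Upsample", [None, 2, "nearest"]],
    [[-1, 4], 1, "Concat", [1]],
    [-1, 3, "C2f", [256]],
    [-1, 1, "Conv", [256, 3, 2]],
    [[-1, 12], 1, "Concat", [1]],
    [-1, 3, "C2f", [512]],
    [-1, 1, "Conv", [512, 3, 2]],
    [[-1, 9], 1, "Concat", [1]],
    [-1, 3, "C2f", [1024]],
    [[15, 18, 21], 1, "Pose", ["nc", "kpt_shape"]],
]


def absolute_inputs(i, f):
    """Map the `from` entry of layer i to absolute layer indices (-1 -> i - 1, -2 -> i - 2, ...)."""
    sources = [f] if isinstance(f, int) else f
    return [i + j if j < 0 else j for j in sources]


inputs = {i: absolute_inputs(i, f) for i, (f, _n, _m, _args) in enumerate(HEAD, start=10)}
for i, (f, n, m, args) in enumerate(HEAD, start=10):
    print(f"{i:2d} {m:12s} from={str(f):14} -> inputs {inputs[i]}")
```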

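Neither the repeat counts nor the output channels are fixed, however: both are scaled according to the model size (for reference, yolov8n is a smaller model than yolov8s).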

How the n, s, m, l, x scales affect `parse_model`

Each scale selects three parameters:

```python
depth, width, max_channels = scales[scale]
```
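As a small sketch, the scales block can be treated as a plain dict and unpacked the same way. select_scale is a hypothetical helper, and its fallback for a missing scale (use the first key, i.e. n) is an assumption modeled on parse_model at the referenced commit.

```python
# scales block of yolov8-pose.yaml as a Python dict: scale -> [depth, width, max_channels]
SCALES = {
    "n": [0.33, 0.25, 1024],
    "s": [0.33, 0.50, 1024],
    "m": [0.67, 0.75, 768],
    "l": [1.00, 1.00, 512],
    "x": [1.00, 1.25, 512],
}


def select_scale(scales, scale):
    """Return (depth, width, max_channels) for a scale letter.

    Assumption: an empty scale falls back to the first key of `scales` (here "n").
    """
    if not scale:
        scale = next(iter(scales))  # dicts keep insertion order, so this is "n"
    depth, width, max_channels = scales[scale]
    return depth, width, max_channels


print(select_scale(SCALES, "m"))
print(select_scale(SCALES, ""))
```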

`depth` may change the repeat counts

First, here is the depth-related code excerpted from parse_model:

```python
for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]):
    n = n_ = max(round(n * depth), 1) if n > 1 else n  # depth gain
    ...
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
```

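So when the configured repeats value is greater than 1, the actual repeat count is multiplied by depth and rounded, but never drops below 1; a repeats value of 1 stays 1.

yolov8n and yolov8s use a depth of 0.33. Their original repeat counts and the resulting ones are: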

```python
[{n: max(round(n * 0.33), 1) if n > 1 else n} for n in range(1, 13)]
# [{1: 1}, {2: 1}, {3: 1}, {4: 1}, {5: 2}, {6: 2}, {7: 2}, {8: 3}, {9: 3}, {10: 3}, {11: 4}, {12: 4}]
```

yolov8m uses a depth of 0.67, with the following effect:

```python
[{n: max(round(n * 0.67), 1) if n > 1 else n} for n in range(1, 13)]
# [{1: 1}, {2: 1}, {3: 2}, {4: 3}, {5: 3}, {6: 4}, {7: 5}, {8: 5}, {9: 6}, {10: 7}, {11: 7}, {12: 8}]
```

yolov8l and yolov8x use a depth of 1.0, so their repeat counts are unchanged and match the YAML configuration file exactly.
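Applying the same depth-gain expression to the repeat counts that actually occur in yolov8-pose.yaml (1, 3 and 6) gives the effective repeats per model size. This is a standalone sketch; scaled_repeats is a hypothetical name for the one-liner in the parse_model excerpt.

```python
def scaled_repeats(n, depth):
    """Depth gain from parse_model: scale repeats > 1, never below 1; keep n == 1 as-is."""
    return max(round(n * depth), 1) if n > 1 else n


DEPTHS = {"n": 0.33, "s": 0.33, "m": 0.67, "l": 1.00, "x": 1.00}  # first entry of each scale

# yolov8-pose.yaml only uses repeats of 1, 3 and 6
table = {scale: [scaled_repeats(n, d) for n in (1, 3, 6)] for scale, d in DEPTHS.items()}
for scale, reps in table.items():
    print(f"yolov8{scale}-pose: repeats 1/3/6 -> {reps}")
```

So a C2f block configured with repeats 6 ends up with 2 repeats in yolov8n/s, 4 in yolov8m and 6 in yolov8l/x.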

`width` and `max_channels` affect the output channels of the modules

Here is the code in parse_model that adjusts the arguments of most modules:

```python
for i, (f, n, m, args) in enumerate(d["backbone"] + d["head"]):  # from, number, module, args
    ...
    if m in (Classify, Conv, ConvTranspose, ...):
        c1, c2 = ch[f], args[0]
        if c2 != nc:  # if c2 not equal to number of classes (i.e. for Classify() output)
            c2 = make_divisible(min(c2, max_channels) * width, 8)

        args = [c1, c2, *args[1:]]
        if m in (BottleneckCSP, C1, C2, C2f, C3, C3TR, C3Ghost, C3x, RepC3):
            args.insert(2, n)  # number of repeats
            n = 1
    ...
    m_ = nn.Sequential(*(m(*args) for _ in range(n))) if n > 1 else m(*args)  # module
    ...
    ch.append(c2)
```

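As the code shows, for most modules the output channel count c2 is capped by max_channels, multiplied by width and made divisible by 8. For C2f and the other CSP-style modules in the second tuple, the depth-scaled repeat count is also passed into the module as an argument (args.insert(2, n)) and n is reset to 1, so the repetition happens inside the module rather than through nn.Sequential.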

The output channels before and after scaling, for each model size:

yolov8n

```python
[{c2: make_divisible(min(c2, 1024) * 0.25, 8)} for c2 in [64, 128, 256, 512, 1024]]
# [{64: 16}, {128: 32}, {256: 64}, {512: 128}, {1024: 256}]
```

yolov8s

```python
[{c2: make_divisible(min(c2, 1024) * 0.50, 8)} for c2 in [64, 128, 256, 512, 1024]]
# [{64: 32}, {128: 64}, {256: 128}, {512: 256}, {1024: 512}]
```

yolov8m

```python
[{c2: make_divisible(min(c2, 768) * 0.75, 8)} for c2 in [64, 128, 256, 512, 1024]]
# [{64: 48}, {128: 96}, {256: 192}, {512: 384}, {1024: 576}]
```

yolov8l

```python
[{c2: make_divisible(min(c2, 512) * 1.0, 8)} for c2 in [64, 128, 256, 512, 1024]]
# [{64: 64}, {128: 128}, {256: 256}, {512: 512}, {1024: 512}]
```

yolov8x

```python
[{c2: make_divisible(min(c2, 512) * 1.25, 8)} for c2 in [64, 128, 256, 512, 1024]]
# [{64: 80}, {128: 160}, {256: 320}, {512: 640}, {1024: 640}]
```

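Finally, note that ch.append(c2) records the output channel count of every module in the YAML, which is how later layers look up their input channels via ch[f].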
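The sketch below replays this channel bookkeeping for the whole of yolov8-pose.yaml in plain Python, to show how ch feeds Concat and the Pose head. Assumptions: make_divisible is reimplemented as rounding up to a multiple of the divisor; nn.Upsample keeps its input's channel count, Concat sums the channels of its inputs, and Pose collects the channels of its three inputs while leaving c2 unchanged. These last three rules are modeled on branches of parse_model that the excerpt above elides, so verify them against the source.

```python
import math


def make_divisible(x, divisor):
    """Round x up to the nearest multiple of divisor (reimplemented here for the sketch)."""
    return math.ceil(x / divisor) * divisor


# yolov8-pose.yaml reduced to (from, module, args[0]); None where args[0] is not a channel count.
LAYERS = [
    (-1, "Conv", 64), (-1, "Conv", 128), (-1, "C2f", 128), (-1, "Conv", 256),
    (-1, "C2f", 256), (-1, "Conv", 512), (-1, "C2f", 512), (-1, "Conv", 1024),
    (-1, "C2f", 1024), (-1, "SPPF", 1024),  # backbone, layers 0-9
    (-1, "nn.Upsample", None), ([-1, 6], "Concat", None), (-1, "C2f", 512),
    (-1, "nn.Upsample", None), ([-1, 4], "Concat", None), (-1, "C2f", 256),
    (-1, "Conv", 256), ([-1, 12], "Concat", None), (-1, "C2f", 512),
    (-1, "Conv", 512), ([-1, 9], "Concat", None), (-1, "C2f", 1024),
    ([15, 18, 21], "Pose", None),  # head, layers 10-22
]


def trace_channels(width, max_channels):
    """Return (ch, pose_inputs): output channels per layer and the channels fed to Pose."""
    ch, pose_inputs, c2 = [], None, None
    for f, m, c2_yaml in LAYERS:
        if m in ("Conv", "C2f", "SPPF"):
            c2 = make_divisible(min(c2_yaml, max_channels) * width, 8)
        elif m == "Concat":
            c2 = sum(ch[x] for x in f)  # channels add up along the concat dimension
        elif m == "Pose":
            pose_inputs = [ch[x] for x in f]  # c2 is left unchanged for the head
        else:  # nn.Upsample: same channels as its input
            c2 = ch[f]
        ch.append(c2)
    return ch, pose_inputs


for name, width, max_ch in [("n", 0.25, 1024), ("m", 0.75, 768), ("x", 1.25, 512)]:
    ch, pose_in = trace_channels(width, max_ch)
    print(f"yolov8{name}-pose: SPPF out={ch[9]}, Pose inputs (P3, P4, P5)={pose_in}")
```

For yolov8n-pose the Pose head thus receives feature maps with 64, 128 and 256 channels; for yolov8x-pose, 320, 640 and 640.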

forward pass

The _predict_once method of BaseModel shows how the modules are connected to one another.

An excerpt:

```python
def _predict_once(self, x, profile=False, visualize=False, embed=None):
    y, dt, embeddings = [], [], []  # outputs
    for m in self.model:
        if m.f != -1:  # if not from previous layer
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]  # from earlier layers
        ...
        x = m(x)  # run
        y.append(x if m.i in self.save else None)  # save output
        ...
    return x
```
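To see the routing in isolation, here is a runnable toy version on plain numbers. ToyLayer, build_save and predict_once are hypothetical names; predict_once keeps only the input-selection and output-caching lines of _predict_once. build_save assumes parse_model fills self.save with every from index other than -1 (normalized with % i), which is how only the needed outputs end up cached in y.

```python
class ToyLayer:
    """Stand-in for a parse_model module: a callable carrying .i (index) and .f (from)."""

    def __init__(self, i, f, fn):
        self.i, self.f, self.fn = i, f, fn

    def __call__(self, x):
        return self.fn(x)


def build_save(layers):
    """Indices whose outputs are needed later (assumed to mirror parse_model's save list)."""
    save = []
    for layer in layers:
        sources = [layer.f] if isinstance(layer.f, int) else layer.f
        save.extend(j % layer.i for j in sources if j != -1)
    return sorted(save)


def predict_once(layers, save, x):
    """Routing logic of _predict_once, without profiling, visualization or embeddings."""
    y = []
    for m in layers:
        if m.f != -1:  # input is not simply the previous layer's output
            x = y[m.f] if isinstance(m.f, int) else [x if j == -1 else y[j] for j in m.f]
        x = m(x)
        y.append(x if m.i in save else None)  # cache only outputs that later layers need
    return x


# Toy network on plain numbers: layer 2 merges layers 1 and 0, layer 3 gathers layers 1 and 2
layers = [
    ToyLayer(0, -1, lambda x: x + 1),
    ToyLayer(1, -1, lambda x: x * 2),
    ToyLayer(2, [-1, 0], sum),    # like Concat: takes a list of inputs
    ToyLayer(3, [1, 2], tuple),   # like Pose: collects several earlier outputs
]
save = build_save(layers)
print(save, predict_once(layers, save, 3))
```

Applied to the from entries of yolov8-pose.yaml, the same rule gives the save list [4, 6, 9, 12, 15, 18, 21]: the outputs of these layers are kept in y, and every other entry is None.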