Split a dataset with XML-format labels into training and validation sets at an 8:2 ratio
- Take the images in D:\experiment2\dataset\M3FD_Detection\Ir and the XML files with the same names in D:\experiment2\dataset\M3FD_Detection\Annotation
- Randomly split them 8:2 into train / val
- The output directory structure is created automatically (images go in images/, the XML files in labels/):
M3FD_split/
├─ train/
│ ├─ images/
│ └─ labels/
└─ val/
  ├─ images/
  └─ labels/
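The pairing rule above (an image is kept only if an XML with the same file name exists) silently drops anything unmatched. Before splitting, a minimal sketch like the one below can report what will be skipped. `find_unpaired` is a hypothetical helper, not part of the original script. It assumes the same image extensions the split script accepts.

```python
from pathlib import Path

# Same image extensions the split script accepts (assumption: keep in sync)
IMG_EXTS = {'.jpg', '.jpeg', '.png', '.bmp'}


def find_unpaired(img_dir, lbl_dir):
    """Return (images_without_xml, xmls_without_image) as sorted stem lists."""
    img_stems = {f.stem for f in Path(img_dir).iterdir()
                 if f.suffix.lower() in IMG_EXTS}
    xml_stems = {f.stem for f in Path(lbl_dir).iterdir()
                 if f.suffix.lower() == '.xml'}
    # Set differences give the files that have no partner on the other side
    return sorted(img_stems - xml_stems), sorted(xml_stems - img_stems)
```

For example, `no_xml, no_img = find_unpaired(IMG_DIR, LBL_DIR)` followed by `print(len(no_xml), len(no_img))` shows how many samples the split will leave out. If both lists are empty, every file will be used.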
Save the code below as split_xml_dataset.py, change the paths to match your machine, and run it:
import random
import shutil
from pathlib import Path

# 1. Source paths --------------------------------------------------------
IMG_DIR = Path(r'D:\experiment2\dataset\M3FD_Detection\Ir')
LBL_DIR = Path(r'D:\experiment2\dataset\M3FD_Detection\Annotation')

# 2. Split ratio ---------------------------------------------------------
TRAIN_RATIO = 0.8
VAL_RATIO = 1 - TRAIN_RATIO  # val gets everything train does not

# 3. Output root directory -----------------------------------------------
OUT_ROOT = Path(r'D:\experiment2\dataset\M3FD_split')
TRAIN_IMG = OUT_ROOT / 'train' / 'images'
TRAIN_LBL = OUT_ROOT / 'train' / 'labels'
VAL_IMG = OUT_ROOT / 'val' / 'images'
VAL_LBL = OUT_ROOT / 'val' / 'labels'
for p in (TRAIN_IMG, TRAIN_LBL, VAL_IMG, VAL_LBL):
    p.mkdir(parents=True, exist_ok=True)

# 4. Collect all samples -------------------------------------------------
IMG_EXTS = {'.jpg', '.jpeg', '.png', '.bmp'}
images = sorted(f for f in IMG_DIR.iterdir() if f.suffix.lower() in IMG_EXTS)
xmls = {f.stem: f for f in LBL_DIR.iterdir() if f.suffix.lower() == '.xml'}
# Keep only samples where an image and an XML share the same file stem
pairs = [(img, xmls[img.stem]) for img in images if img.stem in xmls]
random.seed(42)  # fixed seed so every run produces the same split
random.shuffle(pairs)

# 5. Split ---------------------------------------------------------------
train_n = int(len(pairs) * TRAIN_RATIO)  # e.g. int(2500 * 0.8) = 2000
train_pairs = pairs[:train_n]
val_pairs = pairs[train_n:]

# 6. Copy files ----------------------------------------------------------
def copy_batch(pairs, img_dst, lbl_dst):
    """Copy each (image, xml) pair into img_dst and lbl_dst."""
    for img_path, lbl_path in pairs:
        shutil.copy(img_path, img_dst / img_path.name)
        shutil.copy(lbl_path, lbl_dst / lbl_path.name)


copy_batch(train_pairs, TRAIN_IMG, TRAIN_LBL)
copy_batch(val_pairs, VAL_IMG, VAL_LBL)

print(f'Total samples: {len(pairs)}')
print(f'Train: {len(train_pairs)} Val: {len(val_pairs)}')
print('✅ Split complete!')

Example terminal output after running:
Total samples: 2500
Train: 2000 Val: 500
✅ Split complete!
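The script copies into M3FD_split/ without emptying it first. If you re-run it after changing the seed or the ratio, or without a fixed seed, stale files can remain and the same sample can end up in both train/ and val/. The sketch below checks the output for that problem. `check_split` is a hypothetical helper, not from the original post. It assumes the directory layout shown above and confirms that each subset's images/ and labels/ hold matching stems and that no stem appears in both subsets.

```python
from pathlib import Path

IMG_EXTS = {'.jpg', '.jpeg', '.png', '.bmp'}  # assumption: same as the split script


def _stems(folder, exts):
    """File stems in folder whose (lower-cased) suffix is in exts."""
    return {f.stem for f in Path(folder).iterdir() if f.suffix.lower() in exts}


def check_split(out_root):
    """Sanity-check a split directory and return (train_count, val_count).

    Raises ValueError if a subset has an image without its XML (or the
    reverse), or if the same stem appears in both train and val.
    """
    out_root = Path(out_root)
    stems = {}
    for subset in ('train', 'val'):
        img_stems = _stems(out_root / subset / 'images', IMG_EXTS)
        xml_stems = _stems(out_root / subset / 'labels', {'.xml'})
        if img_stems != xml_stems:
            bad = sorted(img_stems ^ xml_stems)[:5]
            raise ValueError(f'{subset}: unpaired stems, e.g. {bad}')
        stems[subset] = img_stems
    # A stem in both subsets means some validation images were also trained on
    overlap = stems['train'] & stems['val']
    if overlap:
        raise ValueError(f'train/val overlap, e.g. {sorted(overlap)[:5]}')
    return len(stems['train']), len(stems['val'])
```

Calling `check_split(OUT_ROOT)` after a clean run should return the same counts the script printed, for example `(2000, 500)` for the output above. If it raises, delete M3FD_split/ and run the split again.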
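You can then use M3FD_split/ as the train/val root for any detection framework that reads XML box annotations directly. YOLOv5 and YOLOv8 (Ultralytics) are different: they do not read XML and instead expect one YOLO-format .txt label file per image. Convert the XML files in labels/ before training with them.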
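For the YOLO case, here is a minimal conversion sketch. It assumes Pascal VOC-style XML (`<size>` with `<width>`/`<height>`, and one `<object>` per box holding `<name>` and `<bndbox>`). It writes one line per box in YOLO format: `class_id x_center y_center width height`, with the four box values normalized by the image size. `CLASSES` is an assumed list (M3FD's six categories as commonly reported). Check the exact spelling against the `<name>` tags in your XML files, because each class's position in the list becomes its YOLO id. The function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Assumption: replace with the <name> values that actually occur in your XML;
# the index of a class in this list becomes its YOLO class id.
CLASSES = ['People', 'Car', 'Bus', 'Motorcycle', 'Lamp', 'Truck']


def voc_root_to_yolo_lines(root, classes=CLASSES):
    """Convert a parsed VOC-style <annotation> element into YOLO label lines."""
    w = float(root.find('size/width').text)
    h = float(root.find('size/height').text)
    lines = []
    for obj in root.iter('object'):
        name = obj.find('name').text.strip()
        if name not in classes:
            continue  # boxes of classes missing from CLASSES are dropped
        box = obj.find('bndbox')
        xmin, ymin, xmax, ymax = (float(box.find(k).text)
                                  for k in ('xmin', 'ymin', 'xmax', 'ymax'))
        # Box center and size, normalized to [0, 1] by the image size
        xc = (xmin + xmax) / 2 / w
        yc = (ymin + ymax) / 2 / h
        bw = (xmax - xmin) / w
        bh = (ymax - ymin) / h
        lines.append(f'{classes.index(name)} {xc:.6f} {yc:.6f} {bw:.6f} {bh:.6f}')
    return lines


def convert_labels_dir(labels_dir, classes=CLASSES):
    """Write a same-stem .txt next to every .xml in labels_dir (XML is kept)."""
    for xml_path in sorted(Path(labels_dir).iterdir()):
        if xml_path.suffix.lower() != '.xml':
            continue
        lines = voc_root_to_yolo_lines(ET.parse(xml_path).getroot(), classes)
        xml_path.with_suffix('.txt').write_text(''.join(l + '\n' for l in lines))
```

Run `convert_labels_dir` once on `M3FD_split/train/labels` and once on `M3FD_split/val/labels`. Ultralytics finds each label by swapping the `images` path segment for `labels` and the extension for `.txt`. With this layout, `train/images/x.jpg` therefore maps to `train/labels/x.txt`.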
