
Large Models for Industrial Anomaly Detection (1): Datasets and Methods

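Task description: use vision-language (VL) large models to detect anomalies in industrial images, with a focus on the zero-shot and one-shot settings. In zero-shot, the model receives only the image under test, with no normal reference sample, and must judge whether it is anomalous and output a description and suggestions. In one-shot, the model receives one normal reference image together with the image under test. The technical routes involved include agents, reinforcement learning, and expert knowledge.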

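I. Dataset

The MMAD dataset consolidates open-source industrial anomaly datasets and adds GPT-based text annotations, producing textual labels for industrial anomaly detection. Its construction pipeline is a valuable reference:

Step 1: Merge the detection and segmentation labels of the industrial anomaly datasets to generate a Vision Prompt and Prior Knowledge.

Step 2: Feed these into GPT to obtain a textual description.

Step 3: Rewrite that description into text annotations for anomaly detection.

The tasks covered are: whether an anomaly exists, defect classification, defect localization, defect description, defect analysis, and object classification.

Dataset format and contents:

Every image has a corresponding txt file with a text description, but the actual test labels live in mmad.json. The label for a single image looks like this: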

    {
      "DS-MVTec/bottle/image/broken_large/000.png": {
        "image_path": "image/broken_large/000.png",
        "conversation": [
          {
            "Question": "Is there any defect in the object?",
            "Answer": "A",
            "Options": {"A": "Yes.", "B": "No."},
            "type": "Anomaly Detection",
            "annotation": true
          },
          {
            "Question": "There is a defect in the object. What is the type of the defect?",
            "Answer": "C",
            "Options": {
              "A": "Foggy appearance.",
              "B": "Color fading.",
              "C": "Broken large.",
              "D": "Scratched surface."
            },
            "type": "Defect Classification",
            "annotation": true
          },
          {
            "Question": "There is a defect in the object. Where is the defect?",
            "Answer": "D",
            "Options": {
              "A": "Top right.",
              "B": "Top left.",
              "C": "Center.",
              "D": "Bottom right."
            },
            "type": "Defect Localization",
            "annotation": true
          },
          {
            "Question": "There is a defect in the object. What is the appearance of the defect?",
            "Answer": "A",
            "Options": {
              "A": "A jagged irregular break.",
              "B": "A smooth crack.",
              "C": "A round hole.",
              "D": "A small chip."
            },
            "type": "Defect Description",
            "annotation": true
          },
          {
            "Question": "There is a defect in the object. What is the effect of the defect?",
            "Answer": "B",
            "Options": {
              "A": "It enhances the bottle's aesthetics.",
              "B": "It presents a safety hazard.",
              "C": "It changes the color of the glass.",
              "D": "It has no effect on the bottle's function."
            },
            "type": "Defect Analysis",
            "annotation": true
          }
        ],
        "mask_path": "rbg_mask/broken_large/000_rbg_mask.png",
        "similar_templates": [
          "MVTec-AD/bottle/train/good/001.png",
          "MVTec-AD/bottle/train/good/061.png",
          "MVTec-AD/bottle/train/good/199.png",
          "MVTec-AD/bottle/train/good/124.png",
          "MVTec-AD/bottle/train/good/149.png",
          "MVTec-AD/bottle/train/good/147.png",
          "MVTec-AD/bottle/train/good/089.png",
          "MVTec-AD/bottle/train/good/066.png"
        ],
        "random_templates": [
          "MVTec-AD/bottle/train/good/004.png",
          "MVTec-AD/bottle/train/good/032.png",
          "MVTec-AD/bottle/train/good/093.png",
          "MVTec-AD/bottle/train/good/095.png",
          "MVTec-AD/bottle/train/good/104.png",
          "MVTec-AD/bottle/train/good/121.png",
          "MVTec-AD/bottle/train/good/177.png",
          "MVTec-AD/bottle/train/good/191.png"
        ]
      },
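To make the label format concrete, here is a minimal sketch of how one might score a model against entries of this shape, one accuracy per task type. The `score_entries` helper and the `predict` callback are hypothetical names used for illustration and are not taken from the MMAD repository; the sample entry is a trimmed copy of the label shown above.

```python
from collections import defaultdict

# Trimmed copy of the entry shown above (two of its five questions).
sample = {
    "DS-MVTec/bottle/image/broken_large/000.png": {
        "conversation": [
            {
                "Question": "Is there any defect in the object?",
                "Answer": "A",
                "Options": {"A": "Yes.", "B": "No."},
                "type": "Anomaly Detection",
            },
            {
                "Question": "There is a defect in the object. "
                            "What is the type of the defect?",
                "Answer": "C",
                "Options": {
                    "A": "Foggy appearance.",
                    "B": "Color fading.",
                    "C": "Broken large.",
                    "D": "Scratched surface.",
                },
                "type": "Defect Classification",
            },
        ],
    }
}


def score_entries(entries, predict):
    """Return accuracy per task type; predict(image_key, qa) returns an option letter."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for image_key, entry in entries.items():
        for qa in entry["conversation"]:
            total[qa["type"]] += 1
            if predict(image_key, qa) == qa["Answer"]:
                correct[qa["type"]] += 1
    return {task: correct[task] / total[task] for task in total}


def always_a(image_key, qa):
    """Placeholder predictor that always answers "A" (stands in for a real VLM)."""
    return "A"


accuracy = score_entries(sample, always_a)
print(accuracy)
```

With the real file, `sample` would be replaced by the dictionary loaded from mmad.json, and `always_a` by a function that queries the model and parses the option letter out of its reply.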

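II. Enhancement Methods

Retrieval-Augmented Generation (RAG)

As the MMAD authors explain it, RAG is "a method that combines information retrieval and generation to enhance the performance of language models, particularly for tasks requiring external knowledge".

In plain terms: build a domain knowledge base ahead of time; when a query image comes in, retrieve the relevant information from that knowledge base and merge it into the prompt fed to the large model. The code: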

        # Pick the reference (template) images: most similar ones, or random ones.
        if args.similar_template:
            few_shot = text_gt["similar_templates"][:args.few_shot_model]
        else:
            few_shot = text_gt["random_templates"][:args.few_shot_model]

        rel_image_path = os.path.join(args.data_path, image_path)
        rel_few_shot = [os.path.join(args.data_path, path) for path in few_shot]

        # Look up the domain knowledge by dataset name and object name.
        if args.domain_knowledge:
            dataset_name = image_path.split("/")[0].replace("DS-MVTec", "MVTec")
            object_name = image_path.split("/")[1]
            domain_knowledge = '\n'.join(all_domain_knowledge[dataset_name][object_name].values())
        else:
            domain_knowledge = None
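The snippet above stops once the knowledge text has been retrieved. As a rough sketch of the remaining step, merging that text into the prompt, the following self-contained example may help. The knowledge-base contents, the `retrieve_knowledge` and `build_prompt` helpers, and the prompt wording are all assumptions made for illustration; they are not MMAD's actual entries or prompt template.

```python
# Hypothetical knowledge base: dataset -> object -> {topic: text}.
# The texts are invented placeholders, not MMAD's real domain knowledge.
all_domain_knowledge = {
    "MVTec": {
        "bottle": {
            "normal": "A normal bottle has an intact, smooth circular rim.",
            "broken_large": "A large break shows as a jagged gap in the rim.",
        }
    }
}


def retrieve_knowledge(image_path, knowledge_base):
    """Look up knowledge from the image path, mirroring the lookup shown above."""
    dataset_name = image_path.split("/")[0].replace("DS-MVTec", "MVTec")
    object_name = image_path.split("/")[1]
    objects = knowledge_base.get(dataset_name, {})
    if object_name not in objects:
        return None
    return "\n".join(objects[object_name].values())


def build_prompt(question, options, domain_knowledge=None):
    """Prepend the retrieved knowledge (if any) to a multiple-choice question."""
    parts = []
    if domain_knowledge:
        parts.append(
            "Following is the domain knowledge about this object:\n" + domain_knowledge
        )
    parts.append(question)
    parts.extend(f"{letter}. {text}" for letter, text in options.items())
    parts.append("Answer with the option's letter.")
    return "\n".join(parts)


knowledge = retrieve_knowledge(
    "DS-MVTec/bottle/image/broken_large/000.png", all_domain_knowledge
)
prompt = build_prompt(
    "Is there any defect in the object?", {"A": "Yes.", "B": "No."}, knowledge
)
print(prompt)
```

Note that the "retrieval" here is a plain dictionary lookup keyed by dataset and object name, which is also all the original snippet does; no embedding search is involved.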

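Agent

A conventional detection model first extracts some information about the anomaly, and that information is then passed to the large model together with the query image.

The code:

First, define the agent model: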

    if args.agent:
        model_name += "_Agent-" + args.agent_model + "-with-" + args.agent_notation
        if args.agent_model == "GT":
            # Use the ground-truth mask as an "oracle" agent.
            agent = "GT"
        elif args.agent_model == "PatchCore":
            from SoftPatch.call import call_patchcore, build_patchcore, call_ground_truth
            agent = build_patchcore()
        elif args.agent_model == "AnomalyCLIP":
            from AnomalyCLIP.test_one_example import call_anomalyclip, load_model_and_features, anomalyclip_args, \
                anomalyclip_args_visa
            anomalyclip_args_visa.checkpoint_path = os.path.join("../AnomalyCLIP", anomalyclip_args_visa.checkpoint_path)
            agent = load_model_and_features(anomalyclip_args_visa)
        elif args.agent_model == "AnomalyCLIP_mvtec":
            from AnomalyCLIP.test_one_example import call_anomalyclip, load_model_and_features, anomalyclip_args
            anomalyclip_args.checkpoint_path = os.path.join("../AnomalyCLIP", anomalyclip_args.checkpoint_path)
            agent = load_model_and_features(anomalyclip_args)

The agent is then called when the prompt is generated:

        # Run the agent and append its annotated image to the model's image inputs.
        if self.agent:
            if args.agent_model == "GT":
                anomaly_map = call_ground_truth(self.image_path, self.mask_path, notation=self.args.agent_notation, visualize=self.visualization)
            elif args.agent_model == "PatchCore":
                anomaly_map = call_patchcore(self.image_path, self.few_shot, self.agent, notation=args.agent_notation, visualize=self.visualization)
            elif "AnomalyCLIP" in args.agent_model:
                anomaly_map = call_anomalyclip(self.image_path, self.agent, notation=args.agent_notation, visualize=self.visualization)
            images = images + [load_image(anomaly_map, max_num=1).to(torch.bfloat16).cuda()]

        # Tell the model how to read the agent's image, depending on the notation.
        if self.agent:
            if self.args.agent_notation == "bbox":
                payload = payload + f"Following is the reference image where some suspicious areas are marked by red bounding boxes. Please note that even if the query image does not have any defects, the red boxes still highlight some areas. Therefore, you still need to make a judgement by yourself. <image>"
            elif self.args.agent_notation == "contour":
                payload = payload + f"Following is the reference image where some suspicious areas are marked by red contour. Please note that even if the query image does not have any defects, the image still highlight some areas. Therefore, you still need to make a judgement by yourself. <image>"
            elif self.args.agent_notation == "highlight":
                payload = payload + f"Following is the reference image where some suspicious areas are highlight. Please note that even if the query image does not have any defects, the image still highlight some areas. Therefore, you still need to make a judgement by yourself. <image>"
            elif self.args.agent_notation == "mask":
                payload = payload + f"Following is the mask of some suspicious areas. But you still need to make a judgement by yourself. <image>"
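The internals of `call_patchcore` and `call_anomalyclip` are not shown here, so how they turn an anomaly map into a "bbox" notation is unknown. As a simplified illustration of the idea, the sketch below thresholds a toy anomaly map and returns one box around every suspicious pixel. The `suspicious_bbox` function, the threshold value, and the single-box simplification are assumptions, not the repository's implementation.

```python
def suspicious_bbox(anomaly_map, threshold=0.5):
    """Return (top, left, bottom, right) around all scores >= threshold, or None."""
    rows = [r for r, row in enumerate(anomaly_map) if any(v >= threshold for v in row)]
    cols = [c for row in anomaly_map for c, v in enumerate(row) if v >= threshold]
    if not rows:
        # Nothing exceeds the threshold, so there is no region to mark.
        return None
    return (min(rows), min(cols), max(rows), max(cols))


# Toy 4x4 anomaly map with one hot region in the middle.
toy_map = [
    [0.1, 0.1, 0.2, 0.1],
    [0.1, 0.7, 0.9, 0.1],
    [0.1, 0.6, 0.8, 0.2],
    [0.0, 0.1, 0.1, 0.1],
]

bbox = suspicious_bbox(toy_map)
print(bbox)  # (1, 1, 2, 2)
```

A detector like this marks the highest-scoring region even on a defect-free image if the threshold is relative, which is presumably why the prompts above warn the model that the marked areas are only hints.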

CoT

The paper does not mention chain-of-thought (CoT), but the code contains an implementation:

        if self.CoT:
            chain_of_thought = []
            chain_of_thought.append(
                {
                    # "type": "text",
                    "text": f"Before answering the following questions, we need to understand the query image. \n"
                            f"First, identify the objects in the template image and imagine the possible types of defects. \n"
                            f"Second, simply describe the differences between the query image with the template sample. \n"
                            f"Third, if exist, judge whether those differences are defect or normal. \n"
                            f"Output a simple thoughts in 100 words. Notice that slight movement of objects or changes in image quality fall within the normal range.\n"
                    # f"First, summary the normal and defect samples and answer why defect sample abnormal. \n"
                    # f"Second, simply describe the differences between the test image with the normal and defect sample. \n"
                    # f"Third, judge whether those differences are defect or normal. \n"
                },
            )
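The snippet only builds the CoT instruction; how the resulting thoughts are fed back into the question prompts is not shown. One plausible arrangement is a two-pass scheme, sketched below: ask for the thoughts first, then prepend them to each question. The `answer_with_cot` function, the shortened instruction text, and the `fake_model` stand-in are assumptions for illustration only.

```python
# Shortened version of the instruction built in the snippet above.
COT_INSTRUCTION = (
    "Before answering the following questions, we need to understand the query image. \n"
    "Output a simple thoughts in 100 words."
)


def answer_with_cot(ask_model, question):
    """Run two passes: collect free-form thoughts, then answer with them as context."""
    thoughts = ask_model(COT_INSTRUCTION)
    final_prompt = f"Thoughts: {thoughts}\n{question}"
    return ask_model(final_prompt), final_prompt


calls = []


def fake_model(prompt):
    """Stand-in for a real VLM call; records each prompt and returns canned text."""
    calls.append(prompt)
    if prompt == COT_INSTRUCTION:
        return "The rim of the bottle has a jagged gap."
    return "A"


answer, final_prompt = answer_with_cot(fake_model, "Is there any defect in the object?")
print(answer)
print(final_prompt)
```

The cost of this design is one extra model call per image, although the thoughts could be generated once and reused across all of that image's questions.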

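III. Experimental Results

Comparison of model performance

Gains from one-shot

Gains from RAG: the improvement is substantial, but performance on the defect description task partly regresses, which the authors do not explain.

Gains from the agent: the agent brings almost no improvement, whereas feeding in the ground truth (GT) works well, which suggests the expert detection models are simply not accurate enough.

IV. Reproducing the Code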

jam-cc/MMAD: The Codes and Data of A Comprehensive Benchmark for Multimodal Large Language Models in Industrial Anomaly Detection [ICLR'25]

The authors' open-source release is very solid. I reproduced it on Windows with CUDA 12.4; below are some common problems I ran into:

1. The Hugging Face model cannot be downloaded

Download the model to a local directory with ModelScope:

from modelscope import snapshot_download
model_dir = snapshot_download('Qwen/Qwen2.5-VL-3B-Instruct',cache_dir='./model/')

Then change the model initialization so that it loads from the local files:

    model = Qwen2_5_VLForConditionalGeneration.from_pretrained(
        str(model_path),
        torch_dtype="auto",
        attn_implementation="flash_attention_2",
        device_map="auto",
        local_files_only=True,
    )

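2. Installing FlashAttention

Building it from source yourself or installing it straight from pip is not recommended.

On Windows, download a prebuilt wheel from the link below and pip install it:

Releases · kingbri1/flash-attention

For Linux: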

bgithub.xyz/Dao-AILab/flash-attention/releases/tag/v2.8.3
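If no matching wheel is available, one option is to fall back to PyTorch's built-in scaled-dot-product attention rather than fail at model load. The sketch below is a hedged suggestion, not part of the MMAD code: `pick_attn_implementation` is a hypothetical helper, and it only checks that the `flash_attn` package is importable, not that it matches the installed CUDA and PyTorch versions.

```python
import importlib.util


def pick_attn_implementation():
    """Use FlashAttention 2 only when the flash_attn package can be found."""
    if importlib.util.find_spec("flash_attn") is not None:
        return "flash_attention_2"
    # "sdpa" selects PyTorch's scaled-dot-product attention in transformers.
    return "sdpa"


attn_implementation = pick_attn_implementation()
print(attn_implementation)
```

The returned string can be passed as the `attn_implementation` argument of `from_pretrained` in place of the hard-coded "flash_attention_2".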
