
sam-vit-base: helping detect whether a truck's retractable tarp fully covers the bed

news 2025/7/24 21:02:59

This continues the technical pre-study for the same requirement.

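Previous post (CSDN): "DINOv2 + yolov8 + opencv: detecting whether a truck's retractable tarp fully covers the bed", https://blog.csdn.net/u011564831/article/details/145800227. That post defined a semantic segmentation model built on DINOv2: the pretrained ViT (Vision Transformer) backbone is loaded and frozen to reduce computation, and a custom segmentation head (segmentation_head) upsamples the 37x37 feature map into a 3-class segmentation (background, truck bed, tarp); the forward method takes a 518x518 input image and outputs segmentation logits. Using DINOv2 for segmentation, however, left it unclear how well the approach fits this scenario.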

Later I found another segmentation model: https://huggingface.co/facebook/sam-vit-base

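Comparing DINOv2 with SAM:

Aspect	DINOv2 drawbacks	SAM advantages
Design goal	Not designed for segmentation; needs an extra segmentation head	Segmentation is native; usable out of the box
Resolution and detail	Low-resolution feature map (37x37 at 518x518 input); fine detail is lost	Masks are upsampled to the original image size, preserving detail
Semantics	No direct semantic output; the head needs supervised training	Zero-shot, class-agnostic masks; the prompt decides which object is segmented
Compute	The head must be trained, plus extra processing	The image embedding is computed once; each prompt only runs the lightweight mask decoder
Prompting	No prompt mechanism, no interactivity	Accepts several prompt types; adapts easily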

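DINOv2:
- Generalizes well, but must be fine-tuned (or given a trained head) for a specific task such as semantic segmentation.
- Does not depend on prompts, which also means it cannot quickly focus on a target region through user interaction the way SAM can.
SAM:
- Supports prompt-driven segmentation with points, boxes and masks, so it adapts quickly to different scenes and requirements. (The SAM paper also explored text prompts, but the released model does not accept them.)
- Without prompts, SAM can still generate many candidate masks automatically, which makes it more flexible.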

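Using SAM

I used an image supplied by the client, taken from a camera position similar to the target setup.

First, use the official Hugging Face API to segment the whole image (automatic mask generation).

Download facebook/sam-vit-base:

huggingface-cli download --resume-download facebook/sam-vit-base --local-dir ./sam-vit-base/

Run the program: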

import torch
from PIL import Image
from transformers import SamModel, SamProcessor, pipeline
import matplotlib.pyplot as plt
import numpy as np

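# Check for a GPU and pick the device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load the model and processor from the local download
model = SamModel.from_pretrained("./sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("./sam-vit-base")

# Load the image
img_url = "./trunk2.jpg"
raw_image = Image.open(img_url).convert("RGB")

# Prompt point: one (x, y) pixel coordinate, nested as [image][point_batch][point]
input_points = [[[450, 600]]]

# Preprocess the inputs and move them to the device
inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the masks back to the original image size
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu()
)[0]  # [point_batch, num_masks, height, width]
scores = outputs.iou_scores.cpu()  # [1, point_batch, num_masks]
print('scores:', scores)

# Pick the candidate with the highest predicted IoU for the first (only) prompt
best_mask_idx = scores[0, 0].argmax().item()
best_mask = masks[0, best_mask_idx]  # [height, width]

# Free GPU memory
del outputs
torch.cuda.empty_cache()


# Overlay a mask on the given axes as a translucent color layer
def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    mask = np.array(mask)
    h, w = mask.shape[-2:]
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)


# Generate masks for the whole image with the pipeline (same local weights)
generator = pipeline("mask-generation", model="./sam-vit-base", device=0 if device == "cuda" else -1)
outputs_pipeline = generator(
    img_url,
    points_per_batch=64,
    pred_iou_thresh=0.88,
    stability_score_thresh=0.9
)


# Visualize the pipeline masks and save
plt.figure(figsize=(8, 8))
plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs_pipeline["masks"]:
    show_mask(np.array(mask), ax=ax, random_color=True)
plt.axis("off")
plt.savefig("./pipeline_masks.png", bbox_inches='tight', pad_inches=0, dpi=300)
plt.close()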

The output mask image shows that the space above the truck bed is segmented as a separate region.
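The pipeline call keeps a candidate mask only if its predicted IoU clears pred_iou_thresh and its stability score clears stability_score_thresh. In the original segment-anything automatic mask generator, a mask's stability score is the IoU between the mask binarized at a higher and at a lower logit threshold (the mask threshold plus or minus an offset, 1.0 by default), so a mask whose boundary shifts a lot when the threshold moves counts as unstable. Below is a minimal pure-Python sketch of that definition, assuming a mask threshold of 0.0 and the default offset; the toy logit grids are made up for illustration:

```python
def stability_score(logits, mask_threshold=0.0, offset=1.0):
    """IoU between the mask at (threshold + offset) and at (threshold - offset).

    The stricter mask is a subset of the looser one, so the IoU reduces
    to a ratio of pixel counts.
    """
    strict = sum(1 for row in logits for v in row if v > mask_threshold + offset)
    loose = sum(1 for row in logits for v in row if v > mask_threshold - offset)
    return strict / loose if loose else 0.0


# Toy 3x3 logit maps: a confident mask vs. one with an uncertain boundary.
crisp = [[5.0, 5.0, -5.0],
         [5.0, 5.0, -5.0],
         [-5.0, -5.0, -5.0]]
blurry = [[3.0, 0.5, -0.5],
          [0.5, 0.2, -0.8],
          [-0.5, -0.9, -3.0]]

for name, logits in [("crisp", crisp), ("blurry", blurry)]:
    score = stability_score(logits)
    # Mimic the stability_score_thresh=0.9 filter used in the pipeline call.
    print(name, round(score, 3), "kept" if score >= 0.9 else "dropped")
# crisp 1.0 kept
# blurry 0.125 dropped
```

Raising stability_score_thresh therefore removes masks with fuzzy edges, which is one lever when the tarp boundary is poorly separated from the bed.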

On top of the pipeline's whole-image segmentation, we now need to focus on where the truck is.

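How the input point guides SAM:
- The input point [[450, 600]] is a user-supplied prompt (x = 450, y = 600, in pixels) that tells SAM which region of the image to attend to (assumed here to be the center or a salient part of the truck). Later I will use yolov8 to detect the truck and locate this point automatically.
- SAM uses the point as a seed and generates the masks most relevant to it.
SAM's segmentation capability:
- SAM was trained at scale (on the SA-1B dataset) and can segment arbitrary objects given a prompt.
- Its Transformer image encoder and mask decoder capture the context around the input point to produce high-quality masks.
- For a single point, SAM returns three candidate masks to resolve ambiguity; for trunk2.jpg these may be, for example, the whole truck, the truck bed, or background.
The role of the IoU scores:
- iou_scores is SAM's own prediction of each candidate mask's quality, i.e. its estimated IoU with the object the prompt refers to; no ground truth is involved.
- The highest-scoring mask (0.9669) is usually the one covering the main object, because it best matches the region the input point implies.
- For example, if [450, 600] lies on the truck, the best mask segments the main body of the truck rather than background or minor regions.
Post-processing restores the resolution:
- post_process_masks upscales the masks from the model's low-resolution output (256x256) to the original image size (1440x2560, height x width), preserving segmentation detail.
- This keeps the masks pixel-aligned with the image, so they cover the main object precisely.
Visualization highlights the main object:
- show_mask overlays the mask in a random color so the segmented region stands out on the image.
- Because best_mask already focuses on the main object (selected by IoU score), the final image naturally centers on the truck.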
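The prompt point is given in original-image pixels. The processor resizes the image so that its longest side is 1024 and rescales the prompt coordinates by the same factors; post_process_masks later maps the masks back. The sketch below reimplements that coordinate mapping by hand, modeled on SAM's ResizeLongestSide preprocessing. It is only for understanding (the function names are mine), since SamProcessor already does this for you:

```python
def preprocess_shape(height, width, longest_edge=1024):
    """Image size after resizing so the longest side equals longest_edge."""
    scale = longest_edge / max(height, width)
    return int(height * scale + 0.5), int(width * scale + 0.5)


def point_to_model_coords(x, y, height, width, longest_edge=1024):
    """Map an (x, y) prompt from original pixels into the resized frame."""
    new_h, new_w = preprocess_shape(height, width, longest_edge)
    return x * new_w / width, y * new_h / height


# trunk2.jpg is 1440 high x 2560 wide, as stated above.
print(preprocess_shape(1440, 2560))                 # (576, 1024)
print(point_to_model_coords(450, 600, 1440, 2560))  # (180.0, 240.0)
```

This is also why prompt points should always be passed in original-image coordinates: scaling them yourself would apply the factor twice.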

import torch
from PIL import Image
from transformers import SamModel, SamProcessor, pipeline
import matplotlib.pyplot as plt
import numpy as np
from scipy.ndimage import label  # 用于连通区域分析

# Check for a GPU and pick the device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load the model and processor
model = SamModel.from_pretrained("./sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("./sam-vit-base")

# Load the image
img_url = "./trunk2.jpg"
raw_image = Image.open(img_url).convert("RGB")

# Prompt point (x, y) in original-image pixels
input_points = [[[450, 600]]]

# Preprocess the inputs and move them to the device
inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the masks
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu()
)[0]  # [point_batch, num_masks, height, width]
scores = outputs.iou_scores.cpu()  # [1, point_batch, num_masks]
print('scores:', scores)
print('masks shape:', masks.shape)

# Pick the best mask (highest predicted IoU); the candidates are on dim 1 of masks
best_mask_idx = scores[0, 0].argmax().item()
best_mask = masks[0, best_mask_idx]  # [height, width]
print('best_mask shape:', best_mask.shape)

# Analyze the mask structure
best_mask_np = best_mask.numpy()  # convert to a NumPy array
if best_mask_np.dtype != bool:
    best_mask_np = best_mask_np > 0.5  # binarize if it is not boolean already

# Count the pixels covered by the mask
object_pixels = np.sum(best_mask_np)
total_pixels = best_mask_np.size
print(f"Mask pixels: {object_pixels} / {total_pixels} ({object_pixels / total_pixels * 100:.2f}%)")

# Connected-component analysis
labeled_array, num_features = label(best_mask_np)
print(f"Connected regions: {num_features}")
if num_features > 0:
    for i in range(1, num_features + 1):
        area = np.sum(labeled_array == i)
        print(f"Region {i} pixels: {area}")

# Free GPU memory
del outputs
torch.cuda.empty_cache()

# Generate masks for the whole image with the pipeline
generator = pipeline("mask-generation", model="./sam-vit-base", device=0 if device == "cuda" else -1)
outputs_pipeline = generator(
    img_url,
    points_per_batch=64,
    pred_iou_thresh=0.88,
    stability_score_thresh=0.9
)

# Visualization helper: overlay a mask as a translucent color layer
def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    
    mask = np.array(mask)
    if mask.ndim == 3:
        mask = mask[0]  # take the first channel
    elif mask.ndim != 2:
        raise ValueError(f"Unexpected mask shape {mask.shape}")
    
    h, w = mask.shape
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Visualize the pipeline masks and save
plt.figure(figsize=(8, 8))
plt.imshow(np.array(raw_image))
ax = plt.gca()
for mask in outputs_pipeline["masks"]:
    show_mask(mask, ax=ax, random_color=True)
plt.axis("off")
plt.savefig("./pipeline_masks.png", bbox_inches='tight', pad_inches=0, dpi=300)
plt.close()

# Visualize the best mask from the point-prompted inference and save
plt.figure(figsize=(8, 8))
plt.imshow(np.array(raw_image))
ax = plt.gca()
show_mask(best_mask, ax=ax, random_color=True)
plt.axis("off")
plt.title("Manual Inference Best Mask")
plt.savefig("./manual_best_mask.png", bbox_inches='tight', pad_inches=0, dpi=300)
plt.close()

print("Images saved to the current directory: pipeline_masks.png and manual_best_mask.png")

Using device: cuda
scores: tensor([[[0.8627, 0.7113, 0.7364]]])
masks shape: torch.Size([1, 3, 1440, 2560])
best_mask shape: torch.Size([3, 1440, 2560])
Mask pixels: 1505626 / 11059200 (13.61%)
Connected regions: 16
Region 1 pixels: 4150
Region 2 pixels: 3762
Region 3 pixels: 51
Region 4 pixels: 39
Region 5 pixels: 353
Region 6 pixels: 2605
Region 7 pixels: 2
Region 8 pixels: 6
Region 9 pixels: 222
Region 10 pixels: 175
Region 11 pixels: 59
Region 12 pixels: 1493623
Region 13 pixels: 43
Region 14 pixels: 69
Region 15 pixels: 119
Region 16 pixels: 348
Images saved to the current directory: pipeline_masks.png and manual_best_mask.png

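Output image of the best mask.

Note: in this run best_mask has shape [3, 1440, 2560], because masks[best_mask_idx] indexes the point-batch dimension (size 1) rather than the candidate-mask dimension. The pixel count therefore covers all three candidate masks (11059200 = 3 x 1440 x 2560), and label() ran on a 3D array. Indexing with masks[0, best_mask_idx] yields a single [1440, 2560] mask.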
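The connected-component log shows one dominant region (region 12, about 1.49 million pixels) plus fifteen speckles of 2 to 4,150 pixels. A common cleanup, not part of the original code, is to keep only the largest connected component before measuring areas. With SciPy this is label() followed by keeping the label with the largest pixel count. Below is a dependency-free sketch of the same idea using 4-connectivity (the 2D default of scipy.ndimage.label); largest_component is a hypothetical helper:

```python
from collections import deque


def largest_component(mask):
    """Copy of a 2D boolean mask that keeps only its largest 4-connected region."""
    h, w = len(mask), len(mask[0])
    seen = [[False] * w for _ in range(h)]
    best = []
    for sy in range(h):
        for sx in range(w):
            if not mask[sy][sx] or seen[sy][sx]:
                continue
            # Breadth-first flood fill of one region.
            region, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:
                y, x = queue.popleft()
                region.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(region) > len(best):
                best = region
    out = [[False] * w for _ in range(h)]
    for y, x in best:
        out[y][x] = True
    return out


speckled = [[1, 1, 0, 0, 1],
            [1, 1, 0, 0, 0],
            [1, 1, 0, 1, 0]]
for row in largest_component(speckled):
    print("".join("#" if v else "." for v in row))  # "##..." on each row
```

On full-resolution masks the SciPy version is far faster; the pure-Python loop only shows the logic.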

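Computing the coverage ratio

Overall approach

Goals:
- Take the image trunk2.jpg as input and detect the truck.
- Use YOLOv8 to derive prompt points for the truck bed and the tarp.
- Generate segmentation masks for the truck bed and the tarp with the SAM model.
- Intersect the truck-bed mask with the tarp mask to judge the coverage ratio.
- Visualize the results and save the images.
Main steps:
- Detect the truck: find its bounding box with YOLOv8.
- Derive prompt points: define center points for the truck bed and the tarp from the box geometry.
- Generate masks: segment the truck-bed and tarp regions with SAM.
- Compute coverage: compare how much the two masks overlap.
- Visualize: show the segmentation results and the intersection.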

Deriving the prompt points

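# Box of the first detected truck (truck_boxes comes from the YOLOv8 detection step)
x1, y1, x2, y2 = truck_boxes[0]
truck_width = x2 - x1
truck_height = y2 - y1

# Truck-bed point: horizontal center of the box, 75% of the way down
truck_bed_x = x1 + truck_width // 2
truck_bed_y = y1 + int(truck_height * 0.75)
truck_bed_point = [[[truck_bed_x, truck_bed_y]]]
print(f"Truck bed point: {truck_bed_point}")

# Tarp point: horizontal center of the box, 25% of the way down
tarp_x = x1 + truck_width // 2
tarp_y = y1 + int(truck_height * 0.25)
tarp_point = [[[tarp_x, tarp_y]]]
print(f"Tarp point: {tarp_point}")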

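These offsets must be adjusted to the actual camera angle; for example, with the offsets above the prompt points landed in the wrong places on this image.

So I adjusted the prompt positions by hand.

Only the first detected truck is processed: its bounding box [x1, y1, x2, y2] is extracted and the truck's width and height are computed.

The points are then formatted in the prompt structure SAM expects, [[[x, y]]]. In this way, geometric assumptions turn the YOLOv8 detection into SAM input prompt points.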
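Since the offsets only hold for one camera, it helps to make them parameters. A minimal sketch with a hypothetical helper derive_prompt_points: fractions are measured from the box's top-left corner, negative values land above the box, and points are clamped to the image. The fractions (0.7, -0.3) and (0.5, -0.17) are my back-calculation, not values stated in the post: with the integer box from the YOLOv8 log they reproduce the points printed in the run of the final program ([761, 241.7] and [549, 360.13], truncated to integers):

```python
def derive_prompt_points(box, bed_frac, tarp_frac, image_size):
    """Turn a detection box (x1, y1, x2, y2) into two SAM prompt points.

    bed_frac / tarp_frac: (fx, fy) fractions of the box width/height.
    image_size: (width, height) used to clamp the points.
    """
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    img_w, img_h = image_size

    def to_point(fx, fy):
        x = int(x1 + w * fx)
        y = int(y1 + h * fy)
        return min(max(x, 0), img_w - 1), min(max(y, 0), img_h - 1)

    return to_point(*bed_frac), to_point(*tarp_frac)


box = (20, 515, 1079, 1426)  # truck box from the YOLOv8 log, truncated to int
bed, tarp = derive_prompt_points(box, (0.7, -0.3), (0.5, -0.17), (2560, 1440))
print(bed, tarp)      # (761, 241) (549, 360)
print([[list(bed)]])  # [[[761, 241]]], the nesting SamProcessor expects
```

Keeping the fractions per camera in a config file would let the same code serve several installation angles.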

Selecting the best mask

truck_bed_scores = outputs_truck_bed.iou_scores.cpu()  # [1, point_batch, num_masks]
tarp_scores = outputs_tarp.iou_scores.cpu()
print('truck bed scores:', truck_bed_scores)
print('masks_truck_bed shape:', masks_truck_bed.shape)  # [point_batch, num_masks, H, W]
print('tarp scores:', tarp_scores)
print('masks_tarp shape:', masks_tarp.shape)

# Highest predicted IoU among the candidates of the first prompt
truck_bed_best_idx = truck_bed_scores[0, 0].argmax().item()
tarp_best_idx = tarp_scores[0, 0].argmax().item()

# Guard: the candidate masks live on dimension 1
if truck_bed_best_idx >= masks_truck_bed.shape[1]:
    print(f"Warning: truck_bed_best_idx ({truck_bed_best_idx}) exceeds the number of masks ({masks_truck_bed.shape[1]}); using the first mask")
    truck_bed_best_idx = 0
if tarp_best_idx >= masks_tarp.shape[1]:
    print(f"Warning: tarp_best_idx ({tarp_best_idx}) exceeds the number of masks ({masks_tarp.shape[1]}); using the first mask")
    tarp_best_idx = 0

truck_bed_mask = masks_truck_bed[0, truck_bed_best_idx]  # [H, W]
tarp_mask = masks_tarp[0, tarp_best_idx]

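Get the IoU scores and print the mask shapes to inspect SAM's output.
Use argmax to select the index of the highest-scoring mask.
Add a guard: if the index exceeds the number of masks, fall back to the first mask.
Extract the best masks, truck_bed_mask and tarp_mask, each of shape [height, width].

Purpose: make sure the mask that best matches the prompt point is chosen, while handling an unexpected number of masks.

Note that post_process_masks returns masks shaped [point_batch, num_masks, height, width]: the candidate masks are on dimension 1, so the best one is masks[0, idx]. Comparing the index with shape[0] (which is 1 here) makes the guard fire whenever the best mask is not the first one, as in the run of the final program below, where both warnings appear and the first candidate is used instead of the best.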
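The shape bookkeeping is the easy part to get wrong. A sketch with nested Python lists standing in for the tensors (my illustration, not the Hugging Face API); the torch equivalent is idx = iou_scores[0, 0].argmax().item() followed by masks[0, idx]. The scores are copied from the truck-bed line of the log, and the masks are tiny made-up stand-ins:

```python
def argmax(values):
    return max(range(len(values)), key=values.__getitem__)


def select_best_mask(masks, iou_scores, point_batch=0):
    """masks:      [point_batch][num_masks][H][W]  (one image from post_process_masks)
    iou_scores: [image][point_batch][num_masks]  (outputs.iou_scores)
    Returns (index, mask) of the highest-scoring candidate for one prompt."""
    idx = argmax(iou_scores[0][point_batch])
    return idx, masks[point_batch][idx]


scores = [[[0.7191, 0.8671, 0.7891]]]
masks = [[[[1, 0]], [[1, 1]], [[0, 1]]]]  # 1 prompt, 3 candidates of size 1x2
idx, mask = select_best_mask(masks, scores)
print(idx, mask)  # 1 [[1, 1]]
```

With this indexing no fallback guard is needed, because the score index and the candidate dimension always have the same length.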

Computing coverage

# Binarize (post_process_masks already returns booleans by default, so this is a safeguard)
truck_bed_mask_np = truck_bed_mask.numpy() > 0.5
tarp_mask_np = tarp_mask.numpy() > 0.5
intersection = np.logical_and(truck_bed_mask_np, tarp_mask_np)
truck_bed_area = np.sum(truck_bed_mask_np)
intersection_area = np.sum(intersection)

coverage_ratio = intersection_area / truck_bed_area if truck_bed_area > 0 else 0
print(f"Truck bed area: {truck_bed_area} px")
print(f"Intersection area: {intersection_area} px")
print(f"Coverage ratio: {coverage_ratio * 100:.2f}%")
if coverage_ratio >= 0.95:
    print("The space above the truck bed is fully covered by the tarp")
else:
    print("The space above the truck bed is not fully covered by the tarp")

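Binarize the masks (> 0.5) into boolean arrays.
Compute the intersection (logical_and): the region where the truck bed and the tarp overlap.
Compute the areas:
- truck_bed_area: number of pixels in the truck-bed mask.
- intersection_area: number of pixels in the intersection.
Compute the coverage ratio, intersection_area / truck_bed_area, and check whether it is close to 100% (threshold 95%).
Purpose: quantify how much of the truck bed the tarp covers, as a basis for the decision.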
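Note that the metric is intersection over truck-bed area, not IoU: tarp pixels outside the bed do not lower it, and an empty bed mask yields 0 and is reported as not covered. A pure-Python sketch of the same calculation and the 95% rule on toy grids (the helper name coverage is hypothetical):

```python
def coverage(bed_mask, tarp_mask, full_threshold=0.95):
    """Return (fraction of bed pixels that are also tarp pixels, fully_covered)."""
    bed_area = 0
    overlap = 0
    for bed_row, tarp_row in zip(bed_mask, tarp_mask):
        for b, t in zip(bed_row, tarp_row):
            if b:
                bed_area += 1
                overlap += bool(t)
    ratio = overlap / bed_area if bed_area else 0.0
    return ratio, ratio >= full_threshold


bed = [[1, 1, 1, 1],
       [1, 1, 1, 1]]
tarp = [[1, 1, 1, 0],
        [1, 1, 1, 1]]
ratio, fully_covered = coverage(bed, tarp)
print(f"{ratio:.2%}", fully_covered)  # 87.50% False
```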

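Visualizing the results

Define the show_mask function: it turns a mask into a colored RGBA layer and overlays it on the original image.
Visualize the truck-bed and tarp masks side by side and save them as truck_bed_and_tarp_masks.png.
Visualize the intersection and save it as intersection_mask.png.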

# Truck-bed and tarp masks side by side
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.imshow(np.array(raw_image))
ax = plt.gca()
show_mask(truck_bed_mask, ax=ax, random_color=True)
plt.title("Truck Bed Mask")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(np.array(raw_image))
ax = plt.gca()
show_mask(tarp_mask, ax=ax, random_color=True)
plt.title("Tarp Mask")
plt.axis("off")
plt.savefig("./truck_bed_and_tarp_masks.png", bbox_inches='tight', pad_inches=0, dpi=300)
plt.close()

# Intersection of the two masks
plt.figure(figsize=(8, 8))
plt.imshow(np.array(raw_image))
ax = plt.gca()
show_mask(intersection, ax=ax, random_color=True)
plt.title("Intersection of Truck Bed and Tarp")
plt.axis("off")
plt.savefig("./intersection_mask.png", bbox_inches='tight', pad_inches=0, dpi=300)
plt.close()
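Showing the RGBA layer from show_mask over the photo is ordinary alpha compositing: inside the mask each channel becomes (1 - alpha) * pixel + alpha * color, and outside it alpha is 0, so the photo is unchanged. A small sketch of that formula, which could be useful for writing overlays with PIL instead of matplotlib (blend is a hypothetical helper):

```python
def blend(pixel, color, alpha=0.6):
    """Composite an RGB overlay color onto an RGB pixel (the "over" operator)."""
    return tuple(round((1 - alpha) * p + alpha * c) for p, c in zip(pixel, color))


# show_mask's default color is (30, 144, 255) at alpha 0.6.
print(blend((100, 100, 100), (30, 144, 255)))  # (58, 126, 193)
```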

After more thought and several rounds of changes and experiments, I arrived at this code:

import torch
from PIL import Image
from transformers import SamModel, SamProcessor
from ultralytics import YOLO
import matplotlib.pyplot as plt
import numpy as np

# Check for a GPU and pick the device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load the YOLOv8 model
yolo_model = YOLO("./yolov8n.pt")

# Load the SAM model and processor
sam_model = SamModel.from_pretrained("./sam-vit-base").to(device)
sam_processor = SamProcessor.from_pretrained("./sam-vit-base")

# Load the image
img_url = "./trunk2.jpg"
raw_image = Image.open(img_url).convert("RGB")

# Detect trucks with YOLOv8
results = yolo_model(raw_image, conf=0.3)
truck_boxes = []

for result in results:
    boxes = result.boxes
    for box in boxes:
        cls = int(box.cls[0])
        if cls == 7:  # class ID 7 is "truck" in the COCO classes of yolov8n.pt
            x1, y1, x2, y2 = box.xyxy[0].tolist()
            truck_boxes.append((int(x1), int(y1), int(x2), int(y2)))
            print(f"Truck detected: ({x1}, {y1}, {x2}, {y2})")

if not truck_boxes:
    raise ValueError("No truck detected in the image; check yolov8n.pt or the image content")

# Only the first detected truck is processed
x1, y1, x2, y2 = truck_boxes[0]
truck_width = x2 - x1
truck_height = y2 - y1

# Adjusted prompt derivation (assumes a side view with the cab on the left and the bed on the right).
# Offsets are hand-tuned fractions of the box, measured from its top-left corner.
# truck_bed_point: a point in the space above the truck bed
truck_bed_x = x1 + int(truck_width * 0.8)  # 80% of the box width from the left
truck_bed_y = y1 + truck_height * 0.25     # 25% of the box height from the top
truck_bed_point = [[[truck_bed_x, truck_bed_y]]]
print(f"Truck bed point: {truck_bed_point}")

# tarp_point: a point on the tarp
tarp_x = x1 + int(truck_width * 0.6)       # 60% of the box width from the left
tarp_y = y1 - truck_height * 0.20          # 20% of the box height above the box top
tarp_point = [[[tarp_x, tarp_y]]]
print(f"Tarp point: {tarp_point}")

# Visualize the prompt points (for debugging)
plt.figure(figsize=(8, 8))
plt.imshow(np.array(raw_image))
plt.scatter([truck_bed_x], [truck_bed_y], c='red', label='Truck Bed Point')
plt.scatter([tarp_x], [tarp_y], c='blue', label='Tarp Point')
plt.legend()
plt.axis("off")
plt.savefig("./prompt_points.png", bbox_inches='tight', dpi=300)
plt.close()

# Generate the truck-bed and tarp masks with SAM
inputs_truck_bed = sam_processor(raw_image, input_points=truck_bed_point, return_tensors="pt").to(device)
inputs_tarp = sam_processor(raw_image, input_points=tarp_point, return_tensors="pt").to(device)

with torch.no_grad():
    outputs_truck_bed = sam_model(**inputs_truck_bed)
    outputs_tarp = sam_model(**inputs_tarp)

# Post-process the masks ([point_batch, num_masks, H, W] per image)
masks_truck_bed = sam_processor.image_processor.post_process_masks(
    outputs_truck_bed.pred_masks.cpu(),
    inputs_truck_bed["original_sizes"].cpu(),
    inputs_truck_bed["reshaped_input_sizes"].cpu()
)[0]
masks_tarp = sam_processor.image_processor.post_process_masks(
    outputs_tarp.pred_masks.cpu(),
    inputs_tarp["original_sizes"].cpu(),
    inputs_tarp["reshaped_input_sizes"].cpu()
)[0]

# Select the best masks
truck_bed_scores = outputs_truck_bed.iou_scores.cpu()
tarp_scores = outputs_tarp.iou_scores.cpu()
print('truck bed scores:', truck_bed_scores)
print('masks_truck_bed shape:', masks_truck_bed.shape)
print('tarp scores:', tarp_scores)
print('masks_tarp shape:', masks_tarp.shape)

truck_bed_best_idx = truck_bed_scores[0, 0].argmax().item()
tarp_best_idx = tarp_scores[0, 0].argmax().item()

# Guard against an out-of-range index; the candidates are on dimension 1
if truck_bed_best_idx >= masks_truck_bed.shape[1]:
    print(f"Warning: truck_bed_best_idx ({truck_bed_best_idx}) exceeds the number of masks ({masks_truck_bed.shape[1]}); using the first mask")
    truck_bed_best_idx = 0
if tarp_best_idx >= masks_tarp.shape[1]:
    print(f"Warning: tarp_best_idx ({tarp_best_idx}) exceeds the number of masks ({masks_tarp.shape[1]}); using the first mask")
    tarp_best_idx = 0

truck_bed_mask = masks_truck_bed[0, truck_bed_best_idx]  # [H, W]
tarp_mask = masks_tarp[0, tarp_best_idx]
print('truck_bed_mask shape:', truck_bed_mask.shape)
print('tarp_mask shape:', tarp_mask.shape)

# Free GPU memory
del outputs_truck_bed, outputs_tarp
torch.cuda.empty_cache()

# Visualization helper: overlay a mask as a translucent color layer
def show_mask(mask, ax, random_color=False):
    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    mask = np.array(mask)
    if mask.ndim == 3:
        mask = mask[0]
    elif mask.ndim != 2:
        raise ValueError(f"Unexpected mask shape {mask.shape}")
    h, w = mask.shape
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)

# Visualize the truck-bed and tarp masks
plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.imshow(np.array(raw_image))
ax = plt.gca()
show_mask(truck_bed_mask, ax=ax, random_color=True)
plt.title("Truck Bed Mask")
plt.axis("off")
plt.subplot(1, 2, 2)
plt.imshow(np.array(raw_image))
ax = plt.gca()
show_mask(tarp_mask, ax=ax, random_color=True)
plt.title("Tarp Mask")
plt.axis("off")
plt.savefig("./truck_bed_and_tarp_masks.png", bbox_inches='tight', pad_inches=0, dpi=300)
plt.close()

# Compute coverage
truck_bed_mask_np = truck_bed_mask.numpy() > 0.5
tarp_mask_np = tarp_mask.numpy() > 0.5
intersection = np.logical_and(truck_bed_mask_np, tarp_mask_np)
truck_bed_area = np.sum(truck_bed_mask_np)
intersection_area = np.sum(intersection)

coverage_ratio = intersection_area / truck_bed_area if truck_bed_area > 0 else 0
print(f"Truck bed area: {truck_bed_area} px")
print(f"Intersection area: {intersection_area} px")
print(f"Coverage ratio: {coverage_ratio * 100:.2f}%")
if coverage_ratio >= 0.95:
    print("The space above the truck bed is fully covered by the tarp")
else:
    print("The space above the truck bed is not fully covered by the tarp")

# Visualize the intersection
plt.figure(figsize=(8, 8))
plt.imshow(np.array(raw_image))
ax = plt.gca()
show_mask(intersection, ax=ax, random_color=True)
plt.title("Intersection of Truck Bed and Tarp")
plt.axis("off")
plt.savefig("./intersection_mask.png", bbox_inches='tight', pad_inches=0, dpi=300)
plt.close()

print("Images saved to the current directory: truck_bed_and_tarp_masks.png and intersection_mask.png")

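Output after running. Two caveats about this log: the prompt points were produced with an earlier set of hand-tuned offsets rather than the factors in the listing above (with the detected box, 761 = x1 + int(0.7 x width) and 241.7 = y1 - 0.3 x height); and truck_bed_mask / tarp_mask have shape [3, 1440, 2560], so the areas are summed over all three candidate masks of each prompt rather than over a single best mask.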

Using device: cuda

0: 384x640 1 truck, 103.4ms
Speed: 9.4ms preprocess, 103.4ms inference, 437.4ms postprocess per image at shape (1, 3, 384, 640)
Truck detected: (20.757080078125, 515.2138671875, 1079.369140625, 1426.887451171875)
Truck bed point: [[[761, 241.7]]]
Tarp point: [[[549, 360.13]]]
truck bed scores: tensor([[[0.7191, 0.8671, 0.7891]]])
masks_truck_bed shape: torch.Size([1, 3, 1440, 2560])
tarp scores: tensor([[[0.8107, 0.9023, 0.8953]]])
masks_tarp shape: torch.Size([1, 3, 1440, 2560])
Warning: truck_bed_best_idx (1) exceeds the number of masks (1); using the first mask
Warning: tarp_best_idx (1) exceeds the number of masks (1); using the first mask
truck_bed_mask shape: torch.Size([3, 1440, 2560])
tarp_mask shape: torch.Size([3, 1440, 2560])
Truck bed area: 308711 px
Intersection area: 276499 px
Coverage ratio: 89.57%
The space above the truck bed is not fully covered by the tarp

Image masks

Intersection of the truck-bed and tarp masks

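I tried adjusting the thresholds used to compute the masks, but the tarp mask stayed unsatisfactory, so I went back to generating generic masks with the pipeline and then using the prompt points to extract the masks I need.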
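The selection step can be automated with a small helper. The author picks masks by saving every pipeline mask that contains a prompt point and inspecting the images; the rule below (take the smallest mask that contains the point, i.e. the most specific region) is my assumption for automating that choice, and the helper names are hypothetical. Masks are plain 2D lists here, but the same [y][x] indexing works on the NumPy arrays the pipeline returns:

```python
def masks_containing(masks, point):
    """Indices of 2D masks whose pixel at (x, y) is set."""
    x, y = point
    return [i for i, m in enumerate(masks)
            if 0 <= y < len(m) and 0 <= x < len(m[0]) and m[y][x]]


def pick_mask_index(masks, point):
    """Assumed rule: among the masks containing the point, take the smallest."""
    hits = masks_containing(masks, point)
    if not hits:
        return None
    return min(hits, key=lambda i: sum(map(sum, masks[i])))


generic = [
    [[1, 1, 1], [1, 1, 1], [1, 1, 1]],  # e.g. the whole truck
    [[0, 0, 0], [0, 1, 1], [0, 1, 1]],  # e.g. the bed region
    [[1, 0, 0], [0, 0, 0], [0, 0, 0]],  # unrelated region
]
print(masks_containing(generic, (2, 2)), pick_mask_index(generic, (2, 2)))  # [0, 1] 1
```

Whether "smallest" is the right rule depends on how the pipeline splits this scene, so it is worth checking against the saved mask images before relying on it.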

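Generic pipeline masks + prompt points to get the truck-bed and tarp masks

Generic pipeline masks

First, specify the prompt points by hand:

Truck bed point: [[[761, 241.7]]]
Tarp point: [[[549, 360.13]]]

Run the program: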

import torch
from PIL import Image
from transformers import SamModel, SamProcessor, pipeline
import matplotlib.pyplot as plt
import numpy as np

# Check for a GPU and pick the device
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")

# Load the model and processor
model = SamModel.from_pretrained("./sam-vit-base").to(device)
processor = SamProcessor.from_pretrained("./sam-vit-base")

# Load the image
img_url = "./trunk2.jpg"
raw_image = Image.open(img_url).convert("RGB")

# Prompt point (x, y) in original-image pixels
input_points = [[[450, 600]]]

# Preprocess the inputs and move them to the device
inputs = processor(raw_image, input_points=input_points, return_tensors="pt").to(device)

# Run inference
with torch.no_grad():
    outputs = model(**inputs)

# Post-process the masks
masks = processor.image_processor.post_process_masks(
    outputs.pred_masks.cpu(),
    inputs["original_sizes"].cpu(),
    inputs["reshaped_input_sizes"].cpu()
)[0]  # [point_batch, num_masks, height, width]
scores = outputs.iou_scores.cpu()  # [1, point_batch, num_masks]
print('scores:', scores)
print('masks shape:', masks.shape)

# Pick the best mask (highest predicted IoU)
best_mask_idx = scores[0, 0].argmax().item()
best_mask = masks[0, best_mask_idx]  # [height, width]
print('best_mask shape:', best_mask.shape)

# Free GPU memory
del outputs
torch.cuda.empty_cache()

# Generate masks for the whole image with the pipeline
generator = pipeline("mask-generation", model="./sam-vit-base", device=0 if device == "cuda" else -1)
outputs_pipeline = generator(
    img_url,
    points_per_batch=64,
    pred_iou_thresh=0.88,
    stability_score_thresh=0.9
)

# Prompt points as (x, y) pixel coordinates
truck_bed_point = (761, 241)  # truck-bed prompt point
tarp_point = (549, 360)  # tarp prompt point
points = [truck_bed_point, tarp_point]  # list of points


# Visualization helper: overlay the mask on ax; if it contains any prompt point,
# also save it as a standalone image (uses the global raw_image)
mask_counter = 0

def show_mask(mask, ax, points, random_color=False):
    global mask_counter
    # Convert the points to integer (x, y) pixel coordinates
    points = [(int(x), int(y)) for x, y in points]

    # Normalize the mask dimensions
    mask = np.array(mask)
    if mask.ndim == 3:  # [channels, height, width]
        mask = mask[0]  # take the first channel
    elif mask.ndim != 2:
        raise ValueError(f"Unexpected mask shape {mask.shape}; expected [height, width] or [channels, height, width]")
    h, w = mask.shape

    if random_color:
        color = np.concatenate([np.random.random(3), np.array([0.6])], axis=0)
    else:
        color = np.array([30 / 255, 144 / 255, 255 / 255, 0.6])
    mask_image = mask.reshape(h, w, 1) * color.reshape(1, 1, -1)
    ax.imshow(mask_image)  # add to the combined overview figure

    # Does the mask contain any of the prompt points? (masks are indexed [y, x])
    contains_points = [(x, y) for x, y in points if 0 <= y < h and 0 <= x < w and mask[y, x]]

    # If so, save it as a separate image with a unique file name
    if contains_points:
        fig = plt.figure(figsize=(8, 8))
        plt.imshow(np.array(raw_image))
        plt.imshow(mask_image)
        plt.title(f"Mask containing points: {contains_points}")
        plt.axis("off")
        filename = f"./mask_{mask_counter}.png"
        fig.savefig(filename, bbox_inches='tight', pad_inches=0, dpi=300)
        plt.close(fig)
        mask_counter += 1
        print(f"Saved mask to {filename}, contains points: {contains_points}")


# Overlay all pipeline masks on one figure and save it
fig = plt.figure(figsize=(8, 8))
ax = fig.gca()
ax.imshow(np.array(raw_image))
for mask in outputs_pipeline["masks"]:
    show_mask(mask, ax=ax, points=points, random_color=True)
ax.axis("off")
fig.savefig("./pipeline_masks.png", bbox_inches='tight', pad_inches=0, dpi=300)
plt.close(fig)
print("Image saved to the current directory: pipeline_masks.png")

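Inspect the generated masks

Mask of the space above the truck bed

Mask of the tarp

These results are much better. Compute the coverage ratio from these two masks: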

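# Select, for each prompt point, a pipeline mask that contains it.
# Assumption: the smallest containing mask is taken; check the saved
# mask_*.png images to confirm it is the region you want.
def pick_mask(masks, point):
    x, y = point
    hits = [np.asarray(m) for m in masks if np.asarray(m)[y, x]]
    return min(hits, key=lambda m: m.sum()) if hits else None

truck_bed_mask = pick_mask(outputs_pipeline["masks"], truck_bed_point)
tarp_mask = pick_mask(outputs_pipeline["masks"], tarp_point)

if truck_bed_mask is not None and tarp_mask is not None:
    # Binarize the masks
    truck_bed_mask_np = truck_bed_mask > 0.5
    tarp_mask_np = tarp_mask > 0.5
    # Compute the intersection and the areas
    intersection = np.logical_and(truck_bed_mask_np, tarp_mask_np)
    truck_bed_area = np.sum(truck_bed_mask_np)
    intersection_area = np.sum(intersection)
    coverage_ratio = intersection_area / truck_bed_area if truck_bed_area > 0 else 0

    print(f"Truck bed area: {truck_bed_area} px")
    print(f"Intersection area: {intersection_area} px")
    print(f"Tarp coverage: {coverage_ratio * 100:.2f}%")

    # Visualize the intersection
    h, w = intersection.shape
    plt.figure(figsize=(8, 8))
    plt.imshow(np.array(raw_image))
    ax_inter = plt.gca()
    inter_color = np.array([1.0, 0.0, 0.0, 0.6])  # red marks the intersection
    inter_image = intersection.reshape(h, w, 1) * inter_color.reshape(1, 1, -1)
    ax_inter.imshow(inter_image)
    plt.title(f"Intersection (Coverage: {coverage_ratio * 100:.2f}%)")
    plt.axis("off")
    plt.savefig("./intersection_mask.png", bbox_inches='tight', pad_inches=0, dpi=300)
    plt.close()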

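Result. This is reasonably reliable, given that a pixel ratio cannot account for perspective, i.e. the actual physical area:

Truck bed area: 178494 px
Intersection area: 123257 px
Tarp coverage: 69.05%

Intersection of the truck bed and the tarp

Detected uncovered part of the truck bed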
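The uncovered part is the set difference of the two masks: bed pixels where the tarp mask is false (np.logical_and(bed, ~tarp) in NumPy). A pure-Python sketch that also returns the bounding box of the uncovered pixels, which could mark an alert region; uncovered_region is a hypothetical helper, not from the original code, and the grids are toy data:

```python
def uncovered_region(bed_mask, tarp_mask):
    """Bed pixels the tarp does not cover, plus their bounding box
    (x1, y1, x2, y2), or None if the bed is fully covered."""
    out = [[bool(b) and not t for b, t in zip(br, tr)]
           for br, tr in zip(bed_mask, tarp_mask)]
    coords = [(x, y) for y, row in enumerate(out) for x, v in enumerate(row) if v]
    if not coords:
        return out, None
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return out, (min(xs), min(ys), max(xs), max(ys))


bed = [[1, 1, 1, 1],
       [1, 1, 1, 1],
       [0, 1, 1, 0]]
tarp = [[1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 1, 1, 0]]
mask, box = uncovered_region(bed, tarp)
print(box)  # (2, 0, 3, 1)
```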


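This article was contributed by an Internet user; the views expressed are the author's own and do not represent this site. The site only provides information storage space, does not own the content, and assumes no legal liability. If you repost it, please cite the source: http://www.dtcms.com/a/45009.html. If the content infringes rights, breaks the law or misstates facts, contact 809451989@qq.com; once verified, it will be removed immediately.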