Step-by-Step Tutorial: Batch-Predict Images with a DETR Model and Automatically Generate YOLO-Format txt Annotation Files (Full Code Included) - 编程实验室

End-to-End Practical Guide: Converting Industrial-Grade DETR Predictions to YOLO Annotations

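When a DETR model goes into production, one question comes up quickly: how do we make this Transformer-based detector work smoothly with other mainstream frameworks? Whenever predictions feed model comparison, data augmentation, or multi-model ensembling, a unified annotation format becomes an engineering problem that has to be solved. This article walks through the full conversion from raw DETR output to the standard YOLO format, covering coordinate conversion, confidence handling, and batch file generation, and provides a Python implementation that can be integrated into a production pipeline.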

1. Core Problem and Solution Design

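Three aspects of how DETR and YOLO represent detection results need attention:

Coordinate system: DETR outputs normalized center coordinates (cx, cy) plus width and height (w, h). YOLO label files use the same normalized (x_center, y_center, width, height) layout, so the box values carry over unchanged. Corner coordinates (x_min, y_min, x_max, y_max) are needed only for auxiliary steps such as drawing, clipping, or NMS
Confidence handling: DETR outputs a class probability distribution per query, including a "no object" class, whereas the YOLO format usually keeps only the top-1 class and its confidence
File structure: YOLO requires one txt file per image with the same base name, where each line describes one detected object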

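The technical route is outlined below (the code is developed in Section 3):

Raw DETR output → coordinate conversion → confidence filtering → normalization → batch file generation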

2. Key Algorithms and Implementation Details

2.1 Coordinate Conversion

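DETR predicts each box as (cx, cy, w, h), with every value in [0, 1]. This already matches the YOLO label layout, so the two helpers below are not needed to write the label file itself. They are for steps that need corner or pixel coordinates, such as drawing boxes, clipping, or computing IoU.

Center coordinates to corner coordinates:

    def cxcywh_to_xyxy(bbox):
        # bbox is [cx, cy, w, h]; the result is [x_min, y_min, x_max, y_max]
        x_min = bbox[0] - bbox[2] / 2
        y_min = bbox[1] - bbox[3] / 2
        x_max = bbox[0] + bbox[2] / 2
        y_max = bbox[1] + bbox[3] / 2
        return [x_min, y_min, x_max, y_max]

Denormalizing coordinates (using the original image size):

    def denormalize(coords, width, height):
        # Scale normalized corner coordinates to pixels
        return [
            coords[0] * width,   # x_min
            coords[1] * height,  # y_min
            coords[2] * width,   # x_max
            coords[3] * height,  # y_max
        ]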
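As a quick sanity check of the conversion described above, the sketch below turns one normalized center-format box into pixel corners and back again. The helper names `cxcywh_to_xyxy_px` and `xyxy_px_to_cxcywh` and the 640×480 image size are illustrative assumptions, not part of the original pipeline.

```python
def cxcywh_to_xyxy_px(box, width, height):
    """Normalized (cx, cy, w, h) -> pixel (x_min, y_min, x_max, y_max)."""
    cx, cy, w, h = box
    return [
        (cx - w / 2) * width,
        (cy - h / 2) * height,
        (cx + w / 2) * width,
        (cy + h / 2) * height,
    ]


def xyxy_px_to_cxcywh(box, width, height):
    """Pixel (x_min, y_min, x_max, y_max) -> normalized (cx, cy, w, h)."""
    x_min, y_min, x_max, y_max = box
    return [
        (x_min + x_max) / 2 / width,
        (y_min + y_max) / 2 / height,
        (x_max - x_min) / width,
        (y_max - y_min) / height,
    ]


# Hypothetical 640x480 image with a box centered in the frame.
corners = cxcywh_to_xyxy_px([0.5, 0.5, 0.25, 0.5], width=640, height=480)
print(corners)   # [240.0, 120.0, 400.0, 360.0]
restored = xyxy_px_to_cxcywh(corners, width=640, height=480)
print(restored)  # [0.5, 0.5, 0.25, 0.5]
```

The round trip returning the original values is a cheap way to confirm that width and height have not been swapped somewhere in a pipeline.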

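2.2 Confidence Filtering Strategy

DETR predictions usually contain many low-quality boxes, so a filtering threshold should be chosen according to the application:

| Scenario | Recommended threshold | Consideration |
| --- | --- | --- |
| High precision required | 0.7 | Fewer false positives |
| Recall first | 0.3 | Keeps more potential targets |
| Balanced | 0.5 | Trade-off between precision and recall |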

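Example implementation:

    def filter_predictions(probs, threshold=0.5):
        # probs: tensor of shape (num_queries, num_classes),
        # with the no-object column already removed
        max_probs = probs.max(dim=1)  # named tuple with .values and .indices
        keep_indices = max_probs.values > threshold
        return probs[keep_indices], max_probs.indices[keep_indices]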
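DETR's class head emits one extra logit per query for "no object", and that slot has to be excluded before the top-1 class is picked. The torch-free sketch below illustrates the same filtering logic on two toy queries. The function name `top1_without_no_object` and the logit values are made up for illustration.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top1_without_no_object(logits):
    """Return (class_id, confidence), ignoring the trailing no-object slot."""
    # Softmax over all slots first, then drop the last one,
    # so a strong no-object logit still lowers the real-class scores.
    probs = softmax(logits)[:-1]
    class_id = max(range(len(probs)), key=probs.__getitem__)
    return class_id, probs[class_id]


# Two real classes plus the no-object slot (toy numbers).
queries = [
    [4.0, 0.0, 0.0],  # confident class 0
    [0.0, 0.0, 5.0],  # mostly "no object"
]
threshold = 0.5
kept = []
for logits in queries:
    class_id, conf = top1_without_no_object(logits)
    if conf > threshold:
        kept.append((class_id, conf))
print(kept)
```

Only the first query survives the threshold. The second is dominated by the no-object slot, so its best real-class probability stays well below 0.5.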

3. Complete Implementation

3.1 Core Conversion Function

    import os
    import torch

    def detr_to_yolo(detr_output, img_size, threshold=0.5):
        """
        Convert DETR output to YOLO-format annotations.

        Args:
            detr_output: dict, model output containing pred_logits and pred_boxes
            img_size: tuple (width, height), original image size
                (not used here, because the boxes stay normalized)
            threshold: float, confidence threshold
        Returns:
            list, each element is
            [class_id, x_center, y_center, width, height, confidence]
        """
        # Unpack the model output: softmax over classes,
        # then drop the trailing no-object class
        probs = detr_output['pred_logits'].softmax(-1)[0, :, :-1]
        boxes = detr_output['pred_boxes'][0]

        # Filter out low-confidence predictions
        confidences, class_ids = probs.max(-1)
        keep = confidences > threshold

        yolo_annotations = []
        for box, cls_id, conf in zip(boxes[keep], class_ids[keep], confidences[keep]):
            # DETR boxes are [cx, cy, w, h] and already normalized,
            # which is exactly the YOLO layout
            x_center, y_center, width, height = box.tolist()
            yolo_annotations.append(
                [int(cls_id), x_center, y_center, width, height, float(conf)]
            )
        return yolo_annotations

3.2 Batch Processing and File Saving

    def save_yolo_annotation(annotation, image_path, save_dir):
        """
        Save YOLO-format annotations to a txt file.

        Args:
            annotation: list, YOLO-format annotation data
            image_path: str, path of the original image
            save_dir: str, directory for the annotation files
        """
        # Create the output directory if it does not exist
        os.makedirs(save_dir, exist_ok=True)

        # Build the annotation file name (same base name as the image)
        base_name = os.path.splitext(os.path.basename(image_path))[0]
        txt_path = os.path.join(save_dir, f"{base_name}.txt")

        # Write the file (each YOLO line: class_id x_center y_center width height);
        # the confidence in item[5] is intentionally not written
        with open(txt_path, 'w') as f:
            for item in annotation:
                line = f"{item[0]} {item[1]:.6f} {item[2]:.6f} {item[3]:.6f} {item[4]:.6f}\n"
                f.write(line)
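After writing label files it is worth reading a few back to confirm that they parse as five fields per line with coordinates in [0, 1]. The following is a minimal sketch of such a check. `read_yolo_labels` is a hypothetical helper, and the sample line is invented for the round trip.

```python
import os
import tempfile


def read_yolo_labels(txt_path):
    """Parse a YOLO label file into [class_id, xc, yc, w, h] rows."""
    rows = []
    with open(txt_path, "r", encoding="utf-8") as f:
        for line_no, line in enumerate(f, start=1):
            parts = line.split()
            if not parts:
                continue  # tolerate blank lines
            if len(parts) != 5:
                raise ValueError(
                    f"{txt_path}:{line_no}: expected 5 fields, got {len(parts)}"
                )
            class_id = int(parts[0])
            coords = [float(p) for p in parts[1:]]
            if not all(0.0 <= c <= 1.0 for c in coords):
                raise ValueError(f"{txt_path}:{line_no}: coordinate outside [0, 1]")
            rows.append([class_id, *coords])
    return rows


# Round trip through a temporary file.
with tempfile.TemporaryDirectory() as tmp_dir:
    label_path = os.path.join(tmp_dir, "sample.txt")
    with open(label_path, "w", encoding="utf-8") as f:
        f.write("2 0.500000 0.500000 0.250000 0.500000\n")
    labels = read_yolo_labels(label_path)
print(labels)  # [[2, 0.5, 0.5, 0.25, 0.5]]
```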

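3.3 Putting the Pipeline Together

    from PIL import Image

    IMAGE_EXTS = ('.jpg', '.jpeg', '.png', '.bmp')

    def process_folder(model, img_folder, output_dir, transform,
                       device='cuda', threshold=0.5):
        """
        Process every image in a folder.

        Args:
            model: a loaded DETR model
            img_folder: str, path to the image folder
            output_dir: str, output directory for the annotations
            transform: image preprocessing transform
            device: str, compute device
            threshold: float, confidence threshold
        """
        model.eval()
        model.to(device)

        for img_name in sorted(os.listdir(img_folder)):
            if not img_name.lower().endswith(IMAGE_EXTS):
                continue  # skip non-image files
            img_path = os.path.join(img_folder, img_name)

            # Load and preprocess the image
            img = Image.open(img_path).convert('RGB')
            img_tensor = transform(img).unsqueeze(0).to(device)

            # Run the model
            with torch.no_grad():
                outputs = model(img_tensor)

            # Convert the annotation format
            img_size = img.size  # (width, height)
            yolo_annos = detr_to_yolo(outputs, img_size, threshold)

            # Save the result
            save_yolo_annotation(yolo_annos, img_path, output_dir)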
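A folder of images often also holds non-image files (thumbnail caches, notes, earlier label files), and passing those to the image loader would raise an error. One possible way to select the inputs up front is sketched below. `list_images` and the suffix set are assumptions for illustration.

```python
import tempfile
from pathlib import Path

# Assumed set of extensions; extend it to match your dataset.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def list_images(folder):
    """Return image paths in `folder`, sorted, skipping everything else."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


# Demonstration on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp_dir:
    for name in ("b.JPG", "a.png", "notes.txt"):
        (Path(tmp_dir) / name).touch()
    names = [p.name for p in list_images(tmp_dir)]
print(names)  # ['a.png', 'b.JPG']
```

Sorting the paths also makes repeated runs process files in the same order, which helps when comparing outputs between runs.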

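4. Optimization Tips for Production

4.1 Performance Tuning

Batch processing: change the forward pass so that it accepts batched input

    # Modified detect function with batch support
    def batch_detect(images, model, transform, batch_size=8, device='cuda'):
        # Build the batch tensor from the first batch_size images;
        # torch.stack requires every transformed image to have the same shape
        batch = torch.stack([transform(img) for img in images[:batch_size]])
        # Batched prediction
        with torch.no_grad():
            outputs = model(batch.to(device))
        return outputs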
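The function above only consumes the first `batch_size` images, so a caller still needs to walk the full list in slices. A small sketch of that outer loop follows. `chunked` is a hypothetical helper and the file names are placeholders.

```python
def chunked(items, batch_size):
    """Yield consecutive slices of `items` with at most `batch_size` elements."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


paths = [f"img_{i}.jpg" for i in range(5)]
batches = list(chunked(paths, batch_size=2))
print(batches)
# [['img_0.jpg', 'img_1.jpg'], ['img_2.jpg', 'img_3.jpg'], ['img_4.jpg']]
```

Each slice can then be loaded and handed to the batched detector. The last slice may be shorter than `batch_size`, which the detector has to tolerate.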

Asynchronous I/O: use Python's asyncio together with the third-party aiofiles package to make file writes more efficient

    import aiofiles

    async def async_save_annotation(annotation, txt_path):
        async with aiofiles.open(txt_path, 'w') as f:
            for item in annotation:
                line = f"{item[0]} {item[1]:.6f} {item[2]:.6f} {item[3]:.6f} {item[4]:.6f}\n"
                await f.write(line)
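If adding a dependency is undesirable, a similar effect can be had with the standard library alone by pushing ordinary blocking writes onto worker threads. The sketch below assumes Python 3.9 or later for `asyncio.to_thread`. `write_labels` and `save_all` are illustrative names, and this is a swapped-in alternative rather than the approach shown above.

```python
import asyncio
import os
import tempfile


def write_labels(txt_path, annotation):
    """Blocking writer: one 'class xc yc w h' line per detection."""
    with open(txt_path, "w", encoding="utf-8") as f:
        for cls_id, xc, yc, w, h, *_ in annotation:
            f.write(f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}\n")


async def save_all(jobs):
    """Run the blocking writes in worker threads and wait for all of them."""
    await asyncio.gather(
        *(asyncio.to_thread(write_labels, path, anno) for path, anno in jobs)
    )


with tempfile.TemporaryDirectory() as tmp_dir:
    jobs = [
        (os.path.join(tmp_dir, "a.txt"), [[0, 0.5, 0.5, 0.2, 0.2, 0.9]]),
        (os.path.join(tmp_dir, "b.txt"), []),  # image with no detections
    ]
    asyncio.run(save_all(jobs))
    with open(jobs[0][0], encoding="utf-8") as f:
        first_file = f.read()
print(repr(first_file))
```

Whether this helps in practice depends on the storage. Label files are tiny, so inference usually dominates the runtime and the gain from overlapping writes may be small.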

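4.2 Troubleshooting Common Problems

Coordinates out of range:

Symptom: converted coordinates fall outside [0, 1]

Solution: add a bounds check before saving

    def validate_coords(coords):
        # Expects normalized corner coordinates
        return [
            max(0, min(1, coords[0])),  # x_min
            max(0, min(1, coords[1])),  # y_min
            max(0, min(1, coords[2])),  # x_max
            max(0, min(1, coords[3])),  # y_max
        ]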
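The label file stores center-format boxes, and clamping cx, cy, w, h independently would shift or distort a box that only partly leaves the image. One hedged way to handle this is to clip in corner space and convert back, as sketched below. `clip_cxcywh` is a hypothetical helper.

```python
def clip_cxcywh(box):
    """Clip a normalized (cx, cy, w, h) box to the unit square.

    Returns None when the box lies entirely outside the image.
    """
    cx, cy, w, h = box
    x_min = max(0.0, cx - w / 2)
    y_min = max(0.0, cy - h / 2)
    x_max = min(1.0, cx + w / 2)
    y_max = min(1.0, cy + h / 2)
    if x_max <= x_min or y_max <= y_min:
        return None
    return [
        (x_min + x_max) / 2,
        (y_min + y_max) / 2,
        x_max - x_min,
        y_max - y_min,
    ]


# A box that sticks out past the right edge of the image.
clipped = clip_cxcywh([0.875, 0.5, 0.5, 0.5])
print(clipped)  # [0.8125, 0.5, 0.375, 0.5]
```

The clipped box is narrower and its center moves left, which matches the visible part of the object.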

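Class ID mismatch:
- Symptom: the downstream model does not recognize the converted classes
- Solution: build a class mapping table
```
CLASS_MAPPING = {
    0: 2,  # DETR class 0 corresponds to YOLO class 2
    1: 5,  # ...
}
```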
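A mapping table only helps if it is applied consistently before the labels are written. The sketch below shows one way to do that, dropping detections whose class has no entry in the table. `remap_classes` is an illustrative name, and the mapping reuses the toy values 0→2 and 1→5 from above.

```python
def remap_classes(annotations, class_mapping):
    """Return new rows with remapped class ids; drop ids missing from the mapping."""
    remapped = []
    for cls_id, *rest in annotations:
        if cls_id in class_mapping:
            remapped.append([class_mapping[cls_id], *rest])
    return remapped


example_mapping = {0: 2, 1: 5}
detections = [
    [0, 0.5, 0.5, 0.2, 0.2, 0.9],
    [7, 0.3, 0.3, 0.1, 0.1, 0.8],  # class 7 is not in the mapping
    [1, 0.6, 0.4, 0.3, 0.3, 0.7],
]
mapped = remap_classes(detections, example_mapping)
print(mapped)
# [[2, 0.5, 0.5, 0.2, 0.2, 0.9], [5, 0.6, 0.4, 0.3, 0.3, 0.7]]
```

Dropping unmapped classes silently is a design choice. Raising an error instead may be safer when every DETR class is expected to have a counterpart.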

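Out-of-memory errors:

For very large image datasets, process the images incrementally with a generator (requires `import os` and `from PIL import Image`):

    def image_generator(folder):
        # Lazily yield one opened image at a time instead of loading the whole folder
        for img_name in os.listdir(folder):
            yield Image.open(os.path.join(folder, img_name))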

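5. Advanced Use Cases

5.1 Building a Semi-Automatic Labeling System

Combining this approach with a manual review tool gives an efficient semi-automatic labeling pipeline:

DETR prediction → format conversion → manual verification/correction → final annotations

Key component:

    import cv2

    def visualize_for_review(image_path, annotation):
        """Draw the annotation boxes on the image for manual review."""
        img = cv2.imread(image_path)
        img_h, img_w = img.shape[:2]
        for cls_id, xc, yc, w, h, conf in annotation:
            # Convert to pixel corner coordinates that OpenCV can draw
            x1 = int((xc - w / 2) * img_w)
            y1 = int((yc - h / 2) * img_h)
            x2 = int((xc + w / 2) * img_w)
            y2 = int((yc + h / 2) * img_h)
            cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
            cv2.putText(img, f"{cls_id}:{conf:.2f}", (x1, y1 - 5),
                        cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
        return img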

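5.2 Multi-Model Ensembling

When DETR results need to be combined with those of other detectors, a unified YOLO format greatly simplifies the integration:

Fusion strategies:
- Weighted averaging: fuse the predicted boxes of the different models with per-model weights
- NMS ensembling: apply non-maximum suppression across the results of all models

Example implementation:

    def integrate_predictions(yolo_results_list, weights=None, iou_thresh=0.5):
        """
        Combine YOLO-format predictions from several models.

        Args:
            yolo_results_list: list, one prediction list per model
            weights: list, one weight per model
            iou_thresh: float, IoU threshold used by NMS
        Returns:
            list, the combined predictions
        """
        if weights is None:
            weights = [1.0] * len(yolo_results_list)

        # Weight the confidences
        weighted_boxes = []
        for results, weight in zip(yolo_results_list, weights):
            for res in results:
                # Copy each row so the caller's lists are not modified in place
                weighted_boxes.append([*res[:-1], res[-1] * weight])

        # Apply NMS; non_max_suppression is not defined in this article
        # and must be supplied separately
        return non_max_suppression(weighted_boxes, iou_thresh)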
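The `non_max_suppression` call above is not defined anywhere in the article. A minimal pure-Python sketch of greedy per-class NMS over `[class_id, xc, yc, w, h, conf]` rows is given here as one possible stand-in. It is an assumption about what that helper should do, not the author's implementation, and `iou_cxcywh` is likewise an illustrative name.

```python
def iou_cxcywh(a, b):
    """IoU of two normalized (xc, yc, w, h) boxes."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_thresh=0.5):
    """Greedy per-class NMS; returns kept rows sorted by descending confidence."""
    kept = []
    for det in sorted(detections, key=lambda d: d[5], reverse=True):
        # Keep the box unless a higher-scoring box of the same class overlaps it.
        if all(
            det[0] != k[0] or iou_cxcywh(det[1:5], k[1:5]) <= iou_thresh
            for k in kept
        ):
            kept.append(det)
    return kept


detections = [
    [0, 0.50, 0.5, 0.4, 0.4, 0.9],
    [0, 0.52, 0.5, 0.4, 0.4, 0.6],  # near-duplicate of the first box
    [1, 0.50, 0.5, 0.4, 0.4, 0.7],  # same place, different class
    [0, 0.10, 0.1, 0.1, 0.1, 0.8],  # far away from the others
]
result = non_max_suppression(detections, iou_thresh=0.5)
print([d[5] for d in result])  # [0.9, 0.8, 0.7]
```

This version is quadratic in the number of boxes, which is fine for the few hundred candidates a detector emits per image. For larger inputs a vectorized routine such as `torchvision.ops.nms` would be a more practical choice.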

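This approach has proven reliable in several industrial inspection projects, especially when DETR predictions are used to train models such as YOLOv5/v7/v8: the converted annotation files can be used directly when preparing a darknet-format dataset. In deployment, tune the batch size and the number of worker threads to the hardware. For large-scale jobs of more than one million images, consider a distributed computing framework such as PySpark to speed up the process further.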