Big Results on a Small Budget: Fine-Tuning Nova Lite for Pro-Level Visual Detection

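This article presents two practical applications of fine-tuning Amazon Nova Lite 1.0 on Amazon Bedrock and shows how to raise performance substantially on specialized computer vision tasks while staying cost-effective.

Through a systematic evaluation of aerial-view detection and low-light surveillance scenarios, the work achieved stronger instruction following and higher detection accuracy at minimal training cost.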

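Background

Amazon Nova Lite 1.0, part of the Amazon Web Services family of multimodal foundation models, offers excellent price-performance on general vision tasks. Specialized applications, however, often need stronger instruction understanding and domain-specific optimization.

This study evaluates the effectiveness of fine-tuning in two different use cases:

1. Aerial-view group detection: strengthen instruction following so that bounding boxes are grouped intelligently, reducing the visual clutter of dense annotations.

2. Low-light detection accuracy: improve entity recognition accuracy in nighttime surveillance and lower the false positive rate.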

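Case Study 1: Aerial-View Group Detection

Problem Definition

During a customer project, we encountered a specific business requirement for aerial-view object detection. In some scenarios, when the system detects densely distributed target objects, the customer prefers a small number of large bounding boxes that mark target areas or groups rather than a large number of fine-grained small boxes. The requirement rests on the following considerations:

Better user experience: less visual noise and more emphasis on key areas
Better system performance: lower computational complexity in downstream processing
Cost control: fewer output tokens and lower API call costs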

Instruction-Following Approach

The prompt is as follows:

Your task is to perform object detection on input images, identifying all relevant elements based on the given descriptions of target objects, and output the results in the required JSON format.
First, carefully read the following target descriptions(formatted as "TargetCategory: [Description-1,...,Description-n]"):
<target_description>
{'vehicles': ['viewed from above', 'various shapes and sizes', 'visible on roads or lots', 'includes cars, trucks, buses, motorcycles']}
</target_description>

if there are many(more than 10) target objects in the image, ONLY OUTPUT FEW BIG bounding boxes to show the areas or group of target objects

Output in the required JSON format, example:
[
 {{"label-1": [x_min-1, y_min-1, x_max-1, y_max-1]}},
 {{"label-2": [x_min-2, y_min-2, x_max-2, y_max-2]}},
 ...
]
Strictly note:
0. In the target description, the TargetCategory typically specifies the object to be detected directly, while the descriptive terms are usually used to expand multi-dimensional features. For determining the detection of a TargetCategory, it is necessary to filter based on the lateral descriptions provided by the descriptive terms.
1. The output must be in standard legal JSON format without any explanatory text.
2. Please ensure that the output target result is strongly related to the target description of the target to be detected.
3. The label must be the TargetCategory in the target description format. This format is mandatory.
4. Please ensure that there are no extra commas in JSON


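The prompt includes one key control instruction:

if there are many(more than 10) target objects in the image, ONLY OUTPUT FEW BIG bounding boxes to show the areas or group of target objects

This instruction requires the model to switch automatically to region-level detection when it detects more than 10 target objects, outputting large bounding boxes that cover groups of targets instead of annotating each object individually.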
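One way to check whether a response respects this instruction is to count the boxes it returns. The sketch below is a minimal illustration, not the authors' evaluation code: the function names are hypothetical, and treating "at most 10 boxes" as compliant is a rough proxy derived from the "more than 10" wording in the prompt.

```python
import json

GROUP_THRESHOLD = 10  # assumption: taken from the "more than 10" wording in the prompt


def count_boxes(model_output: str) -> int:
    """Count bounding boxes in a response shaped like [{"label": [x1, y1, x2, y2]}, ...].

    A response wrapped as {"objects": [...]} is also accepted.
    """
    detections = json.loads(model_output)
    if isinstance(detections, dict):
        detections = detections.get("objects", [])
    # Each list item is a dict that maps one label to one box.
    return sum(len(item) for item in detections)


def follows_group_instruction(model_output: str) -> bool:
    """Rough proxy: a compliant response never lists more than GROUP_THRESHOLD boxes."""
    return count_boxes(model_output) <= GROUP_THRESHOLD
```

A response with 60 individual boxes would fail this check, while a response with a handful of region-level boxes would pass.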

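Target Output Behavior

Baseline Performance Analysis

Testing showed that Amazon Nova Lite 1.0 did not follow the control instruction in the prompt effectively. Even for images containing many target objects, the model still tended to output fine-grained detections.

A statistical analysis of detection results on similar images showed that Amazon Nova Lite 1.0 output more than 60 detection boxes on average. Output at this scale increases inference latency and, because of the extra output tokens generated, adds unnecessary cost.

To quantify the performance difference between Amazon Nova Lite and Amazon Nova Pro on aerial-view detection, we ran a systematic statistical analysis over multiple samples, comparing the detection output of the two models on the same scenes.

Amazon Nova Lite output statistics:

The corresponding Amazon Nova Pro output is shown in the figure below.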

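Why Fine-Tune Nova Lite

The baseline analysis exposes a trade-off between performance and cost: Amazon Nova Pro excels at instruction following and complex semantic understanding, but it costs considerably more than Amazon Nova Lite (input token price per 1,000 tokens: $0.0008 vs $0.00006). Amazon Nova Lite 1.0, however, falls short of the instruction understanding that the aerial-view group detection scenario requires.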
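To make the size of the price gap concrete, the short calculation below compares the two input prices quoted above. It is only arithmetic on those two figures; the function names are illustrative, and output-token prices are not considered.

```python
PRO_INPUT_PRICE = 0.0008    # USD, Amazon Nova Pro input price quoted in the text
LITE_INPUT_PRICE = 0.00006  # USD, Amazon Nova Lite input price quoted in the text


def price_ratio(pro: float, lite: float) -> float:
    """How many times more expensive Pro input is than Lite input."""
    return pro / lite


def savings_fraction(pro: float, lite: float) -> float:
    """Share of the input cost saved by using Lite instead of Pro."""
    return 1 - lite / pro


ratio = price_ratio(PRO_INPUT_PRICE, LITE_INPUT_PRICE)
saved = savings_fraction(PRO_INPUT_PRICE, LITE_INPUT_PRICE)
print(f"Pro input costs {ratio:.1f}x Lite; Lite saves {saved:.1%} on input tokens")
```

On input tokens alone, Pro is roughly 13 times the price of Lite, which is the gap that fine-tuning Lite aims to avoid paying.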

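Our hypothesis is that fine-tuning Amazon Nova Lite resolves this dilemma, delivering Pro-level instruction following for the specific use case while keeping Lite's cost advantage.

To test this approach, we ran comprehensive tests and validation of the fine-tuned model on this specialized detection task.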

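Fine-Tuning Nova Lite: Validation

Fine-Tuning Results at a Glance

Based on the performance gap identified in the baseline analysis, we built a custom fine-tuned Amazon Nova Lite model to strengthen instruction following for aerial-view group detection.

Validation showed significant improvements:

92% fewer detection boxes with the keyword prompt (average down from 91.47 to 7.04)
Stronger instruction understanding and more consistent output
A cost-effective solution that delivers Pro-level performance at Lite pricing

Note: For the detailed fine-tuning steps, see the later sections of this article.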

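Validation Design

To evaluate the fine-tuning effect thoroughly, we designed a multi-dimensional test and validation framework:

Scenario Coverage

Scenario 1: validation on training-set images, which measures how well the model fits data it has seen
Scenario 2: generalization test, which measures performance on unseen data

Prompt Strategy Comparison

Each test scenario uses two prompt strategies:

With keyword prompt: includes the group-based detection control instruction
Without keyword prompt: a standard object detection prompt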

Model invocation implementation reference:

https://aws.amazon.com/blogs/machine-learning/implementing-on-demand-deployment-with-customized-amazon-nova-models-on-amazon-bedrock
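As a rough illustration of the invocation referenced above, the sketch below assembles a Converse API request with one text block and one JPEG image and sends it through boto3. The function names, the inference settings, and passing the deployment ARN as `modelId` are assumptions for illustration; the linked post remains the authoritative reference.

```python
def build_converse_request(model_id: str, prompt: str, image_bytes: bytes) -> dict:
    """Assemble keyword arguments for the bedrock-runtime converse() call."""
    return {
        "modelId": model_id,  # assumption: base model ID or custom deployment ARN
        "system": [{"text": "You are a smart assistant that can detect objects in images."}],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"text": prompt},
                    {"image": {"format": "jpeg", "source": {"bytes": image_bytes}}},
                ],
            }
        ],
        # Illustrative settings, not values taken from the article.
        "inferenceConfig": {"temperature": 0.0, "maxTokens": 2048},
    }


def detect(model_id: str, prompt: str, image_path: str, region: str = "us-east-1") -> str:
    """Send one image to the model and return the raw text of the reply."""
    import boto3  # imported lazily so the request builder works without AWS access

    with open(image_path, "rb") as f:
        request = build_converse_request(model_id, prompt, f.read())
    client = boto3.client("bedrock-runtime", region_name=region)
    response = client.converse(**request)
    return response["output"]["message"]["content"][0]["text"]
```

Keeping the request builder separate from the network call makes it easy to run the same image through both the original and the fine-tuned model by changing only `model_id`.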

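Performance Evaluation

A preliminary comparison of model output before and after fine-tuning showed:

Before fine-tuning: the number of detected objects was generally high, with clear over-detection.
After fine-tuning: the number of detection boxes dropped significantly and was distributed relatively evenly.
No extreme multi-box outputs appeared anywhere in the tests, which indicates that the fine-tuning strategy effectively mitigated over-detection.

Fine-Tuning Comparison

In a visual comparison, the fine-tuned Custom-Model showed significant improvement both in controlling the number of detection boxes and in identifying target regions.

Typical examples: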

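Quantitative Evaluation

Test configuration:

Models compared: Custom-Model (fine-tuned) vs Amazon Nova-Lite (original)
Prompt strategies: with keyword prompt vs without keyword prompt
Test data: 20 aerial-view images
Repetitions: 5 rounds per combination
Total: 4 combinations × 5 rounds = 20 test runs, yielding 400 per-image results (20 runs × 20 images)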
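The test matrix above can be enumerated programmatically. The sketch below is only an illustration of how the 400 individual results arise; the image file names and dictionary keys are placeholders, not the authors' actual harness.

```python
from itertools import product

MODELS = ["Custom-Model", "Amazon Nova-Lite"]
PROMPTS = ["with_keyword", "without_keyword"]
ROUNDS = 5
IMAGES = [f"image_{i:02d}.jpg" for i in range(20)]  # placeholder file names


def build_test_plan() -> list:
    """Enumerate every (model, prompt, round, image) combination to run."""
    return [
        {"model": model, "prompt": prompt, "round": rnd, "image": image}
        for model, prompt in product(MODELS, PROMPTS)
        for rnd in range(1, ROUNDS + 1)
        for image in IMAGES
    ]


plan = build_test_plan()
print(len(plan))  # 2 models x 2 prompts x 5 rounds x 20 images = 400
```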

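Analysis of results:

1. Significance of the fine-tuning effect:

With keyword: detection boxes fell from 91.47 to 7.04, a 92% reduction
Without keyword: detection boxes fell from 94.17 to 47.68, a 49% reduction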
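The two percentages follow directly from the reported averages. A minimal check, with an illustrative function name:

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative drop from `before` to `after`, in percent."""
    return (before - after) / before * 100


# Average box counts reported above (original model -> fine-tuned model).
print(f"{percent_reduction(91.47, 7.04):.1f}")   # 92.3 (with keyword)
print(f"{percent_reduction(94.17, 47.68):.1f}")  # 49.4 (without keyword)
```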

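2. Difference in prompt sensitivity:

Custom-Model is highly sensitive to the prompt: it performs better with the keyword (7.04 vs 47.68)
Amazon Nova-Lite responds only weakly to the prompt: performance is similar under both prompt types (91.47 vs 94.17)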

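The figure below compares the distribution of detections across the four combinations:

1. Average number of detection boxes (top left)

Custom-Model performs well: far below Amazon Nova-Lite under both prompt types
Keyword prompt works best: Custom-Model with the keyword produces the fewest boxes (about 7)
Amazon Nova-Lite clearly over-detects: more than 90 boxes under both prompt types

2. Distribution of detection box counts (top right)

Custom-Model is the more stable model
With keyword: the distribution is concentrated in the low range with few outliers
Without keyword: the median is about 50, but variability is significant
Amazon Nova-Lite is consistent: similar distributions under both prompt types, concentrated in the high range

3. Detection boxes per image, sorted (bottom left)

Four clearly layered curves: they confirm the different effects of model and prompt
Custom-Model + keyword: the flattest curve, with fewer than 15 boxes for most images
Two overlapping Amazon Nova-Lite lines: the prompt has very little effect on the original model

4. Distribution of all measurements (bottom right)

Data density view: shows the distribution characteristics of each combination clearly
Custom-Model with keyword: data points are tightly concentrated in the low range
Amazon Nova-Lite: data points are spread widely across the high range, confirming the over-detection problem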

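Summary of Fine-Tuning Results

1. Stronger instruction understanding: in this scenario, Custom-Model is significantly better at understanding and executing the instructions in the prompt

2. Improved stability: Custom-Model not only reduces the number of detection boxes but also makes results more consistent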

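Case Study 2: Optimizing Low-Confidence Detection at Night

To further validate Amazon Nova Lite fine-tuning in a real business setting, we ran an experiment to lower the false positive rate and raise detection accuracy in an entity detection and alerting scenario with very high accuracy requirements.

The scenario focuses on the model's decision-making and reliability in complex environments, especially nighttime low-light conditions with a filter applied.

Problem Definition

In complex surveillance environments, lighting changes, shadows, reflections, and moving vegetation are often misidentified as suspicious targets. Frequent false positives increase the manual verification workload and, more seriously, erode operators' trust in the system, which can lead to real security threats being overlooked.

When a scene is complex and contains no obvious target entity to annotate, Amazon Nova Lite may annotate a merely suspicious-looking object with low accuracy, producing a false positive. Instructing Amazon Nova Lite in the prompt that detected objects must have high confidence, and that it should not annotate when confidence is insufficient, had only limited effect. We therefore turned to fine-tuning Amazon Nova Lite to solve the problem.

Fine-Tuning Design

The Amazon Nova Lite fine-tuning workflow is: data annotation → create training job → model deployment → validation

The key prompt fragment is shown below.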

HIGH CONFIDENCE REQUIREMENT: Only detect objects with 95%+ certainty. If confidence is insufficient for any detected object, set: "object_count": 0, "objects": []
##Output JSON Example:
{{"IdentifiedImage":"top","summary":"Nothing special","tag":"[]", "object_count": 0, "objects": []}}
Reason: When no objects from the specified categories (Humans, Vehicles, Pets, Deliveries, License plates, Smoke, Fire) are detected in any of the three camera views


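Minimal Training Dataset and Comparison of Results

Note: With manual calibration, hand-annotating 8-10 images for each low-confidence scenario is sufficient.

Recognition result from standard Amazon Nova Lite 1.0: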

{'summary': 'A person walks near a building', 'tag': '[Humans]', 'object_count': 1, 'objects': [{'Humans': [170, 610, 199, 700]}]}


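(Image from the Internet)

The fine-tuned model demonstrates instruction following: it correctly recognizes the low-confidence scene and honors the high-confidence requirement specified in the prompt: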

{ 'summary': 'Nothing special', 'tag': '[]', 'object_count': 0, 'objects': []}
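Downstream alerting logic can key off these two response shapes. The sketch below is a hypothetical consumer, not part of the original project: it assumes responses arrive as single-quoted, Python-literal style text like the two outputs shown here, and the function names are illustrative.

```python
import ast


def parse_detection(raw: str) -> dict:
    """Parse a single-quoted, dict-style response such as the outputs shown above."""
    return ast.literal_eval(raw.strip())


def should_alert(result: dict) -> bool:
    """Alert only when the model reports objects and the box list matches the count."""
    count = result.get("object_count", 0)
    objects = result.get("objects", [])
    return count > 0 and len(objects) == count


baseline = ("{'summary': 'A person walks near a building', 'tag': '[Humans]', "
            "'object_count': 1, 'objects': [{'Humans': [170, 610, 199, 700]}]}")
tuned = "{ 'summary': 'Nothing special', 'tag': '[]', 'object_count': 0, 'objects': []}"

print(should_alert(parse_detection(baseline)))  # True: the base model raises an alert
print(should_alert(parse_detection(tuned)))     # False: the fine-tuned model stays silent
```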


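Key Results

Targeted fine-tuning of Amazon Nova Lite produced significant improvement in reducing low-confidence entity detections:

False Positive Reduction: the fine-tuned model accurately distinguishes high-confidence from low-confidence scenes, effectively eliminating the uncertain detections that previously caused false positives.
Enhanced Decision Intelligence: the model learned to return a "no detection" response when the confidence threshold is not met, which significantly improves overall detection reliability.
Operational Impact: with a much lower false positive rate, the system is more practical to use and restores operators' confidence in automated monitoring.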

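Fine-Tune Job Setup and Deployment

The quality of the fine-tuning dataset directly determines how much the model improves. See the Amazon Bedrock fine-tuning documentation for the requirements.

Amazon Bedrock fine-tuning documentation requirements:

https://docs.aws.amazon.com/nova/latest/userguide/fine-tune-prepare-data-understanding.html

Given the data volume limits specified in the documentation, we adopted a small-sample fine-tuning strategy of 10-50 images and relied on a carefully designed annotation process to ensure high-quality training data.

Data Preparation and Annotation Strategy

Amazon Nova Pro serves as the teacher model that generates high-quality labels for the training data.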

## Use Pro for data labeling and upload images to S3 for subsequent training data preparation
uv run generate_labels_by_llm.py \
  --num_threads=1 \
  --upload_to_s3 \
  --model_id=us.amazon.nova-pro-v1:0


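The script uses the same prompt as the test scenario, which keeps the annotated data consistent with the real application.

This process produced annotation data in JSONL format that meets the Amazon Bedrock fine-tuning requirements. Each sample follows the official specification: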

{
  "schemaVersion": "bedrock-conversation-2024",
  "system": [
    { "text": "You are a smart assistant that can detect objects in images." }
  ],
  "messages": [
    {
      "role": "user",
      "content": [
        { "text": "Your task is to perform object detection on input images, identifying all relevant elements based on the given descriptions of target objects, and output the results in the required JSON format. \nFirst, carefully read the following target descriptions (formatted as \"TargetCategory: [Description-1,...,Description-n]\"):\n<target_description>\n{'vehicles': ['viewed from above', 'various shapes and sizes', 'visible on roads or lots', 'includes cars, trucks, buses, motorcycles']}\n</target_description>\n\nif there are many(more than 10) target objects in the image, ONLY OUTPUT FEW BIG bounding boxes to show the areas or group of target objects\n\nOutput in the required JSON format, example:\n[\n {{\"label-1\": [x_min-1, y_min-1, x_max-1, y_max-1]}},\n {{\"label-2\": [x_min-2, y_min-2, x_max-2, y_max-2]}},\n ...\n]\nStrictly note:\n0. In the target description, the TargetCategory typically specifies the object to be detected directly, while the descriptive terms are usually used to expand multi-dimensional features. For determining the detection of a TargetCategory, it is necessary to filter based on the lateral descriptions provided by the descriptive terms.\n1. The output must be in standard legal JSON format without any explanatory text.\n2. Please ensure that the output target result is strongly related to the target description of the target to be detected.\n3. The label must be the TargetCategory in the target description format. This format is mandatory.\n4. Please ensure that there are no extra commas in JSON" },
        { "image": { "format": "jpeg", "source": { "s3Location": { "uri": "s3://aigcdemo.plaza.red/us_amazon_nova-pro-v1_0/20250821_134228/car/0000150_01030_d_0000072.jpeg", "bucketOwner": "390468416359" } } } }
      ]
    },
    {
      "role": "assistant",
      "content": [
        { "text": "{\"objects\": [{\"vehicles\": [0, 20, 303, 787]}, {\"vehicles\": [371, 201, 608, 854]}, {\"vehicles\": [536, 756, 631, 853]}, {\"vehicles\": [707, 706, 946, 999]}]}" }
      ]
    }
  ]
}
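For readers assembling such records themselves, the sketch below builds one training sample with the same fields as the example above and serializes it as a single JSONL line. The helper names, the example bucket, and the account ID in the usage lines are placeholders; the authors' own labeling script is not shown in the article.

```python
import json

SYSTEM_PROMPT = "You are a smart assistant that can detect objects in images."


def build_training_record(prompt: str, image_s3_uri: str,
                          bucket_owner: str, label_json: str) -> dict:
    """Build one sample in the bedrock-conversation-2024 layout shown above."""
    return {
        "schemaVersion": "bedrock-conversation-2024",
        "system": [{"text": SYSTEM_PROMPT}],
        "messages": [
            {
                "role": "user",
                "content": [
                    {"text": prompt},
                    {"image": {"format": "jpeg", "source": {"s3Location": {
                        "uri": image_s3_uri, "bucketOwner": bucket_owner}}}},
                ],
            },
            # The assistant turn carries the teacher model's label as a JSON string.
            {"role": "assistant", "content": [{"text": label_json}]},
        ],
    }


def to_jsonl_line(record: dict) -> str:
    """Serialize one record onto a single line, as JSONL requires."""
    return json.dumps(record, ensure_ascii=False)


label = json.dumps({"objects": [{"vehicles": [0, 20, 303, 787]}]})
record = build_training_record("detect vehicles", "s3://example-bucket/img.jpeg",
                               "111122223333", label)  # placeholder bucket and account
print(to_jsonl_line(record))
```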


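This format keeps every training sample structurally complete and its data quality traceable.

The per-image JSONL files are then merged into the single dataset that the training job requires: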

## Merge qualified labeled data into a single JSONL file required for subsequent training and verify the existence of corresponding S3 image files
uv run merge_jsonl_files.py --input_dir \
  ../data/processed/us.amazon.nova-pro-v1:0/20250821_134228_car_labeld/original \
  --check_s3
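The source of `merge_jsonl_files.py` is not included in the article. As a rough idea of what the merge step involves, the sketch below concatenates per-image JSONL files and validates each line as JSON along the way. It is an assumption about the general approach, and it leaves out the S3 existence check that `--check_s3` performs.

```python
import json
from pathlib import Path


def merge_jsonl_files(input_dir: str, output_path: str) -> int:
    """Merge every *.jsonl file under input_dir into one file; return the record count."""
    target = Path(output_path).resolve()
    count = 0
    with open(output_path, "w", encoding="utf-8") as out:
        for path in sorted(Path(input_dir).glob("*.jsonl")):
            if path.resolve() == target:
                continue  # never read the file that is being written
            for line in path.read_text(encoding="utf-8").splitlines():
                if not line.strip():
                    continue  # skip blank lines
                record = json.loads(line)  # raises on malformed JSON
                out.write(json.dumps(record, ensure_ascii=False) + "\n")
                count += 1
    return count
```

Re-serializing each record, rather than copying raw lines, guarantees that every sample ends up on exactly one line of the merged file.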


Creating and Configuring the Training Job

Using the prepared JSONL dataset, create an Amazon Nova Lite fine-tuning job.

## Create training job using JSONL, defaults to fine-tuning based on Nova Lite foundation model
## default model_id: arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0:300k
uv run create_nova_ft_job.py \
  --jsonl ../data/train/train_data_algae_and_cars_20250821.jsonl \
  --job-name drones-nova-lite-with-all-job \
  --custom-model-name drones-nova-lite-with-all
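The contents of `create_nova_ft_job.py` are not shown. A script like it would most likely wrap the Bedrock `create_model_customization_job` call in boto3; the sketch below shows one plausible shape under that assumption. The role ARN and S3 URIs are caller-supplied placeholders, and hyperparameters are deliberately left out here, so consult the Bedrock documentation for the values a Nova job needs.

```python
DEFAULT_BASE_MODEL = "arn:aws:bedrock:us-east-1::foundation-model/amazon.nova-lite-v1:0:300k"


def build_customization_job_request(job_name: str, custom_model_name: str, role_arn: str,
                                    training_s3_uri: str, output_s3_uri: str,
                                    base_model: str = DEFAULT_BASE_MODEL) -> dict:
    """Assemble keyword arguments for bedrock.create_model_customization_job()."""
    return {
        "jobName": job_name,
        "customModelName": custom_model_name,
        "roleArn": role_arn,                      # IAM role Bedrock assumes to read/write S3
        "baseModelIdentifier": base_model,
        "customizationType": "FINE_TUNING",
        "trainingDataConfig": {"s3Uri": training_s3_uri},
        "outputDataConfig": {"s3Uri": output_s3_uri},
    }


def create_job(request: dict, region: str = "us-east-1") -> str:
    """Submit the job and return its ARN (requires AWS credentials)."""
    import boto3  # imported lazily so the builder can be used offline

    client = boto3.client("bedrock", region_name=region)
    return client.create_model_customization_job(**request)["jobArn"]
```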


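After the training job is created, the system displays the detailed job configuration, including input/output paths, hyperparameter settings, and other key information.

With a training dataset of 50 images, expect training to take 90-300 minutes.

Notes:

Fine-tuning jobs can be created from the Amazon Nova Lite base model
A second round of fine-tuning on an already fine-tuned Amazon Nova Lite model is not supported

The training parameters are described below:

After training completes, download the training loss data from the output directory specified for the job for analysis and review.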

Technical Issue Troubleshooting:

Image Format Compatibility

During implementation, an image format compatibility problem appeared:

Invalid input error: train data problematic samples: [8, 9, ...37].
Sample 8 - ('messages', 0, 'content', 1, 'image'): Image is not of type JPEG


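Analysis showed that some images labeled as JPEG were actually in MPO format, which caused training to fail.

MPO (Multi Picture Format):

An extension of JPEG for storing multiple related images
Commonly used for 3D photos that combine left-eye and right-eye images
Requires dedicated software or devices to display correctly

JPEG (Joint Photographic Experts Group):

A widely used single-image compression format
Lossy compression with excellent cross-system compatibility
Standard file extensions: .jpg or .jpeg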
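Because an MPO file begins like an ordinary JPEG, the file extension alone cannot reveal the problem. The sketch below is one possible pre-upload check, not the authors' fix: it walks the JPEG header segments and looks for the APP2 segment whose payload starts with the `MPF` identifier used by the Multi Picture Format. It assumes a well-formed header and uses only the standard library.

```python
import struct


def is_mpo(data: bytes) -> bool:
    """Return True if a JPEG byte stream carries an APP2 segment tagged with the MPF identifier."""
    if not data.startswith(b"\xff\xd8"):       # every JPEG/MPO file starts with SOI
        return False
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return False                       # not positioned on a marker: give up
        marker = data[pos + 1]
        if marker in (0xDA, 0xD9):             # SOS or EOI: header segments are over
            return False
        (length,) = struct.unpack(">H", data[pos + 2:pos + 4])
        payload = data[pos + 4:pos + 2 + length]
        if marker == 0xE2 and payload.startswith(b"MPF\x00"):
            return True                        # APP2 segment with the MPF identifier
        pos += 2 + length                      # jump to the next segment
    return False
```

Files flagged by a check like this can then be re-encoded as plain single-image JPEG with an image library before they are uploaded to S3.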

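Deploying the Fine-Tuned Model

Deploy using the on-demand option (Deploy for On-Demand):

Select the fine-tuned model and deploy it:

Note: Wait for the deployment to finish. Once it completes, you can run a quick check in the Playground or call the model through the Converse API. Expect to wait about 10 minutes, though it can take longer.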

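Cost Analysis and Economics

Training cost:

Amazon Nova Lite fine-tuning: $0.002 per 1,000 tokens
Small dataset (10K tokens): total training cost of about $0.02

Storage and inference:

Fixed storage fee: $1.95 per month (applies to all fine-tuned models)
Inference pricing: same as the base Amazon Nova Lite model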
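The figures above can be combined into a quick estimate. The sketch below is plain arithmetic on the quoted prices, with illustrative function names; it does not account for inference usage or for any pricing details beyond those listed here.

```python
TRAINING_PRICE_PER_1K_TOKENS = 0.002  # USD, fine-tuning price quoted above
MONTHLY_STORAGE_FEE = 1.95            # USD per month, storage fee quoted above


def training_cost(tokens: int) -> float:
    """One-time training cost for a dataset of the given token count."""
    return tokens / 1000 * TRAINING_PRICE_PER_1K_TOKENS


def first_month_cost(training_tokens: int) -> float:
    """Training cost plus one month of model storage, excluding inference."""
    return training_cost(training_tokens) + MONTHLY_STORAGE_FEE


print(f"${training_cost(10_000):.2f}")     # $0.02
print(f"${first_month_cost(10_000):.2f}")  # $1.97
```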

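Fine-tuning Amazon Nova on Amazon Bedrock is highly cost-effective: it delivers significant performance gains for business operations at minimal expense.

Pricing details: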

https://aws.amazon.com/bedrock/pricing

References

Implementing on-demand deployment with customized Amazon Nova models on Amazon Bedrock:

https://aws.amazon.com/blogs/machine-learning/implementing-on-demand-deployment-with-customized-amazon-nova-models-on-amazon-bedrock/

Customize models in Amazon Bedrock with your own data using fine-tuning and continued pre-training:

https://aws.amazon.com/cn/blogs/aws/customize-models-in-amazon-bedrock-with-your-own-data-using-fine-tuning-and-continued-pre-training/

Using Amazon Nova Lite to implement efficient and cost-effective video moderation:

https://aws.amazon.com/cn/blogs/china/using-amazon-nova-lite-to-implement-efficient-and-cost-effective-video-moderation/

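About the Authors

Ye Xiaowei (叶小微)

Solutions Architect at Amazon Web Services, currently working on e-commerce and enterprise digital transformation, with many years of experience in architecture design, development, and project management, and extensive hands-on experience solving practical problems in workflow, microservices, and system integration.

Wang Yue (王跃)

Solutions Architect at Amazon Web Services, responsible for consulting on and designing solutions built on Amazon Web Services cloud services, with extensive development and hands-on experience in system architecture, big data, networking, and application development.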
