Tail-Safety Constraints

cannbot-skills: CANNBot is a family of agents for improving CANN development efficiency; this repository provides its reusable Skills modules. Project address: https://gitcode.com/cann/cannbot-skills

Read this file when a kernel has tile tails, odd row splits, or partial GM boundaries.

Goal

Keep tail handling correct without corrupting the stable local-tensor shape assumptions used by lowering and simulation.

1. Core rule

Apply valid_m, valid_n, and valid_k at GM read and write boundaries. Do not shrink local tensor shapes for every tail tile.

Repository expectation:

local buffers remain full tile-sized
only the GM boundary slices use the tail sizes
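
The rule above can be sketched in plain Python. This is only an illustration under stated assumptions, not the repository's kernel API: TILE_M, TILE_N, process_tile, and run are hypothetical names, nested lists stand in for GM and local tensors, and the doubling step is a placeholder for real compute.

```python
TILE_M, TILE_N = 4, 8  # hypothetical full tile shape


def process_tile(gm_in, gm_out, m0, n0):
    """Stage one tile through a full-size local buffer; clamp only at GM boundaries."""
    total_m, total_n = len(gm_in), len(gm_in[0])
    valid_m = min(TILE_M, total_m - m0)  # rows of this tile that exist in GM
    valid_n = min(TILE_N, total_n - n0)  # columns of this tile that exist in GM

    # The local buffer keeps the full tile shape, even for a tail tile.
    local = [[0.0] * TILE_N for _ in range(TILE_M)]

    # GM read boundary: only the valid region is loaded.
    for r in range(valid_m):
        for c in range(valid_n):
            local[r][c] = gm_in[m0 + r][n0 + c]

    # Placeholder compute over the full tile shape.
    local = [[2.0 * x for x in row] for row in local]

    # GM write boundary: only the valid region is stored.
    for r in range(valid_m):
        for c in range(valid_n):
            gm_out[m0 + r][n0 + c] = local[r][c]

    return len(local), len(local[0])


def run(total_m, total_n):
    """Process a total_m x total_n matrix tile by tile and record local shapes."""
    gm_in = [[float(r * total_n + c) for c in range(total_n)] for r in range(total_m)]
    gm_out = [[0.0] * total_n for _ in range(total_m)]
    shapes = set()
    for m0 in range(0, total_m, TILE_M):
        for n0 in range(0, total_n, TILE_N):
            shapes.add(process_tile(gm_in, gm_out, m0, n0))
    return gm_in, gm_out, shapes
```

Running run(7, 13) exercises tails in both dimensions. The local shape stays (TILE_M, TILE_N) for every tile, and only the two boundary loops see valid_m and valid_n.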

2. Why this rule exists

Stable local tensor shapes make lowering and simulator behavior predictable. If you start shrinking local buffers for tails, it becomes much easier to create shape drift between the intended logical tile and the actual staged buffer.

3. Standard cube -> vec half-row writeback pattern

For many cube -> vec kernels, use the standard half-row split:

half_rows = CeilDiv(valid_m, 2)
row_begin = GetSubBlockIdx() * half_rows
row_end = Min(row_begin + half_rows, valid_m)
row_count = row_end - row_begin

This keeps odd-row tails stable across the two vec subblocks.
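
To see how the split behaves on odd row counts, here is a small host-side Python model of the same arithmetic. The names ceil_div and half_row_range are hypothetical, and sub_block_idx stands in for the value that GetSubBlockIdx() would return (assumed here to be 0 or 1).

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def half_row_range(valid_m, sub_block_idx):
    """Return (row_begin, row_end, row_count) for one vec subblock."""
    half_rows = ceil_div(valid_m, 2)
    row_begin = sub_block_idx * half_rows
    row_end = min(row_begin + half_rows, valid_m)  # clamp to the valid rows
    return row_begin, row_end, row_end - row_begin


if __name__ == "__main__":
    for valid_m in (1, 5, 8):
        print(valid_m, [half_row_range(valid_m, idx) for idx in (0, 1)])
```

With valid_m = 5, subblock 0 covers rows 0 to 3 (exclusive) and subblock 1 covers rows 3 to 5, so the two ranges neither overlap nor leave a gap. With valid_m = 1, the second subblock gets row_count = 0, so the writeback path has to tolerate an empty range.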

4. Common symptoms of tail bugs

Tail issues often look like this:

aligned shapes pass but odd shapes fail
only the last tile is wrong
one vec subblock is correct and the other is garbage
output shape looks right but the boundary rows or columns are corrupted

When this happens, inspect the GM boundary slices first. Do not start by changing the local buffer shape.

Special case:

if the kernel is a normalized online softmax with running row_max/row_sum, GM-boundary slicing alone is not enough; invalid score columns must behave like -inf before rowmax
read agent/references/constraints/online-softmax-tail.md
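
The special case can be illustrated with a plain-Python model of a running row max and row sum. This is a sketch under assumptions and does not reproduce the referenced file: TILE_N and online_softmax_row are hypothetical names, and the 999.0 filler stands in for whatever stale data a full-size local tile might hold in its invalid columns.

```python
import math

TILE_N = 4  # hypothetical full tile width


def online_softmax_row(scores, tile_n=TILE_N):
    """Running (row_max, row_sum) over one score row, in full-size column tiles."""
    row_max, row_sum = -math.inf, 0.0
    total_n = len(scores)
    for n0 in range(0, total_n, tile_n):
        valid_n = min(tile_n, total_n - n0)

        # Full-size local tile; the filler models stale data in the tail columns.
        tile = [999.0] * tile_n
        tile[:valid_n] = scores[n0:n0 + valid_n]

        # Mask invalid score columns to -inf BEFORE the row max, so they
        # can neither win the max nor contribute to the exp sum.
        for c in range(valid_n, tile_n):
            tile[c] = -math.inf

        new_max = max(row_max, max(tile))
        # Rescale the previous sum to the new max, then add this tile.
        row_sum = row_sum * math.exp(row_max - new_max)
        row_sum += sum(math.exp(x - new_max) for x in tile)
        row_max = new_max
    return row_max, row_sum
```

Without the masking loop, the filler value would become the row max on the tail tile and corrupt both running statistics, even though every GM load and store used the correct valid_n.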

5. Quick checklist

Before accepting tail logic, verify:

valid_m, valid_n, valid_k come from the current tile boundary
local buffers still use full TILE_* shapes
GM load boundaries use the valid sizes
GM store boundaries use the valid sizes
vec half-row split uses CeilDiv and clamps with Min
at least one odd-size case has been tested
for normalized online softmax, verify score-domain invalid columns are masked before cmax

Files to study

agent/example/kernels/a5/basic_cube_vec_mix.py
agent/example/kernels/a5/matmul_half_splitn_bias10p2_vf.py
agent/example/kernels/a5/matmul_rowwise_norm.py
agent/example/kernels/a5/vec_cube_abs_sqrt_matmul.py
agent/example/kernels/a5/vec_unaligned_gm_to_ub_pad.py
