news 2026/5/9 12:14:33

cann/cannbot-skills: Tail-Safety Constraints

张小明

Front-end Development Engineer

Tail-Safety Constraints

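[Free download link] cannbot-skills: CANNBot is a series of agents for CANN development, intended to improve development efficiency. This repository provides reusable Skills modules for it. Project address: https://gitcode.com/cann/cannbot-skills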

Read this file when a kernel has tile tails, odd row splits, or partial GM boundaries.

Goal

Keep tail handling correct without corrupting the stable local-tensor shape assumptions used by lowering and simulation.

1. Core rule

Apply valid_m, valid_n, and valid_k at GM read and write boundaries. Do not shrink local tensor shapes for every tail tile.

Repository expectation:

  • local buffers remain full tile-sized
  • only the GM boundary slices use the tail sizes
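The rule above can be sketched in plain Python. This is a host-side illustration only, not the repository's kernel API: `TILE_M`, `TILE_N`, `valid_extent`, `load_tile`, and `store_tile` are hypothetical names, and nested lists stand in for GM and local buffers. The point it shows is that the local buffer is always allocated at the full tile size, while the valid sizes appear only in the GM slices.

```python
TILE_M, TILE_N = 4, 4  # hypothetical tile sizes, for illustration only


def valid_extent(total, tile_idx, tile):
    """In-bounds element count covered by tile `tile_idx` along one axis."""
    return max(0, min(tile, total - tile_idx * tile))


def load_tile(gm, tile_row, tile_col, pad=0.0):
    """Copy one tile from GM into a full-size local buffer."""
    valid_m = valid_extent(len(gm), tile_row, TILE_M)
    valid_n = valid_extent(len(gm[0]), tile_col, TILE_N)
    # The local buffer keeps the full TILE_M x TILE_N shape, even for tails.
    local = [[pad] * TILE_N for _ in range(TILE_M)]
    r0, c0 = tile_row * TILE_M, tile_col * TILE_N
    for i in range(valid_m):
        # Only the GM read slice uses the valid sizes.
        local[i][:valid_n] = gm[r0 + i][c0:c0 + valid_n]
    return local, valid_m, valid_n


def store_tile(gm, local, tile_row, tile_col, valid_m, valid_n):
    """Write back only the valid region of a full-size local buffer."""
    r0, c0 = tile_row * TILE_M, tile_col * TILE_N
    for i in range(valid_m):
        # Only the GM write slice uses the valid sizes.
        gm[r0 + i][c0:c0 + valid_n] = local[i][:valid_n]
```

For a 5 x 6 GM matrix, the bottom-right tile has valid_m = 1 and valid_n = 2, yet `local` is still 4 x 4; the padded region never reaches GM because the store slice is clamped the same way as the load slice.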

2. Why this rule exists

Stable local tensor shapes make lowering and simulator behavior predictable. If you start shrinking local buffers for tails, it becomes much easier to create shape drift between the intended logical tile and the actual staged buffer.

3. Standard cube -> vec half-row writeback pattern

For many cube -> vec kernels, use the standard half-row split:

  • half_rows = CeilDiv(valid_m, 2)
  • row_begin = GetSubBlockIdx() * half_rows
  • row_end = Min(row_begin + half_rows, valid_m)
  • row_count = row_end - row_begin

This keeps odd-row tails stable across the two vec subblocks.
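As a quick check of the arithmetic, the split can be written out in plain Python. This is a minimal sketch: `ceil_div` and `half_row_range` are hypothetical helper names, and the `sub_block_idx` argument stands in for the value that `GetSubBlockIdx()` would return on each of the two vec subblocks.

```python
def ceil_div(a, b):
    """Integer ceiling division for non-negative a and positive b."""
    return (a + b - 1) // b


def half_row_range(valid_m, sub_block_idx):
    """Return (row_begin, row_end, row_count) for one vec subblock."""
    half_rows = ceil_div(valid_m, 2)
    row_begin = sub_block_idx * half_rows
    # Clamp so the second subblock never runs past the valid rows.
    row_end = min(row_begin + half_rows, valid_m)
    row_count = row_end - row_begin
    return row_begin, row_end, row_count


for valid_m in (5, 4, 1):
    print(valid_m, half_row_range(valid_m, 0), half_row_range(valid_m, 1))
```

With valid_m = 5, subblock 0 takes rows [0, 3) and subblock 1 takes rows [3, 5). With valid_m = 1, subblock 1 gets an empty range (row_count = 0), so its writeback has to tolerate a zero-row slice.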

4. Common symptoms of tail bugs

Tail issues often look like this:

  • aligned shapes pass but odd shapes fail
  • only the last tile is wrong
  • one vec subblock is correct and the other is garbage
  • output shape looks right but the boundary rows or columns are corrupted

When this happens, inspect the GM boundary slices first. Do not start by changing the local buffer shape.

Special case:

  • if the kernel is a normalized online softmax with running row_max/row_sum, GM-boundary slicing alone is not enough; invalid score columns must behave like -inf before rowmax
  • read agent/references/constraints/online-softmax-tail.md
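To see why boundary slicing alone is insufficient here, consider one row of a full-width score tile whose tail columns hold stale or padded values. The sketch below is a scalar Python model, not the kernel code or the content of the referenced constraint file; `online_softmax_update` is a hypothetical name. It masks invalid columns to -inf before the row maximum is taken, so they contribute neither to the running max nor to the running sum.

```python
import math


def online_softmax_update(row_max, row_sum, scores, valid_n):
    """One online-softmax step for a single row of a full-width score tile."""
    # Invalid columns must behave like -inf before the row max is computed.
    masked = [s if j < valid_n else -math.inf for j, s in enumerate(scores)]
    new_max = max(row_max, max(masked))
    if new_max == -math.inf:
        # No valid score seen so far; leave the running state unchanged.
        return row_max, row_sum
    # Rescale the previous sum to the new max (nothing to rescale on first use).
    scale = math.exp(row_max - new_max) if row_max != -math.inf else 0.0
    # exp(-inf) is 0.0, so masked columns add nothing to the sum.
    new_sum = row_sum * scale + sum(math.exp(s - new_max) for s in masked)
    return new_max, new_sum


m, s = -math.inf, 0.0
m, s = online_softmax_update(m, s, [1.0, 2.0, 3.0, 4.0], valid_n=4)
# Tail tile: only the first column is valid; the rest is stale buffer content.
m, s = online_softmax_update(m, s, [0.5, 100.0, 100.0, 100.0], valid_n=1)
```

Without the mask, the stale value 100.0 would become the running maximum and the normalization for every valid column in that row would be wrong, even though the GM store slice itself is correct.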

5. Quick checklist

Before accepting tail logic, verify:

  • valid_m, valid_n, valid_k come from the current tile boundary
  • local buffers still use full TILE_* shapes
  • GM load boundaries use the valid sizes
  • GM store boundaries use the valid sizes
  • vec half-row split uses CeilDiv and clamps with Min
  • at least one odd-size case has been tested
  • for normalized online softmax, verify score-domain invalid columns are masked before cmax

Files to study

  • agent/example/kernels/a5/basic_cube_vec_mix.py
  • agent/example/kernels/a5/matmul_half_splitn_bias10p2_vf.py
  • agent/example/kernels/a5/matmul_rowwise_norm.py
  • agent/example/kernels/a5/vec_cube_abs_sqrt_matmul.py
  • agent/example/kernels/a5/vec_unaligned_gm_to_ub_pad.py

Authorship statement: parts of this article were generated with AI assistance (AIGC) and are for reference only.
