CANN/pypto Distributed Shared Memory Signal API

pypto.distributed.shmem_signal

[Free download link] pypto: PyPTO (pronounced "pai p-t-o") is the Parallel Tensor/Tile Operation programming paradigm. Project address: https://gitcode.com/cann/pypto

Product support

Product	Supported
Atlas A3 inference series products	√
Atlas A2 inference series products	√

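Function description

Writes the signal value signal into a partial view of the shared memory tensor associated with target_pe, at the index position specified by offsets, thereby notifying target_pe.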
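
To make the write semantics concrete, here is a minimal pure-Python model. It is not PyPTO code: it treats one pe's shared memory tensor as a nested list and applies a signal to a rectangular view. The function name `apply_signal` and the string values `"SET"` and `"ADD"` are illustrative stand-ins for the behavior described above and for the sig_op parameter; they are assumptions of this sketch, not library identifiers.

```python
def apply_signal(buffer, signal, shape, offsets, sig_op="SET"):
    """Write `signal` into a rectangular view of a 2-D list, in place.

    Illustrative model only: `buffer` stands in for a shared memory
    tensor. "SET" overwrites the view, "ADD" accumulates onto it.
    """
    rows, cols = shape
    r0, c0 = offsets
    for r in range(r0, r0 + rows):
        for c in range(c0, c0 + cols):
            if sig_op == "SET":
                buffer[r][c] = signal
            elif sig_op == "ADD":
                buffer[r][c] += signal
            else:
                raise ValueError(f"unsupported sig_op: {sig_op}")
    return buffer


# A 2 x 4 tensor initialised to 1; signal the 2 x 2 view at offsets [0, 1].
buf = [[1] * 4 for _ in range(2)]
apply_signal(buf, 4, shape=[2, 2], offsets=[0, 1], sig_op="SET")  # view becomes 4
apply_signal(buf, 2, shape=[2, 2], offsets=[0, 1], sig_op="ADD")  # view becomes 6
print(buf)  # [[1, 6, 6, 1], [1, 6, 6, 1]]
```

Elements outside the view keep their original value, which is why a waiter can watch one region of the tensor while other regions carry unrelated signals.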

Function prototype

    shmem_signal(
        src: ShmemTensor,
        src_pe: Union[int, SymbolicScalar],
        signal: int,
        shape: list[int] = None,
        offsets: list[Union[int, SymbolicScalar]] = None,
        *,
        target_pe: Union[int, SymbolicScalar],
        sig_op: AtomicType = AtomicType.SET,
        pred: list[Tensor] = None,
    ) -> Tensor

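Parameter description

Parameter	Input/Output	Description
src	Input	The shared memory tensor that triggers the signal.
src_pe	Input	The pe that owns the shared memory tensor, 0 <= src_pe < n_pes. Supported types: int or SymbolicScalar.
signal	Input	The signal value sent to src. Supported type: int.
shape	Input	The size of the shared memory tensor view into which the signal is written. Type: list[int].
offsets	Input	The offsets of the shared memory tensor view into which the signal is written. A list of int or SymbolicScalar values. The number of offsets must equal the number of dimensions of src, and the offset in each dimension must be smaller than the size of the corresponding dimension of src.
target_pe	Input	The pe that receives the signal. If target_pe = -1, the signal is broadcast to all pes. Supported types: int or SymbolicScalar.
sig_op	Input	The type of atomic operation applied during data transfer. Supported values: AtomicType.SET and AtomicType.ADD. Defaults to AtomicType.SET.
pred	Input	A list of tensors expressing the dependencies that control when the operation executes. There is no restriction on their data type. Empty Tensors are not supported.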
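
The constraints on shape and offsets can be checked before the call. The sketch below is a hypothetical helper, `check_view`, which is not part of PyPTO. It encodes the two rules stated in the table (the number of offsets matches the dimensions of src, and each offset is below the size of its dimension) and adds two assumptions of its own: that omitting shape and offsets selects the whole tensor, and that the view must not extend past the end of src. It handles plain int values only, not SymbolicScalar.

```python
def check_view(src_shape, shape=None, offsets=None):
    """Validate a view against src_shape and return (shape, offsets).

    Hypothetical helper. Defaults (full tensor, zero offsets) and the
    bounds check on offset + extent are assumptions of this sketch.
    """
    ndim = len(src_shape)
    offsets = [0] * ndim if offsets is None else list(offsets)
    shape = list(src_shape) if shape is None else list(shape)
    if len(offsets) != ndim:
        raise ValueError("offsets must have one entry per dimension of src")
    if len(shape) != ndim:
        raise ValueError("shape must have one entry per dimension of src")
    for dim, (size, off, extent) in enumerate(zip(src_shape, offsets, shape)):
        if not 0 <= off < size:
            raise ValueError(f"offset {off} is out of range in dimension {dim}")
        if off + extent > size:
            raise ValueError(f"view exceeds src in dimension {dim}")
    return shape, offsets


full_view = check_view([64, 128])
partial_view = check_view([64, 128], shape=[64, 64], offsets=[0, 1])
```

Running a check like this on the host side turns an out-of-range offset into an immediate, readable error instead of a failure later in the pipeline.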

Return value

Returns an output Tensor that represents the dependency on the completion of this operation.

Constraints

shmem_signal and shmem_wait_until must be used as a pair, and when TileShape is set, the tile sizes must be the same for both.
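
The pairing of a signal with a matching wait can be illustrated with ordinary Python threads. This is only a conceptual model built on `threading.Condition`: the class `SignalSlot` and its methods are invented for illustration, the equality comparison in `wait_until` is an assumption, and nothing here reflects the actual signature or behavior of shmem_wait_until.

```python
import threading


class SignalSlot:
    """A single integer signal guarded by a condition variable (model only)."""

    def __init__(self):
        self._value = 0
        self._cond = threading.Condition()

    def signal(self, value, op="SET"):
        """Overwrite ("SET") or accumulate ("ADD") the slot, then wake waiters."""
        if op not in ("SET", "ADD"):
            raise ValueError(f"unsupported op: {op}")
        with self._cond:
            self._value = value if op == "SET" else self._value + value
            self._cond.notify_all()

    def wait_until(self, expected, timeout=None):
        """Block until the slot equals `expected`; return False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._value == expected, timeout)


slot = SignalSlot()
received = []


def consumer():
    # Proceeds only after two producers have each added 1 to the slot.
    if slot.wait_until(2, timeout=5.0):
        received.append("ready")


worker = threading.Thread(target=consumer)
worker.start()
slot.signal(1, op="ADD")  # first producer arrives
slot.signal(1, op="ADD")  # second producer arrives; the slot now holds 2
worker.join()
```

A signal without a matching wait is never observed, and a wait without a matching signal never returns, which is the motivation for requiring the two calls to be used together.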

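Usage examples

TileShape setting example

Note: Before calling shmem_signal, set the TileShape with set_vec_tile_shapes. The TileShape must have the same number of dimensions as the shape parameter.

Example 1: If the shape parameter is [m, n] and the TileShape is set to [m1, n1], then m1 and n1 are used to tile the m and n axes respectively.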
```
pypto.set_vec_tile_shapes(4, 8)
```
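
To see how a TileShape of [m1, n1] partitions a view of shape [m, n], the following sketch enumerates the tiles in plain Python. The helpers `tile_grid` and `iter_tiles` are hypothetical and do not call PyPTO. Clipping the last tile when an axis is not an exact multiple of the tile size is an assumption of this sketch; the source does not say how PyPTO treats remainders.

```python
import math


def tile_grid(shape, tile_shape):
    """Return the number of tiles along each axis (ceiling division)."""
    if len(shape) != len(tile_shape):
        raise ValueError("TileShape must have as many dimensions as shape")
    return [math.ceil(dim / tile) for dim, tile in zip(shape, tile_shape)]


def iter_tiles(shape, tile_shape):
    """Yield (offsets, extents) for each tile of a 2-D view, row-major."""
    m, n = shape
    m1, n1 = tile_shape
    for r in range(0, m, m1):
        for c in range(0, n, n1):
            # Clip the tile at the boundary (assumed remainder handling).
            yield [r, c], [min(m1, m - r), min(n1, n - c)]


grid = tile_grid([64, 128], [32, 64])          # [2, 2]
tiles = list(iter_tiles([64, 128], [32, 64]))  # four tiles of extent [32, 64]
```

With shape [64, 128] and TileShape [32, 64], each axis is split in two, giving four tiles.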

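Interface call examples

Example 1: Write the signal value 2 into the entire view of the shared memory tensor of pe = 1, accumulating it onto the view's existing values, thereby notifying pe = 1.

    import pypto

    # predToken is assumed to be a list of dependency Tensors defined earlier by the caller.
    shmem_tensor = pypto.distributed.create_shmem_tensor(
        group_name="tp", n_pes=8, dtype=pypto.DT_FP16, shape=[64, 128]
    )
    pypto.set_vec_tile_shapes(32, 64)
    signal_out = pypto.distributed.shmem_signal(
        src=shmem_tensor,
        src_pe=1,
        signal=2,
        target_pe=1,
        sig_op=pypto.AtomicType.ADD,
        pred=predToken,
    )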

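Example 2: Write the signal value 2 into a partial view of the shared memory tensor of pe = 1, thereby notifying pe = 1. The partial view has shape [64, 64] and offsets [0, 0], and the signal is accumulated onto the view's existing values.

    import pypto

    # predToken is assumed to be a list of dependency Tensors defined earlier by the caller.
    shmem_tensor = pypto.distributed.create_shmem_tensor(
        group_name="tp", n_pes=8, dtype=pypto.DT_FP16, shape=[64, 128]
    )
    pypto.set_vec_tile_shapes(32, 64)
    signal_out = pypto.distributed.shmem_signal(
        src=shmem_tensor,
        src_pe=1,
        signal=2,
        shape=[64, 64],
        offsets=[0, 0],
        target_pe=1,
        sig_op=pypto.AtomicType.ADD,
        pred=predToken,
    )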

Example 3: Write the signal value 4 into a partial view of the shared memory tensor of pe = 3, thereby notifying pe = 5. The partial view has shape [64, 64] and offsets [0, 1], and the signal overwrites the view's existing values.

    import pypto

    # predToken is assumed to be a list of dependency Tensors defined earlier by the caller.
    shmem_tensor = pypto.distributed.create_shmem_tensor(
        group_name="tp", n_pes=8, dtype=pypto.DT_FP16, shape=[64, 128]
    )
    pypto.set_vec_tile_shapes(32, 64)
    signal_out = pypto.distributed.shmem_signal(
        src=shmem_tensor,
        src_pe=3,
        signal=4,
        shape=[64, 64],
        offsets=[0, 1],
        target_pe=5,
        sig_op=pypto.AtomicType.SET,
        pred=predToken,
    )

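Example 4: Write the signal value 4 into a partial view of the shared memory tensor of pe = 3, thereby notifying all pes. The partial view has shape [64, 64] and offsets [0, 1], and the signal overwrites the view's existing values (sig_op is omitted, so the default AtomicType.SET applies).

    import pypto

    # predToken is assumed to be a list of dependency Tensors defined earlier by the caller.
    shmem_tensor = pypto.distributed.create_shmem_tensor(
        group_name="tp", n_pes=8, dtype=pypto.DT_FP16, shape=[64, 128]
    )
    pypto.set_vec_tile_shapes(32, 64)
    signal_out = pypto.distributed.shmem_signal(
        src=shmem_tensor,
        src_pe=3,  # the original listing misspelled this keyword as sec_pe
        signal=4,
        shape=[64, 64],
        offsets=[0, 1],
        target_pe=-1,  # -1 broadcasts the signal to all pes
        pred=predToken,
    )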
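
The broadcast convention used in Example 4 (target_pe = -1 meaning "all pes") can be modeled with a small helper. `resolve_targets` is a hypothetical function written for illustration, not a PyPTO API. The range check on target_pe mirrors the documented range for src_pe and is an assumption, as is the choice to include the sending pe in the broadcast set.

```python
def resolve_targets(target_pe, n_pes):
    """Expand target_pe into the list of pes that receive the signal.

    Hypothetical model: -1 selects every pe in [0, n_pes); any other
    value must itself lie in that range (assumed, by analogy with src_pe).
    """
    if target_pe == -1:
        return list(range(n_pes))
    if not 0 <= target_pe < n_pes:
        raise ValueError(f"target_pe {target_pe} is outside [0, {n_pes})")
    return [target_pe]


single = resolve_targets(5, n_pes=8)      # [5]
broadcast = resolve_targets(-1, n_pes=8)  # [0, 1, 2, 3, 4, 5, 6, 7]
```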


Authorship statement: Parts of this article were generated with AI assistance (AIGC) and are for reference only.
