0. Preface
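This article introduces SAMC, the Structure-Aware Multi-Context Block. It combines a multi-scale parallel branching strategy with a cooperative channel-spatial dual-attention mechanism and, according to its authors, is the first approach in ultrasound standard plane recognition to align and deeply fuse shallow structural cues with deep semantic features. This addresses the weak feature discriminability and blurred boundary perception that traditional methods suffer from when they ignore structural information. Used as a plug-and-play module, it can be dropped into CNN, YOLO, Transformer, and other deep learning models to strengthen multi-scale structure awareness and the feature response of key regions, so that a model keeps clear structural recognition and stable classification accuracy in challenging scenarios such as low-contrast images, blurred anatomical boundaries, or high inter-class similarity.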
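Table of Contents
0. Preface
1. Introduction to SAMC Attention
2. SAMC Attention: Principles and Innovations
🧠 Basic Principles of SAMC Attention
🎯 Innovations of SAMC Attention
3. Scope of Application and Module Performance
🍀 Scope of Application
⚡ Module Performance
4. SAMC Module Code Implementation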
1. Introduction to SAMC Attention
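Ultrasound standard plane recognition is essential for clinical tasks such as disease screening, organ evaluation, and biometric measurement. However, existing methods fail to make effective use of shallow structural information and struggle to capture fine-grained semantic differences from the contrastive samples generated by image augmentation, which ultimately leads to poor recognition of the structural and discriminative details of ultrasound standard planes. To address these issues, the paper proposes SEMC, a novel Structure-Enhanced Mixture-of-Experts Contrastive learning framework that combines structure-aware feature fusion with expert-guided contrastive learning. Specifically, it first introduces a novel Semantic-Structure Fusion Module (SSFM), which exploits multi-scale structural information and strengthens the model's perception of fine-grained structural detail by effectively aligning shallow and deep features. It then designs a novel Mixture-of-Experts Contrastive Recognition Module (MCRM), which performs hierarchical contrastive learning and classification over multi-level features through a mixture-of-experts mechanism, further improving inter-class separability and recognition performance. In addition, the authors build a large-scale, finely annotated liver ultrasound dataset containing six standard planes. Extensive experiments on their in-house dataset and two public datasets show that SEMC outperforms recent state-of-the-art methods on all metrics.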
Original paper: https://arxiv.org/pdf/2511.12559
Original code: https://github.com/YanGuihao/SEMC
2. SAMC Attention: Principles and Innovations
🧠 Basic Principles of SAMC Attention
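The core design idea of SAMC (Structure-Aware Multi-Context Block) is "structure first, multi-dimensional fusion": a three-stage progressive architecture of multi-scale parallel convolution, cooperative dual attention, and cross-dimensional feature aggregation actively perceives and enhances the target structure information and contextual relationships in an image, producing feature representations that are both highly discriminative and low in redundancy.
Unlike traditional single-scale feature extraction modules, SAMC targets the challenges of heterogeneous target scales, blurred structural boundaries, and complex background interference in vision tasks with a complete "split, enhance, aggregate" pipeline:
(1) Multi-scale feature splitting unit: the input features are processed simultaneously by a set of parallel convolution kernels of different sizes (for example 3×3, 5×5, and 7×7; the implementation in Section 4 defaults to depthwise kernels of size 1, 3, and 5), splitting the feature stream into several branches. Small kernels focus on fine-grained local detail (edges, textures, tiny targets), medium kernels capture regional structural relationships, and large kernels cover the overall target contour and global context. This design captures information about targets at different scales and avoids the information loss caused by a single receptive field.
(2) Channel-spatial cooperative attention unit: SAMC applies a dual attention mechanism to refine the features. The channel attention module comes first: it applies global average pooling and global max pooling to the feature map to capture the semantic importance of each channel, generates channel weights, and multiplies them into the original features, strengthening channels related to the target structure and suppressing redundant background channels. The spatial attention module follows: on the channel-weighted features it aggregates the mean and the maximum along the channel dimension, then passes the result through a convolution layer to generate a spatial weight map that locates the core target region and structural boundaries, focusing on key spatial positions and filtering out irrelevant background.
(3) Cross-dimensional feature aggregation and refinement unit: the multi-scale features are fused (by channel concatenation, or by element-wise addition, which is the default in the Section 4 implementation), and a channel shuffle operation breaks down the information barriers between channels to promote cross-scale and cross-channel feature interaction. Finally, a pointwise convolution compresses the fused features and integrates their information, removing redundancy and producing a compact, efficient structure-aware feature map that gives downstream tasks features with structural recognizability, contextual relevance, and low computational cost.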
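The three steps can be written compactly as follows. This is a reading of the `forward` method listed in Section 4, not a formula taken from the paper; the symbols ($W_1$, $W_2$ for the two shared 1×1 convolutions, $\delta$ for the activation, $\sigma$ for the sigmoid, $\mathrm{GAP}$/$\mathrm{GMP}$ for global average/max pooling) are notation introduced here for illustration. Note that in that listing the two attention steps are applied before the multi-scale block, not after it.

```latex
\begin{aligned}
M_c &= \sigma\big(W_2\,\delta(W_1\,\mathrm{GAP}(X)) + W_2\,\delta(W_1\,\mathrm{GMP}(X))\big), & X_c &= M_c \odot X,\\
M_s &= \sigma\big(\mathrm{Conv}_{7\times 7}\big([\operatorname{mean}_{ch}(X_c);\ \operatorname{max}_{ch}(X_c)]\big)\big), & X_s &= M_s \odot X_c,\\
Y &= \mathrm{MSCB}(X_s).
\end{aligned}
```

Here $M_c$ has shape $C \times 1 \times 1$ and $M_s$ has shape $1 \times H \times W$, so both products broadcast over the feature map $X \in \mathbb{R}^{C \times H \times W}$.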
🎯 Innovations of SAMC Attention
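Multi-scale split perception: a parallel multi-scale convolution structure captures information at every scale at once, from local detail to global shape, addressing the poor adaptability of single-scale feature extraction to changes in target size.
Cooperative dual attention: channel attention and spatial attention are chained so that semantically important channels are strengthened first and key spatial regions are focused second, giving a two-step "semantic guidance plus spatial refinement" enhancement that improves perception of target edges and core regions.
Cross-dimensional feature fusion: the combination of channel shuffle and pointwise convolution promotes cross-channel information exchange while removing feature redundancy, keeping computational overhead low without giving up the enhancement.
Lightweight plug-and-play design: the module keeps input and output dimensions identical, so it can be embedded directly into the C3k2 module of YOLO26, improving structure awareness without significantly increasing model complexity.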
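As a small illustration of what channel shuffle does, the sketch below reproduces the reshape, transpose, flatten sequence on a plain list of channel indices instead of a tensor. The function name `channel_shuffle_indices` is hypothetical and exists only for this example; the group count `gcd(128, 64)` mirrors how the Section 4 listing picks its groups for 64 channels with the default expansion factor of 2.

```python
from math import gcd


def channel_shuffle_indices(num_channels, groups):
    """Return, for each output position, the index of the input channel placed there."""
    if num_channels % groups != 0:
        raise ValueError("num_channels must be divisible by groups")
    per_group = num_channels // groups
    # Equivalent of view(groups, per_group): row g holds the channels of group g.
    grid = [[g * per_group + i for i in range(per_group)] for g in range(groups)]
    # Equivalent of transpose(0, 1) followed by flatten: read the grid column by column.
    return [grid[g][i] for i in range(per_group) for g in range(groups)]


# Six channels in two groups: the groups [0, 1, 2] and [3, 4, 5] are interleaved.
print(channel_shuffle_indices(6, 2))  # [0, 3, 1, 4, 2, 5]

# 128 expanded channels shuffled with gcd(128, 64) = 64 groups.
order = channel_shuffle_indices(128, gcd(128, 64))
print(order[:4])  # [0, 2, 4, 6]
```

The shuffle is a pure permutation: no channel is dropped or duplicated, and it adds no learnable parameters.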
3. Scope of Application and Module Performance
🍀 Scope of Application
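SAMC is suited to general computer vision, especially tasks that need stronger structure awareness and multi-scale context modeling, such as object detection, semantic segmentation, and medical image analysis.
Why it fits: the multi-scale splitting mechanism adapts to targets of different scales, covering everything from tiny lesions to large organs. The dual attention mechanism filters out complex background interference and strengthens the feature signal of target edges and core regions, which helps separate similar targets that appear together. The compact features produced by cross-dimensional aggregation keep discriminative power while limiting computational cost, which suits real-time use. In ultrasound image analysis in particular, where contrast is low, boundaries are blurred, and inter-class similarity is high, SAMC can improve a model's structure awareness and discrimination accuracy.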
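To put the computational-cost point in rough numbers, the sketch below counts the learnable parameters of the SAMC implementation listed in Section 4 by hand. It assumes equal input and output channels, the default hyperparameters (kernel sizes 1, 3, 5, expansion factor 2, additive fusion, CAB ratio 16, a 7×7 spatial-attention kernel), and the default LeakyReLU activation, which has no parameters. The helper names `samc_param_count` and `plain_conv3x3_params` are hypothetical, and a parameter count is only a proxy for cost, not a latency measurement.

```python
def samc_param_count(channels, kernel_sizes=(1, 3, 5), expansion=2,
                     cab_ratio=16, sab_kernel=7):
    """Learnable parameters of SAMC when in_channels == out_channels and add=True."""
    ex = int(channels * expansion)  # expanded width inside MSCB

    def bn(c):
        return 2 * c  # BatchNorm2d has one weight and one bias per channel

    pconv1 = channels * ex + bn(ex)                         # 1x1 expand conv (no bias) + BN
    msdc = sum(ex * k * k + bn(ex) for k in kernel_sizes)   # depthwise branches, each with BN
    pconv2 = ex * channels + bn(channels)                   # 1x1 project conv (no bias) + BN
    reduced = channels // min(cab_ratio, channels)          # bottleneck width in CAB
    cab = 2 * channels * reduced                            # fc1 and fc2, 1x1 convs without bias
    sab = 2 * sab_kernel * sab_kernel                       # 2 -> 1 channel conv without bias
    return pconv1 + msdc + pconv2 + cab + sab


def plain_conv3x3_params(channels):
    """A single dense 3x3 convolution without bias, for comparison."""
    return channels * channels * 3 * 3


print(samc_param_count(64))      # 22626
print(plain_conv3x3_params(64))  # 36864
```

Under these assumptions a 64-channel SAMC block carries fewer parameters than one dense 3×3 convolution of the same width, mainly because the multi-scale branches are depthwise.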
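⚡ Module Performance
According to the original paper, the ablation study for the SAMC module is reported in Table 4 (Ablation study of the proposed ACE, SAMC, and L_mc on the LP2025 dataset). The table shows how performance on the LP2025 dataset changes as the SAMC submodule is removed or added.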
Module performance: the paper reports state-of-the-art results for the full SEMC framework.
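Summary: Table 4 shows that, starting from the baseline model (80.26% accuracy), adding only the ACE module raises accuracy to 81.38% (+1.12 percentage points), and adding the SAMC module on top of that brings it to 81.51% (a further +0.13 points). This indicates that SAMC's structure-aware enhancement improves the model's ability to capture anatomical structure features and complements the ACE module.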
4. SAMC Module Code Implementation
The following is the official PyTorch implementation of the SAMC module:
import math
from functools import partial

import torch
import torch.nn as nn
# Newer timm releases also expose these helpers under timm.layers / timm.models
# and may emit a deprecation warning for the import paths below.
from timm.models.layers import trunc_normal_tf_
from timm.models.helpers import named_apply


def gcd(a, b):
    while b:
        a, b = b, a % b
    return a


# Other types of layers can go here (e.g., nn.Linear, etc.)
def _init_weights(module, name, scheme=''):
    if isinstance(module, nn.Conv2d) or isinstance(module, nn.Conv3d):
        if scheme == 'normal':
            nn.init.normal_(module.weight, std=.02)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif scheme == 'trunc_normal':
            trunc_normal_tf_(module.weight, std=.02)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif scheme == 'xavier_normal':
            nn.init.xavier_normal_(module.weight)
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        elif scheme == 'kaiming_normal':
            nn.init.kaiming_normal_(module.weight, mode='fan_out', nonlinearity='relu')
            if module.bias is not None:
                nn.init.zeros_(module.bias)
        else:
            # efficientnet like
            fan_out = module.kernel_size[0] * module.kernel_size[1] * module.out_channels
            fan_out //= module.groups
            nn.init.normal_(module.weight, 0, math.sqrt(2.0 / fan_out))
            if module.bias is not None:
                nn.init.zeros_(module.bias)
    elif isinstance(module, nn.BatchNorm2d) or isinstance(module, nn.BatchNorm3d):
        nn.init.constant_(module.weight, 1)
        nn.init.constant_(module.bias, 0)
    elif isinstance(module, nn.LayerNorm):
        nn.init.constant_(module.weight, 1)
        nn.init.constant_(module.bias, 0)


def act_layer(act, inplace=False, neg_slope=0.2, n_prelu=1):
    # activation layer
    act = act.lower()
    if act == 'relu':
        layer = nn.ReLU(inplace)
    elif act == 'relu6':
        layer = nn.ReLU6(inplace)
    elif act == 'leakyrelu':
        layer = nn.LeakyReLU(neg_slope, inplace)
    elif act == 'prelu':
        layer = nn.PReLU(num_parameters=n_prelu, init=neg_slope)
    elif act == 'gelu':
        layer = nn.GELU()
    elif act == 'hswish':
        layer = nn.Hardswish(inplace)
    else:
        raise NotImplementedError('activation layer [%s] is not found' % act)
    return layer


def channel_shuffle(x, groups):
    batchsize, num_channels, height, width = x.size()
    channels_per_group = num_channels // groups
    # reshape
    x = x.view(batchsize, groups, channels_per_group, height, width)
    x = torch.transpose(x, 1, 2).contiguous()
    # flatten
    x = x.view(batchsize, -1, height, width)
    return x


# Multi-scale depth-wise convolution (MSDC)
class MSDC(nn.Module):
    def __init__(self, in_channels, kernel_sizes, stride, activation='leakyrelu', dw_parallel=True):
        super(MSDC, self).__init__()
        self.in_channels = in_channels
        # Make sure kernel_sizes is a list
        if isinstance(kernel_sizes, int):
            kernel_sizes = [kernel_sizes]
        elif isinstance(kernel_sizes, tuple):
            kernel_sizes = list(kernel_sizes)
        self.kernel_sizes = kernel_sizes
        self.activation = activation
        self.dw_parallel = dw_parallel
        self.dwconvs = nn.ModuleList([
            nn.Sequential(
                nn.Conv2d(self.in_channels, self.in_channels, kernel_size, stride,
                          kernel_size // 2, groups=self.in_channels, bias=False),
                nn.BatchNorm2d(self.in_channels),
                act_layer(self.activation, inplace=True)
            )
            for kernel_size in self.kernel_sizes
        ])
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        # Apply the convolution layers in a loop
        outputs = []
        for dwconv in self.dwconvs:
            dw_out = dwconv(x)
            outputs.append(dw_out)
            if self.dw_parallel == False:
                x = x + dw_out
        # You can return outputs based on what you intend to do with them
        return outputs


class MSCB(nn.Module):
    """
    Multi-Scale Convolution Block (MSCB): Expands channels, applies depthwise
    convolutions with different kernel sizes (MSDC), and then compresses
    channels to extract multi-scale features.
    """

    def __init__(self, in_channels, out_channels, stride, kernel_sizes=[1, 3, 5], expansion_factor=2,
                 dw_parallel=True, add=True, activation='leakyrelu'):
        super(MSCB, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        self.stride = stride
        # Make sure kernel_sizes is a list
        if isinstance(kernel_sizes, int):
            kernel_sizes = [kernel_sizes]
        elif isinstance(kernel_sizes, tuple):
            kernel_sizes = list(kernel_sizes)
        self.kernel_sizes = kernel_sizes
        self.expansion_factor = expansion_factor
        self.dw_parallel = dw_parallel
        self.add = add
        self.activation = activation
        self.n_scales = len(self.kernel_sizes)
        assert self.stride in [1, 2]
        self.use_skip_connection = True if self.stride == 1 else False
        self.ex_channels = int(self.in_channels * self.expansion_factor)
        self.pconv1 = nn.Sequential(
            # Pointwise 1x1
            nn.Conv2d(self.in_channels, self.ex_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(self.ex_channels),
            act_layer(self.activation, inplace=True)
        )
        self.msdc = MSDC(self.ex_channels, self.kernel_sizes, self.stride, self.activation,
                         dw_parallel=self.dw_parallel)
        if self.add == True:
            self.combined_channels = self.ex_channels * 1
        else:
            self.combined_channels = self.ex_channels * self.n_scales
        self.pconv2 = nn.Sequential(
            nn.Conv2d(self.combined_channels, self.out_channels, 1, 1, 0, bias=False),
            nn.BatchNorm2d(self.out_channels),
        )
        if self.use_skip_connection and (self.in_channels != self.out_channels):
            self.conv1x1 = nn.Conv2d(self.in_channels, self.out_channels, 1, 1, 0, bias=False)
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        pout1 = self.pconv1(x)
        msdc_outs = self.msdc(pout1)
        if self.add == True:
            dout = 0
            for dwout in msdc_outs:
                dout = dout + dwout
        else:
            dout = torch.cat(msdc_outs, dim=1)
        dout = channel_shuffle(dout, gcd(self.combined_channels, self.out_channels))
        out = self.pconv2(dout)
        if self.use_skip_connection:
            if self.in_channels != self.out_channels:
                x = self.conv1x1(x)
            return x + out
        else:
            return out


# Multi-scale Convolution Block (MSCB)
def MSCBLayer(in_channels, out_channels, n=1, stride=1, kernel_sizes=[1, 3, 5], expansion_factor=2,
              dw_parallel=True, add=True, activation='leakyrelu'):
    """
    Create a sequence of multiple MSCB modules (an MSCB layer).

    Args:
    - in_channels: Number of input channels.
    - out_channels: Number of output channels.
    - n: Number of stacked MSCB modules.
    - stride: Stride of the first module (stride=2 can be used for downsampling).
    - kernel_sizes: List of kernel sizes for multi-scale convolutions, e.g., [1, 3, 5].
    - expansion_factor: Channel expansion factor.
    - dw_parallel: Whether to apply multi-scale depthwise convolutions in parallel
      (True for parallel, False for sequential with residual connection).
    - add: Fusion mode for multi-scale results; True for additive fusion,
      False for channel concatenation.
    - activation: Type of activation function, e.g., 'relu6'.
    """
    convs = []
    mscb = MSCB(
        in_channels, out_channels, stride,
        kernel_sizes=kernel_sizes,
        expansion_factor=expansion_factor,
        dw_parallel=dw_parallel,
        add=add,
        activation=activation
    )
    convs.append(mscb)
    if n > 1:
        for i in range(1, n):
            mscb = MSCB(
                out_channels, out_channels, 1,
                kernel_sizes=kernel_sizes,
                expansion_factor=expansion_factor,
                dw_parallel=dw_parallel,
                add=add,
                activation=activation
            )
            convs.append(mscb)
    conv = nn.Sequential(*convs)
    return conv


class CAB(nn.Module):
    def __init__(self, in_channels, out_channels=None, ratio=16, activation='leakyrelu'):
        super(CAB, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        if self.in_channels < ratio:
            ratio = self.in_channels
        self.reduced_channels = self.in_channels // ratio
        if self.out_channels is None:
            self.out_channels = in_channels
        self.avg_pool = nn.AdaptiveAvgPool2d(1)
        self.max_pool = nn.AdaptiveMaxPool2d(1)
        self.activation = act_layer(activation, inplace=True)
        self.fc1 = nn.Conv2d(self.in_channels, self.reduced_channels, 1, bias=False)
        self.fc2 = nn.Conv2d(self.reduced_channels, self.out_channels, 1, bias=False)
        self.sigmoid = nn.Sigmoid()
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        avg_pool_out = self.avg_pool(x)
        avg_out = self.fc2(self.activation(self.fc1(avg_pool_out)))
        max_pool_out = self.max_pool(x)
        max_out = self.fc2(self.activation(self.fc1(max_pool_out)))
        out = avg_out + max_out
        return self.sigmoid(out)


class SAB(nn.Module):
    def __init__(self, kernel_size=7):
        super(SAB, self).__init__()
        assert kernel_size in (3, 7, 11), 'kernel must be 3 or 7 or 11'
        padding = kernel_size // 2
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=padding, bias=False)
        self.sigmoid = nn.Sigmoid()
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x):
        avg_out = torch.mean(x, dim=1, keepdim=True)
        max_out, _ = torch.max(x, dim=1, keepdim=True)
        x = torch.cat([avg_out, max_out], dim=1)
        x = self.conv(x)
        return self.sigmoid(x)


class SAMC(nn.Module):
    """
    Spatial Attention Multi-scale Convolution (SAMC) Module

    Supports two input formats:
    1. 4D tensor: (B, C, H, W) - standard image feature input
    2. 3D sequence: (B, N, C) plus H and W arguments - serialized feature input

    The interface is kept consistent with the CGTA module.
    """

    def __init__(self, in_channels, out_channels, kernel_sizes=[1, 3, 5], expansion_factor=2,
                 dw_parallel=True, add=True, activation='leakyrelu', cab_ratio=16):
        """
        Args:
            in_channels: number of input channels
            out_channels: number of output channels
            kernel_sizes: multi-scale kernel sizes used in MSCB; int, list, or tuple
            expansion_factor: channel expansion factor in MSCB
            dw_parallel: whether the multi-scale depthwise convolutions run in parallel
            add: fusion mode for multi-scale results; True adds, False concatenates channels
            activation: activation function type
            cab_ratio: channel reduction ratio of the CAB channel attention
        """
        super(SAMC, self).__init__()
        self.in_channels = in_channels
        self.out_channels = out_channels
        # Make sure kernel_sizes is a list
        if isinstance(kernel_sizes, int):
            kernel_sizes = [kernel_sizes]
        elif isinstance(kernel_sizes, tuple):
            kernel_sizes = list(kernel_sizes)
        self.kernel_sizes = kernel_sizes
        # CAB: Channel Attention Block.
        # Fix: the channel weights multiply the input x, so they must have in_channels
        # entries. The original passed out_channels here, which breaks the product
        # cab_out * x whenever in_channels != out_channels.
        self.cab = CAB(in_channels, ratio=cab_ratio, activation=activation)
        # SAB: Spatial Attention Block
        self.sab = SAB()
        # MSCB: Multi-scale Convolution Block
        self.mscb = MSCB(
            in_channels, out_channels,
            stride=1,
            kernel_sizes=self.kernel_sizes,
            expansion_factor=expansion_factor,
            dw_parallel=dw_parallel,
            add=add,
            activation=activation
        )
        self.init_weights('normal')

    def init_weights(self, scheme=''):
        named_apply(partial(_init_weights, scheme=scheme), self)

    def forward(self, x, H=None, W=None):
        """
        Forward pass supporting two input formats.

        Args:
            x: input features
                - if H and W are None, x must be a 4D tensor (B, C, H, W)
                - if H and W are given, x must be a 3D tensor (B, N, C) with N = H * W
            H: spatial height (required for sequence input)
            W: spatial width (required for sequence input)
        Returns:
            Output features in the format matching the input:
                - 4D input returns a 4D tensor (B, C, H, W)
                - 3D sequence input returns a 3D tensor (B, N, C)
        """
        # Sequence mode: H and W must be provided together with the sequence
        if H is not None and W is not None:
            return self.forward_seq(x, H, W)
        # Default mode: the input must be a 4D tensor (B, C, H, W)
        if x.dim() != 4:
            raise ValueError(f"Expected 4D input (B, C, H, W), got {x.dim()}D tensor. "
                             f"If using sequence input, please provide H and W parameters.")
        # Standard 4D forward
        # CAB: channel attention
        cab_out = self.cab(x)
        x_cab = cab_out * x
        # SAB: spatial attention
        sab_out = self.sab(x_cab)
        x_sab = sab_out * x_cab
        # MSCB: multi-scale convolution
        out = self.mscb(x_sab)
        return out

    def forward_seq(self, x_seq, H, W):
        """
        Forward pass for sequence input.

        Args:
            x_seq: input sequence (B, N, C) with N = H * W
            H: spatial height
            W: spatial width
        Returns:
            Output sequence (B, N, C_out)
        """
        B, N, C = x_seq.shape
        # Check that N = H * W
        if N != H * W:
            raise ValueError(f"Sequence length N={N} does not match H*W={H*W}")
        # Reshape the sequence into a 4D tensor: (B, C, H, W)
        x_4d = x_seq.permute(0, 2, 1).reshape(B, C, H, W).contiguous()
        # Run the 4D forward pass
        out_4d = self.forward(x_4d)
        # Reshape the output back to sequence format: (B, N, C_out)
        out_seq = out_4d.flatten(2).transpose(1, 2).contiguous()
        return out_seq


if __name__ == "__main__":
    device = torch.device('cuda:0' if torch.cuda.is_available() else 'cpu')

    print("=" * 50)
    print("Test 1: 4D tensor input mode (B, C, H, W)")
    print("=" * 50)
    # 4D tensor input test
    x_4d = torch.randn(1, 64, 32, 32).to(device)
    model = SAMC(64, 64).to(device)
    y_4d = model(x_4d)
    print("Input feature shape:", x_4d.shape)
    print("Output feature shape:", y_4d.shape)
    print()

    print("=" * 50)
    print("Test 2: 3D sequence input mode (B, N, C) + H, W")
    print("=" * 50)
    # 3D sequence input test (same interface as CGTA)
    B, H, W, C = 1, 32, 32, 64
    x_seq = torch.randn(B, H * W, C).to(device)
    y_seq = model(x_seq, H, W)
    print("Input sequence shape:", x_seq.shape)
    print("Output sequence shape:", y_seq.shape)
    print()

    print("=" * 50)
    print("Test 3: kernel_sizes given as an int")
    print("=" * 50)
    # kernel_sizes as an int (can happen when a YOLO config file is parsed)
    model_int_kernel = SAMC(64, 64, kernel_sizes=3).to(device)
    y_int_kernel = model_int_kernel(x_4d)
    print("kernel_sizes=3 test passed")
    print(f"Input: {x_4d.shape} -> Output: {y_int_kernel.shape}")
    print()

    print("=" * 50)
    print("Test 4: kernel_sizes given as a tuple")
    print("=" * 50)
    # kernel_sizes as a tuple
    model_tuple_kernel = SAMC(64, 64, kernel_sizes=(1, 3, 5)).to(device)
    y_tuple_kernel = model_tuple_kernel(x_4d)
    print("kernel_sizes=(1, 3, 5) test passed")
    print(f"Input: {x_4d.shape} -> Output: {y_tuple_kernel.shape}")
    print()

    print("=" * 50)
    print("Test 5: check that both input modes give the same output")
    print("=" * 50)
    # Build identical input data
    x_4d_test = torch.randn(2, 64, 32, 32).to(device)
    model_test = SAMC(64, 64).to(device)
    model_test.eval()
    with torch.no_grad():
        # Output in 4D mode
        out_4d = model_test(x_4d_test)
        # Convert to a sequence and test sequence mode
        x_seq_test = x_4d_test.flatten(2).transpose(1, 2)
        out_seq = model_test(x_seq_test, 32, 32)
        out_4d_from_seq = out_seq.transpose(1, 2).reshape(2, 64, 32, 32)
        # Compute the difference
        diff = torch.abs(out_4d - out_4d_from_seq).max().item()
        print(f"Maximum difference between the two modes: {diff:.2e}")
        print(f"Outputs match: {diff < 1e-6}")

Combined with your own ideas, the module can be plugged into any model as a building block for structural innovation.
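One detail of the listing worth checking by hand: each depthwise branch in MSDC pads its convolution with `kernel_size // 2`, and the branch outputs are later summed or concatenated, so all branches must produce the same spatial size. The sketch below applies the standard convolution output-size formula (dilation 1) to show that this holds for odd kernels at stride 1 and 2, and fails if an even kernel is mixed in. The helper names `conv_out_size` and `branch_sizes` are hypothetical and used only for this illustration.

```python
def conv_out_size(size, kernel, stride, padding):
    """Output length of a convolution along one spatial axis (dilation 1)."""
    return (size + 2 * padding - kernel) // stride + 1


def branch_sizes(size, kernel_sizes, stride):
    """Output size of every MSDC-style branch, each padded with kernel // 2."""
    return [conv_out_size(size, k, stride, k // 2) for k in kernel_sizes]


# Odd kernels: every branch keeps (stride 1) or halves (stride 2) the size.
print(branch_sizes(32, [1, 3, 5], stride=1))  # [32, 32, 32]
print(branch_sizes(32, [1, 3, 5], stride=2))  # [16, 16, 16]

# Mixing in an even kernel gives mismatched sizes, so the branches
# could no longer be added or concatenated.
print(branch_sizes(32, [3, 4], stride=1))     # [32, 33]
```

In practice this means `kernel_sizes` should contain only odd values when more than one branch is used.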
The author has already integrated this module into the YOLO26 model. For details, see the author's columns: YOLO Series Algorithm Improvements and YOLO26 Custom Improvements.