用PyTorch从零复现UNet：手把手教你搭建医学图像分割的‘U型’骨架（附完整代码）-编程实验室

用PyTorch从零构建UNet：医学图像分割实战指南

第一次看到CT扫描图像中的肿瘤区域被AI准确勾勒出来时，那种精确度让我意识到语义分割技术的巨大潜力。UNet作为医学图像分割领域的里程碑模型，其优雅的U型结构和简洁高效的设计思想，至今仍是许多工业级应用的基石。本文将带你从零开始，用PyTorch实现一个完整的UNet模型，并深入探讨每个模块的设计哲学。

1. 环境准备与数据理解

在开始编码前，我们需要配置合适的开发环境。建议使用Python 3.8+和PyTorch 1.10+版本，这些组合经过验证具有最佳稳定性：

conda create -n unet python=3.8 conda activate unet pip install torch torchvision torchaudio pip install opencv-python matplotlib numpy

医学图像数据通常以DICOM或NIfTI格式存储。以公开的ISBI细胞追踪数据集为例，我们需要特别注意：

图像尺寸统一性：原始数据可能包含不同分辨率的图像
标签编码方式：二分类问题通常使用0/1掩码，多分类则需要one-hot编码
数据标准化：医学图像的像素值范围差异较大，需进行归一化

import numpy as np import torch from torch.utils.data import Dataset class MedicalImageDataset(Dataset): def __init__(self, image_paths, mask_paths, transform=None): self.image_paths = image_paths self.mask_paths = mask_paths self.transform = transform def __getitem__(self, idx): image = load_dicom(self.image_paths[idx]) # 伪代码，实际需替换为DICOM/NIfTI读取逻辑 mask = load_mask(self.mask_paths[idx]) if self.transform: augmented = self.transform(image=image, mask=mask) image, mask = augmented['image'], augmented['mask'] # 转换为PyTorch张量并归一化 image = torch.from_numpy(image).float().unsqueeze(0) / 255.0 mask = torch.from_numpy(mask).long() return image, mask

注意：医学图像处理中，数据增强需要特别谨慎。简单的几何变换如旋转、翻转通常是安全的，但颜色变换可能破坏CT/MRI的物理意义。

2. UNet核心模块实现

2.1 DoubleConv：基础构建块

UNet的核心是双重卷积模块，它通过两次连续的3x3卷积提取特征：

import torch.nn as nn class DoubleConv(nn.Module): """(convolution => [BN] => ReLU) * 2""" def __init__(self, in_channels, out_channels): super().__init__() self.double_conv = nn.Sequential( nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1), nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True), nn.Conv2d(out_channels, out_channels, kernel_size=3, padding=1), nn.BatchNorm2d(out_channels), nn.ReLU(inplace=True) ) def forward(self, x): return self.double_conv(x)

与原始论文不同，这里添加了padding=1以保持特征图尺寸不变。这种设计选择带来几个优势：

避免特征图尺寸过快收缩，保留更多空间信息
简化上采样时的对齐问题
更适应现代GPU的并行计算特性

2.2 下采样与上采样模块

下采样模块通过最大池化降低分辨率，同时增加通道数：

class Down(nn.Module): """Downscaling with maxpool then double conv""" def __init__(self, in_channels, out_channels): super().__init__() self.maxpool_conv = nn.Sequential( nn.MaxPool2d(2), DoubleConv(in_channels, out_channels) ) def forward(self, x): return self.maxpool_conv(x)

上采样模块则更为复杂，需要处理特征融合：

class Up(nn.Module): """Upsampling then double conv""" def __init__(self, in_channels, out_channels, bilinear=True): super().__init__() if bilinear: self.up = nn.Upsample(scale_factor=2, mode='bilinear', align_corners=True) self.conv = DoubleConv(in_channels, out_channels, in_channels // 2) else: self.up = nn.ConvTranspose2d( in_channels, in_channels // 2, kernel_size=2, stride=2) self.conv = DoubleConv(in_channels, out_channels) def forward(self, x1, x2): x1 = self.up(x1) # 计算空间维度差异 diffY = x2.size()[2] - x1.size()[2] diffX = x2.size()[3] - x1.size()[3] # 对称填充 x1 = F.pad(x1, [diffX // 2, diffX - diffX // 2, diffY // 2, diffY - diffY // 2]) x = torch.cat([x2, x1], dim=1) return self.conv(x)

提示：特征融合时，跳跃连接(skip connection)的引入是UNet成功的关键。它帮助解码器恢复在编码过程中丢失的空间细节。

3. 完整UNet架构组装

将各个模块组合成完整的UNet网络：

class UNet(nn.Module): def __init__(self, n_channels, n_classes, bilinear=False): super(UNet, self).__init__() self.n_channels = n_channels self.n_classes = n_classes self.bilinear = bilinear self.inc = DoubleConv(n_channels, 64) self.down1 = Down(64, 128) self.down2 = Down(128, 256) self.down3 = Down(256, 512) self.down4 = Down(512, 512) # 最后一层不增加通道数 self.up1 = Up(1024, 256, bilinear) self.up2 = Up(512, 128, bilinear) self.up3 = Up(256, 64, bilinear) self.up4 = Up(128, 64, bilinear) self.outc = OutConv(64, n_classes) def forward(self, x): x1 = self.inc(x) x2 = self.down1(x1) x3 = self.down2(x2) x4 = self.down3(x3) x5 = self.down4(x4) x = self.up1(x5, x4) x = self.up2(x, x3) x = self.up3(x, x2) x = self.up4(x, x1) logits = self.outc(x) return logits

网络结构中的几个关键设计点：

编码器部分逐渐增加通道数(64→128→256→512)，同时通过池化降低分辨率
瓶颈层保持512通道不变，避免参数量爆炸
解码器部分通过上采样逐步恢复分辨率，同时减少通道数
每个上采样阶段都融合了对应编码器层的特征

4. 训练策略与调优技巧

4.1 损失函数选择

医学图像分割常用的损失函数组合：

def dice_loss(pred, target, smooth=1.): pred = pred.contiguous() target = target.contiguous() intersection = (pred * target).sum(dim=2).sum(dim=2) loss = (1 - ((2. * intersection + smooth) / (pred.sum(dim=2).sum(dim=2) + target.sum(dim=2).sum(dim=2) + smooth))) return loss.mean() class DiceBCELoss(nn.Module): def __init__(self, weight=None, size_average=True): super(DiceBCELoss, self).__init__() def forward(self, inputs, targets, smooth=1): inputs = torch.sigmoid(inputs) # 计算Dice Loss dice = dice_loss(inputs, targets, smooth) # 计算BCE Loss bce = F.binary_cross_entropy(inputs, targets.float(), reduction='mean') return bce + dice

这种组合利用了BCE的稳定性和Dice系数对类别不平衡的鲁棒性。

4.2 训练循环实现

一个完整的训练epoch应包含以下步骤：

def train_epoch(model, device, train_loader, optimizer, criterion, epoch): model.train() pbar = tqdm(train_loader, desc=f'Epoch {epoch}') for images, masks in pbar: images = images.to(device) masks = masks.to(device) # 前向传播 outputs = model(images) loss = criterion(outputs, masks) # 反向传播 optimizer.zero_grad() loss.backward() optimizer.step() # 计算指标 preds = torch.sigmoid(outputs) > 0.5 dice = dice_coeff(preds, masks) pbar.set_postfix({'loss': loss.item(), 'dice': dice.item()})

4.3 常见问题排查

在UNet实现过程中，最常遇到的三个维度问题：

通道数不匹配：确保每个DoubleConv的输出通道与下一层的输入通道一致
空间尺寸不匹配：上采样时特征图尺寸必须能与跳跃连接的特征图拼接
输出尺寸异常：最终输出应与输入图像有相同的空间维度

调试时可以使用这个简单的尺寸检查工具：

def check_dimensions(model, input_size=(1, 1, 256, 256)): x = torch.randn(input_size) print(f"Input shape: {x.shape}") x1 = model.inc(x) print(f"After inc: {x1.shape}") x2 = model.down1(x1) print(f"After down1: {x2.shape}") # 继续打印所有中间层输出...

5. 进阶优化与部署实践

5.1 注意力机制增强

在跳跃连接处添加注意力门(Attention Gate)可以提升小目标分割效果：

class AttentionGate(nn.Module): def __init__(self, F_g, F_l, F_int): super(AttentionGate, self).__init__() self.W_g = nn.Sequential( nn.Conv2d(F_g, F_int, kernel_size=1, stride=1, padding=0), nn.BatchNorm2d(F_int) ) self.W_x = nn.Sequential( nn.Conv2d(F_l, F_int, kernel_size=1, stride=1, padding=0), nn.BatchNorm2d(F_int) ) self.psi = nn.Sequential( nn.Conv2d(F_int, 1, kernel_size=1, stride=1, padding=0), nn.BatchNorm2d(1), nn.Sigmoid() ) self.relu = nn.ReLU(inplace=True) def forward(self, g, x): g1 = self.W_g(g) x1 = self.W_x(x) psi = self.relu(g1 + x1) psi = self.psi(psi) return x * psi

5.2 模型量化与部署

使用TorchScript将模型导出为生产环境可用的格式：

# 量化模型 quantized_model = torch.quantization.quantize_dynamic( model, {nn.Conv2d, nn.ConvTranspose2d}, dtype=torch.qint8) # 转换为TorchScript scripted_model = torch.jit.script(quantized_model) torch.jit.save(scripted_model, "unet_quantized.pt")

部署时考虑以下优化：

使用TensorRT加速推理
实现多尺度测试增强
添加后处理滤波去除小噪声区域

6. 实战：肺结节分割案例

以LUNA16数据集为例，展示完整的处理流程：

数据预处理：
- 从DICOM提取像素数据
- 根据标注生成3D掩码
- 提取包含结节的128x128x128立方体

训练配置：

model = UNet(n_channels=1, n_classes=1).to(device) optimizer = torch.optim.Adam(model.parameters(), lr=1e-4) scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, 'max', patience=2) criterion = DiceBCELoss()

评估指标：

def calculate_metrics(pred, target): pred = (pred > 0.5).float() target = target.float() tp = (pred * target).sum() fp = (pred * (1 - target)).sum() fn = ((1 - pred) * target).sum() precision = tp / (tp + fp + 1e-7) recall = tp / (tp + fn + 1e-7) dice = 2 * tp / (2 * tp + fp + fn + 1e-7) return precision, recall, dice

在实际项目中，我们发现将2D UNet扩展为3D版本能显著提升肺结节分割的体积测量精度，但会大幅增加显存消耗。一个实用的折中方案是使用2.5D方法，即堆叠相邻切片作为多通道输入。