Mandelbrot集合的多线程并行计算加速-编程实验室

Mandelbrot集合的多线程并行计算加速

最近研究并行计算，找了CS149的网课看看，顺便做做作业QAQ

代码下载：Mandelbrot集合的多线程并行计算加速代码文件

Mandelbrot集合的多线程并行计算加速
- 1、Mandelbrot 集合介绍
- - 1.1 Mandelbrot 图像的产生过程（计算视角）
  - - 1️⃣ 选定计算区域（复平面上的矩形）
    - 2️⃣ 建立像素网格（离散化）
    - 3️⃣ 像素 → 复数点的映射
    - 4️⃣ 对每个点做 Mandelbrot 迭代
    - 5️⃣ 发散判定（阈值判断）
    - 6️⃣ 最大迭代次数限制（防止无限循环）
    - 7️⃣ 迭代次数 → 像素值
- 2、Mandelbrot图并行计算
- - 2.1、连续块划分（block / chunk）
  - - 特点
    - 代码
  - 2.2、交错划分（cyclic / interleaved）
  - - 示意图
    - 代码
- 2.3、性能对比

1、Mandelbrot 集合介绍

对复数z = x + yi：
z n + 1 = z n 2 + c , z 0 = 0 z_{n+1} = z_n^2 + c,\quad z_0 = 0zn+1=zn2+c,z0=0

如果在不断迭代后|z|超过某个阈值（通常是 2），就认为这个点不属于 Mandelbrot 集。

好，这里我按**“图是怎么一步一步算出来的”来讲，用流程化 + 偏实现视角**的描述，正好对应你代码里在做的事（而不是纯数学定义）。

1.1 Mandelbrot 图像的产生过程（计算视角）

1️⃣ 选定计算区域（复平面上的矩形）

首先，在复平面中选一个有限的矩形区域作为“观察窗口”：

x ∈ [x0, x1] y ∈ [y0, y1]

这个矩形决定了你最终看到的是 Mandelbrot 集的哪一块（缩放、平移本质上就是在改这个矩形）。

2️⃣ 建立像素网格（离散化）

接着，指定输出图像大小：

width × height

这一步的含义是：
把连续的复平面矩形离散成一个规则网格，每个网格点对应一个像素。

由此得到步长：

dx = (x1 - x0) / width dy = (y1 - y0) / height

3️⃣ 像素 → 复数点的映射

对图像中的每一个像素(i, j)：

x = x0 + i * dx y = y0 + j * dy c = x + yi

也就是说：
每个像素都代表复平面中的一个复数 c。

4️⃣ 对每个点做 Mandelbrot 迭代

对当前复数点c，执行如下迭代：

z0 = 0 zn+1 = zn² + c

这是一个局部、互不依赖的计算过程，因此非常适合并行。

5️⃣ 发散判定（阈值判断）

在迭代过程中，每一步都会检查：

|zn| > threshold ?

通常threshold = 2
一旦超过阈值，就认为该点已经发散
记录此时的迭代次数n

6️⃣ 最大迭代次数限制（防止无限循环）

为了避免无限计算，引入最大迭代次数，例如：

maxIterations = 255

规则是：

如果在255 次迭代内发散
→ 返回实际发散的迭代次数
如果迭代 255 次仍未发散
→ 停止迭代，认为该点“属于 Mandelbrot 集内部或边界附近”

7️⃣ 迭代次数 → 像素值

最终：

output[j * width + i] = iteration_count

这个值通常被用于：

直接作为灰度值
或作为调色函数的输入（生成彩色分形图）

2、Mandelbrot图并行计算

2.1、连续块划分（block / chunk）

每个线程拿一整块连续的行

行号: 0 1 2 3 4 5 6 7 8 9 10 11 ─────────────────────────────── 线程0: █ █ █ █ 线程1: █ █ █ █ 线程2: █ █ █ █

等价于：

thread 0 → 行0,1,2,3
thread 1 → 行4,5,6,7
thread 2 → 行8,9,10,11

特点

✅ cache 友好
✅ 实现简单
❌ 如果不同区域算得“慢/快”不均，会负载不平衡

代码

voidworkerThreadStartBlock(WorkerArgs*constargs){inttid=args->threadId;intT=args->numThreads;intH=(int)args->height;introwsPerThread=H/T;intremainder=H%T;intstartRow=tid*rowsPerThread+(tid<remainder?tid:remainder);intnumRows=rowsPerThread+(tid<remainder?1:0);if(numRows<=0)return;mandelbrotSerial(args->x0,args->y0,args->x1,args->y1,(int)args->width,(int)args->height,startRow,numRows,args->maxIterations,args->output);}

2.2、交错划分（cyclic / interleaved）

行号按线程号取模

规则：

行 j 交给线程 (j % T)

示意图

行号: 0 1 2 3 4 5 6 7 8 9 10 11 ─────────────────────────────── 线程0: █ █ █ █ 线程1: █ █ █ █ 线程2: █ █ █ █

等价于：

thread 0 → 行0, 3, 6, 9
thread 1 → 行1, 4, 7, 10
thread 2 → 行2, 5, 8, 11

这就是你看到的那句话：

线程 t 算 t, t+T, t+2T… 行

代码

voidworkerThreadStartCyclic(WorkerArgs*constargs){inttid=args->threadId;intT=args->numThreads;intH=(int)args->height;// Interleaved partitioning: Thread tid is responsible for row tid, tid+T, tid+2T, ...for(intj=tid;j<H;j+=T){mandelbrotSerial(args->x0,args->y0,args->x1,args->y1,(int)args->width,(int)args->height,j,// startRow1,// numRowsargs->maxIterations,args->output);}}

2.3、性能对比

运行代码结果如下：

[mandelbrot serial]:[692.776]ms Wrote imagefilemandelbrot-serial.ppm[mandelbrot thread block]:[92.090]ms Wrote imagefilemandelbrot-thread.ppm[mandelbrot thread cyclic]:[49.905]ms(7.52x speedup: serial vs thread block,16threads)(13.88x speedup: serial vs thread cyclic,16threads)

实际上交错划分性能更好：

简单来说，Mandelbrot 图像中，不同位置的点发散速度不同，导致不同图像行的计算量不均匀。
在block（连续块）划分中，复杂区域的多行可能集中分配给某一个线程，使该线程成为瓶颈，其他线程空闲等待，从而降低整体性能。
而cyclic（交错）划分将不同行均匀分配给各个线程，把计算量较大的区域打散到多个线程中，显著改善负载均衡，因此整体执行时间更短、加速比更高。