Python Performance Optimization Techniques: From Beginner to Expert - 编程实验室

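Introduction

Python is known for its clean, elegant syntax, but performance has always been a concern for developers. As a backend developer who moved from Python to Rust, I have collected a large number of performance optimization techniques in practice. This article walks through profiling, algorithms, loops, memory, strings, concurrency, and C extensions to help you write faster, more efficient code.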

1. Profiling Basics

1.1 Profiling tools

    import cProfile

    def slow_function():
        total = 0
        for i in range(1_000_000):
            total += i
        return total

    # Print call counts and timings for every function that ran
    cProfile.run('slow_function()')

1.2 Using timeit

    import timeit

    def test_function():
        return sum(range(100))

    # Total time for 10,000 calls; named `elapsed` so it does not shadow the `time` module
    elapsed = timeit.timeit(test_function, number=10000)
    print(f"Time: {elapsed:.4f} seconds")

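1.3 line_profiler

    from line_profiler import LineProfiler

    def process_data(data):
        result = []
        for item in data:
            if item % 2 == 0:
                result.append(item * 2)
        return result

    # Wrap the function explicitly; a bare @profile decorator is only
    # defined when the script is run through kernprof
    profiler = LineProfiler()
    profiled = profiler(process_data)
    profiled(range(1_000_000))
    profiler.print_stats()  # per-line hit counts and timings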

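2. Algorithm Optimization

2.1 Choose the right data structure

    # Slower: membership test on a list
    names = ['Alice', 'Bob', 'Charlie']
    if 'Bob' in names:  # O(n): scans the elements one by one
        print("Found")

    # Faster: membership test on a set
    names_set = {'Alice', 'Bob', 'Charlie'}
    if 'Bob' in names_set:  # O(1) on average: hash lookup
        print("Found")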
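With only three names the difference above is too small to notice. The following is a minimal sketch for measuring it on a larger collection with timeit; the collection size, the `user{i}` names, and the choice of the last element as the lookup target are assumptions made for illustration, and the actual timings will depend on your machine.

```python
import timeit

# Hypothetical data, sized only to make the difference visible
N = 100_000
names_list = [f"user{i}" for i in range(N)]
names_set = set(names_list)
target = f"user{N - 1}"  # worst case for the list: the last element

def in_list():
    return target in names_list  # linear scan

def in_set():
    return target in names_set   # hash lookup

list_time = timeit.timeit(in_list, number=100)
set_time = timeit.timeit(in_set, number=100)
print(f"list: {list_time:.6f}s  set: {set_time:.6f}s")
```

Building the set has its own cost, so the conversion tends to pay off when the same collection is queried many times.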

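2.2 Avoid unnecessary computation

    # expensive_calculation stands for any costly function defined elsewhere

    # Slower: the same value is computed twice per iteration
    def calculate():
        result = 0
        for i in range(1000):
            result += expensive_calculation(i) * 2
            result += expensive_calculation(i) * 3
        return result

    # Faster: compute once and reuse the intermediate result
    def calculate_optimized():
        result = 0
        for i in range(1000):
            value = expensive_calculation(i)
            result += value * 2
            result += value * 3
        return result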
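When the repeated calls are spread across the code base and cannot simply be hoisted into a local variable, memoization is another option. This is a sketch using `functools.lru_cache` from the standard library; the body of `expensive_calculation` here is a hypothetical stand-in, and the approach assumes the function is pure (same input, same output) and takes hashable arguments.

```python
from functools import lru_cache

call_count = 0  # counts how often the function body actually runs

@lru_cache(maxsize=None)
def expensive_calculation(i):
    """Hypothetical stand-in for a costly pure function."""
    global call_count
    call_count += 1
    return i * i

def calculate_cached():
    result = 0
    for i in range(1000):
        result += expensive_calculation(i) * 2
        result += expensive_calculation(i) * 3  # second call is served from the cache
    return result

total = calculate_cached()
print(total, call_count)
```

The function is called 2000 times, but its body runs only once per distinct argument.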

2.3 Use built-in functions

    # Slower: manual loop
    numbers = [1, 2, 3, 4, 5]
    total = 0
    for num in numbers:
        total += num

    # Faster: the built-in sum() runs the loop in C
    total = sum(numbers)

3. Loop Optimization

3.1 List comprehensions

    # Slower: explicit loop with append
    result = []
    for i in range(100):
        if i % 2 == 0:
            result.append(i * 2)

    # Faster: list comprehension
    result = [i * 2 for i in range(100) if i % 2 == 0]

3.2 Use generators

    # Memory-hungry: builds the full list up front
    def generate_numbers():
        return [i for i in range(1_000_000)]

    # Better: a generator yields one value at a time
    def generate_numbers_gen():
        for i in range(1_000_000):
            yield i
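To see the memory difference concretely, here is a small sketch comparing the size of a list with the size of a generator object. The function names and the value of `n` are assumptions for illustration. Note that `sys.getsizeof` reports only the container itself, not the integers it refers to, so the real gap is larger than the numbers printed.

```python
import sys

def squares_list(n):
    return [i * i for i in range(n)]

def squares_gen(n):
    for i in range(n):
        yield i * i

n = 100_000
as_list = squares_list(n)
as_gen = squares_gen(n)

# Size of the container object only (not the elements it points to)
list_bytes = sys.getsizeof(as_list)
gen_bytes = sys.getsizeof(as_gen)
print(f"list: {list_bytes} bytes, generator: {gen_bytes} bytes")

# Both produce the same values when consumed
same_total = sum(as_list) == sum(as_gen)
print(same_total)
```

The trade-off is that a generator can be consumed only once; after the `sum()` call above, `as_gen` is exhausted.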

3.3 Vectorized operations

    import numpy as np

    # Slower: Python-level loop
    arr = list(range(1_000_000))
    result = [x * 2 for x in arr]

    # Faster: NumPy vectorization
    arr = np.arange(1_000_000)
    result = arr * 2

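4. Memory Optimization

4.1 Use __slots__ to reduce memory usage

    # Heavier: a regular class stores attributes in a per-instance __dict__
    class Person:
        def __init__(self, name, age):
            self.name = name
            self.age = age

    # Lighter: __slots__ declares a fixed set of attributes
    class PersonOptimized:
        __slots__ = ['name', 'age']

        def __init__(self, name, age):
            self.name = name
            self.age = age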
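The saving comes from dropping the per-instance `__dict__`, and that has a visible side effect: slotted instances reject attributes that were not declared. The sketch below illustrates both points; the `email` attribute is a made-up example.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

class PersonOptimized:
    __slots__ = ("name", "age")

    def __init__(self, name, age):
        self.name = name
        self.age = age

plain = Person("Alice", 30)
slotted = PersonOptimized("Alice", 30)

print(hasattr(plain, "__dict__"))    # regular instances carry a dict
print(hasattr(slotted, "__dict__"))  # slotted instances do not

plain.email = "alice@example.com"    # fine: stored in __dict__
try:
    slotted.email = "alice@example.com"  # not declared in __slots__
    rejected = False
except AttributeError:
    rejected = True
print(rejected)
```

This restriction is worth keeping in mind before adding `__slots__` to classes that other code extends dynamically.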

4.2 Avoid unnecessary object creation

    # SomeObject stands for any class that offers process() and reset()

    # Slower: a new object is created on every iteration
    def process_items(items):
        result = []
        for item in items:
            temp_obj = SomeObject(item)
            result.append(temp_obj.process())
        return result

    # Faster: one object is created and reused
    def process_items_optimized(items):
        result = []
        temp_obj = SomeObject(None)
        for item in items:
            temp_obj.reset(item)
            result.append(temp_obj.process())
        return result

4.3 Use memory views

    # Slower: slicing copies the data
    data = bytearray(1_000_000)
    subset = data[100:200]  # creates a copy

    # Faster: memoryview slices without copying
    data = bytearray(1_000_000)
    mv = memoryview(data)
    subset = mv[100:200]  # no copy; shares memory with data
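Sharing memory also means that writes through the view reach the original buffer, which is easy to overlook. A short sketch of the difference, using a ten-byte buffer chosen for readability:

```python
data = bytearray(b"abcdefghij")

# Slicing a bytearray copies: changing the copy leaves data untouched
copy = data[2:5]
copy[0] = ord("X")

# Slicing a memoryview shares memory: the write lands in data
view = memoryview(data)[2:5]
view[0] = ord("Y")

print(bytes(copy))
print(bytes(view))
print(bytes(data))
```

After this runs, only the write made through `view` is visible in `data`.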

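5. String Optimization

5.1 Avoid repeated string concatenation

    # Slower: += may create a new string on every iteration
    result = ""
    for i in range(1000):
        result += str(i)

    # Faster: collect the parts in a list and join once
    parts = []
    for i in range(1000):
        parts.append(str(i))
    result = "".join(parts)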

5.2 Use f-strings

    # Slower: % formatting or str.format
    name = "Alice"
    age = 30
    message = "Name: %s, Age: %d" % (name, age)
    message = "Name: {}, Age: {}".format(name, age)

    # Faster: f-string
    message = f"Name: {name}, Age: {age}"

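5.3 Use the string module

    import string

    # Verbose: manual comparison
    def is_whitespace(c):
        return c == ' ' or c == '\t' or c == '\n'

    # Better: use the string module
    # (string.whitespace also contains '\r', '\x0b' and '\x0c',
    # so this version accepts more characters than the one above)
    def is_whitespace(c):
        return c in string.whitespace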

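6. Concurrency and Parallelism

6.1 Using multiprocessing

    from multiprocessing import Pool

    def process_chunk(chunk):
        return sum(chunk)

    # The guard is required on platforms that spawn worker processes
    # (Windows, and macOS by default)
    if __name__ == "__main__":
        data = list(range(10_000_000))
        chunks = [data[i:i + 1_000_000] for i in range(0, 10_000_000, 1_000_000)]

        with Pool(processes=4) as pool:
            results = pool.map(process_chunk, chunks)

        total = sum(results)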

6.2 Using asyncio

    import asyncio
    import aiohttp  # third-party: pip install aiohttp

    async def fetch_data(url):
        async with aiohttp.ClientSession() as session:
            async with session.get(url) as response:
                return await response.json()

    async def main():
        urls = ["https://api.example.com/data1", "https://api.example.com/data2"]
        tasks = [fetch_data(url) for url in urls]
        # Run all requests concurrently; results keep the order of urls
        results = await asyncio.gather(*tasks)
        return results

    asyncio.run(main())
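The listing above needs aiohttp and a reachable API. To try the same pattern without either, here is a sketch that uses only the standard library, with `asyncio.sleep` standing in for the network wait. The task names and delays are made up for illustration.

```python
import asyncio
import time

async def fetch_data(name, delay):
    """Stand-in for a network request; asyncio.sleep simulates the wait."""
    await asyncio.sleep(delay)
    return f"{name} done"

async def main():
    # The slower task is listed first on purpose
    tasks = [fetch_data("data1", 0.2), fetch_data("data2", 0.1)]
    # gather runs both concurrently and returns results in argument order
    return await asyncio.gather(*tasks)

start = time.perf_counter()
results = asyncio.run(main())
elapsed = time.perf_counter() - start
print(results, f"{elapsed:.2f}s")
```

Because the waits overlap, the total time should be close to the longest single delay rather than the sum of both.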

6.3 Using concurrent.futures

    from concurrent.futures import ThreadPoolExecutor

    import requests  # third-party: pip install requests

    def download_file(url):
        return requests.get(url).content

    urls = ["https://example.com/file1.txt", "https://example.com/file2.txt"]

    with ThreadPoolExecutor(max_workers=4) as executor:
        # executor.map returns a lazy iterator; list() collects the results
        results = list(executor.map(download_file, urls))
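For experimenting without network access, the same thread pool pattern can be sketched with `time.sleep` in place of the HTTP request. The `download_file` body and the URLs here are placeholders; nothing is actually downloaded.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def download_file(url):
    """Placeholder for a blocking download; sleep simulates the I/O wait."""
    time.sleep(0.1)
    return f"contents of {url}"

urls = [f"https://example.com/file{i}.txt" for i in range(1, 5)]

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as executor:
    # map() yields results in the order of urls, regardless of finish order
    results = list(executor.map(download_file, urls))
elapsed = time.perf_counter() - start

print(results[0], f"{elapsed:.2f}s")
```

Threads help here because the simulated work is waiting, not computing; for CPU-bound work the GIL would limit the gain.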

7. Using C Extensions

7.1 Using Cython

    # example.pyx
    def fibonacci(int n):
        # Typed C variables; with a 32-bit C int the result overflows for n > 46
        cdef int a = 0
        cdef int b = 1
        cdef int i
        for i in range(n):
            a, b = b, a + b
        return a

7.2 Calling a C library with ctypes

    import ctypes

    # Assumes ./mylib.so exports: int fibonacci(int)
    lib = ctypes.CDLL("./mylib.so")
    lib.fibonacci.argtypes = [ctypes.c_int]
    lib.fibonacci.restype = ctypes.c_int

    result = lib.fibonacci(40)
    print(result)

7.3 Using Numba

    from numba import jit

    # nopython=True compiles the function to machine code on first call
    @jit(nopython=True)
    def fibonacci(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    result = fibonacci(40)

8. Case Study: Before and After Optimization

8.1 Before

    def process_logs(log_files):
        total_errors = 0
        for file in log_files:
            with open(file, 'r') as f:
                for line in f:
                    if 'ERROR' in line:
                        total_errors += 1
        return total_errors

8.2 After

    from concurrent.futures import ThreadPoolExecutor
    import re

    # Compile the pattern once instead of once per file
    ERROR_PATTERN = re.compile(r'ERROR')

    def count_errors_in_file(file):
        count = 0
        with open(file, 'r') as f:
            for line in f:
                if ERROR_PATTERN.search(line):
                    count += 1
        return count

    def process_logs_optimized(log_files):
        # Scan the files concurrently, then add up the per-file counts
        with ThreadPoolExecutor(max_workers=4) as executor:
            results = executor.map(count_errors_in_file, log_files)
            return sum(results)
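Before timing an optimized version, it is worth confirming that it still returns the same answer. The sketch below checks a sequential and a threaded counter against each other on throwaway log files; the file names and log contents are invented for the test, and the counting uses a plain substring check rather than a regular expression.

```python
import os
import tempfile
from concurrent.futures import ThreadPoolExecutor

def count_errors_in_file(path):
    with open(path, "r") as f:
        return sum(1 for line in f if "ERROR" in line)

def process_logs(log_files):
    # Sequential baseline
    return sum(count_errors_in_file(p) for p in log_files)

def process_logs_threaded(log_files):
    with ThreadPoolExecutor(max_workers=4) as executor:
        return sum(executor.map(count_errors_in_file, log_files))

# Create three small hypothetical log files containing 1, 2 and 3 ERROR lines
with tempfile.TemporaryDirectory() as tmp:
    paths = []
    for n in range(3):
        path = os.path.join(tmp, f"app{n}.log")
        with open(path, "w") as f:
            f.write("INFO start\nERROR boom\n" * (n + 1))
        paths.append(path)

    sequential = process_logs(paths)
    threaded = process_logs_threaded(paths)

print(sequential, threaded)
```

Once both counts agree, the two functions can be timed on real log files to see whether threading helps for your storage and file sizes.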

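9. Best Practices

9.1 Measure before optimizing

    import cProfile
    import pstats

    def main():
        pass  # your code here

    # Save the profile to a file, then show the 10 most time-consuming functions
    cProfile.run('main()', 'profile_results')
    stats = pstats.Stats('profile_results')
    stats.sort_stats(pstats.SortKey.TIME).print_stats(10)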
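If writing a results file is inconvenient, the profiler can also be driven directly and its report captured as a string. This is a sketch with a made-up workload (`slow_function` and its loop size are assumptions); it uses `cProfile.Profile`, `pstats.Stats` with its `stream` argument, and `io.StringIO` from the standard library.

```python
import cProfile
import io
import pstats

def slow_function():
    total = 0
    for i in range(200_000):
        total += i
    return total

def main():
    return slow_function()

# Profile only the code between enable() and disable()
profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Send the report to an in-memory buffer instead of stdout
buffer = io.StringIO()
stats = pstats.Stats(profiler, stream=buffer)
stats.sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```

Having the report as a string makes it easy to log it or attach it to a bug report.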

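9.2 Choose the right optimization strategy

Scenario	Recommended strategy
CPU-bound	PyPy, Cython, Numba, or multiprocessing
I/O-bound	Async I/O or multithreading
Memory pressure	__slots__, generators, memoryview
String processing	join, f-strings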

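9.3 Avoid premature optimization

    # Do not over-optimize early in development.
    # Get the code correct first, then optimize where measurements show a need.

    # Clear version
    def calculate_average(numbers):
        return sum(numbers) / len(numbers)

    # Hand-"optimized" version: more complex, and in CPython the explicit
    # loop is usually slower than the built-in sum()
    def calculate_average_fast(numbers):
        total = 0
        count = 0
        for num in numbers:
            total += num
            count += 1
        return total / count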
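Whether a rewrite is actually faster is something to measure rather than assume. Here is a minimal sketch that times the two average functions with timeit; the input size and repeat count are arbitrary choices, and `calculate_average_loop` is just the hand-written loop under a neutral name.

```python
import timeit

numbers = list(range(100_000))

def calculate_average(numbers):
    return sum(numbers) / len(numbers)

def calculate_average_loop(numbers):
    total = 0
    count = 0
    for num in numbers:
        total += num
        count += 1
    return total / count

simple_time = timeit.timeit(lambda: calculate_average(numbers), number=20)
loop_time = timeit.timeit(lambda: calculate_average_loop(numbers), number=20)
print(f"sum/len: {simple_time:.4f}s  manual loop: {loop_time:.4f}s")
```

Both functions return the same value, so the comparison comes down to the timings on your own interpreter and hardware.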

10. Python vs Rust Performance

10.1 Fibonacci sequence

    # Python
    def fibonacci(n):
        a, b = 0, 1
        for _ in range(n):
            a, b = b, a + b
        return a

    // Rust
    fn fibonacci(n: u32) -> u64 {
        let mut a = 0;
        let mut b = 1;
        for _ in 0..n {
            let temp = a;
            a = b;
            b = temp + b;
        }
        a
    }

10.2 Performance comparison

Operation	Python	Rust	Speedup
Fib(40)	~0.4s	~0.0001s	4000x
List sort	~0.1s	~0.005s	20x
File read/write	~0.5s	~0.05s	10x

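Summary

Python performance optimization is a systematic effort. After reading this article, you should have a grasp of the following key points:

Profiling: cProfile, timeit, line_profiler
Algorithm optimization: choosing the right data structure, avoiding repeated computation
Loop optimization: list comprehensions, generators, vectorization
Memory optimization: __slots__, avoiding object creation, memoryview
String optimization: join, f-strings, the string module
Concurrency: multiprocessing, asyncio, concurrent.futures
C extensions: Cython, ctypes, Numba
Best practices: measure before optimizing, choose the right strategy

As a backend developer who moved from Python to Rust, I find that understanding Python's performance bottlenecks makes Rust's performance advantages easier to appreciate. Where maximum performance is required, Rust is the better choice; where development speed comes first, Python remains the first choice.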