规避 RAG 检索增强生成漏洞：防范提示词注入与安全越狱-编程实验室

规避 RAG 检索增强生成漏洞：防范提示词注入与安全越狱

RAG 系统的安全漏洞，本文差点让搜索引擎变成了黑客工具

前言

做 RAG 系统时，本文把搜索引擎集成进去，让大模型能实时搜索。结果测试发现，只要构造特定查询，就能让大模型执行危险操作。

RAG 的检索环节是一个巨大的攻击面。该过滤的不只是输入，还有检索结果。今天聊聊 RAG 的安全问题。

一、底层原理

1.1 RAG 系统的攻击面

RAG 系统有三个环节可以注入攻击：

graph TD A["攻击入口"] --> B["用户输入注入"] A --> C["检索结果投毒"] A --> D["知识库污染"] B --> E["拼接恶意 Prompt"] C --> F["返回恶意内容"] D --> G["长期记忆中毒"] E --> H["模型被操控"] F --> H G --> H H --> I["越权/泄露/破坏"]

主要攻击方式：

用户输入注入恶意指令
检索到的文档包含攻击内容
知识库被投毒，长期影响

1.2 安全防护对比

防护层	作用	效果
输入过滤	防止注入	基础
检索结果过滤	防止恶意文档	重要
输出审核	防止泄露	兜底
知识库审计	防止投毒	长期

二、快速上手

2.1 不安全的 RAG

class InsecureRAG: def query(self, user_input: str): # 直接检索，没有过滤 docs = self.retrieve(user_input) # 直接拼接到 prompt prompt = f"基于以下内容回答：{docs}\n问题：{user_input}" return self.llm(prompt)

攻击者可以通过知识库投毒或输入注入绕过。

2.2 安全加固版

class SecureRAG: def __init__(self, retriever, llm): self.retriever = retriever self.llm = llm self.dangerous_patterns = [ "忽略指令", "系统命令", "删除", "drop table", "exec(" ] def query(self, user_input: str) -> str: # 1. 输入过滤 if self._is_dangerous(user_input): return "输入被拦截" # 2. 检索 docs = self.retriever.retrieve(user_input, k=5) # 3. 检索结果过滤 safe_docs = [d for d in docs if not self._is_dangerous(d)] if not safe_docs: return "未找到安全的相关文档" # 4. 安全生成 prompt = self._build_safe_prompt(user_input, safe_docs) return self.llm(prompt) def _is_dangerous(self, text: str) -> bool: return any(p in text for p in self.dangerous_patterns) def _build_safe_prompt(self, query: str, docs: list) -> str: context = "\n".join(docs[:3]) return f"""系统指令：你是一个安全助手。 用户问题：{query} 参考资料：{context} 请基于参考资料回答。如果问题涉及危险操作，请拒绝。"""

三、核心 API / 深水区

3.1 RAG 安全防护措施速查

措施	实现	效果
输入过滤	关键词 + 正则	基础
检索结果过滤	内容安全检测	重要
Prompt 隔离	系统指令和用户指令分开	好
输出审核	关键词 + 敏感信息	兜底

3.2 检索结果安全过滤器

class ResultFilter: def __init__(self): self.blocked = [ "恶意代码", "攻击方法", "密码", "密钥" ] def filter(self, docs: list) -> list: return [d for d in docs if not self._is_blocked(d)] def _is_blocked(self, text: str) -> bool: return any(b in text.lower() for b in self.blocked)

3.3 输出审核

class OutputAuditor: def __init__(self): self.sensitive_patterns = [ r"\b\d{17}[\dXx]\b", # 身份证 r"1[3-9]\d{9}", # 手机号 r"[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}" # 邮箱 ] def audit(self, text: str) -> str: import re for pat in self.sensitive_patterns: text = re.sub(pat, "***", text) return text

四、实战演练

完整的 RAG 安全系统：

from typing import List, Dict, Optional import re class RAGSecuritySystem: def __init__(self): self.input_filter = InputFilter() self.doc_filter = DocumentFilter() self.output_auditor = OutputAuditor() self.audit_log = [] def process_query(self, query: str, retriever, llm) -> Dict: # 1. 输入检查 input_check = self.input_filter.check(query) if not input_check["safe"]: self._log("input_blocked", query, input_check) return {"status": "blocked", "reason": "输入不安全"} # 2. 检索 docs = retriever.retrieve(query, k=10) # 3. 文档过滤 safe_docs = self.doc_filter.filter(docs) if len(safe_docs) < len(docs): self._log("filtered_docs", query, { "total": len(docs), "filtered": len(docs) - len(safe_docs) }) # 4. 安全生成 prompt = self._build_prompt(query, safe_docs) response = llm(prompt) # 5. 输出审核 safe_response = self.output_auditor.audit(response) return { "status": "ok", "response": safe_response, "sources": [d[:50] for d in safe_docs[:2]] } def _build_prompt(self, query: str, docs: List[str]) -> str: context = "\n".join(docs[:3]) return f"""你是安全的问答助手。 如果问题涉及危险内容，请回复"无法回答该问题"。 参考资料：{context} 问题：{query}""" def _log(self, event: str, query: str, detail: dict): self.audit_log.append({ "event": event, "query": query[:50], "detail": detail }) class InputFilter: def __init__(self): self.dangerous = [ "ignore", "override", "system", "exec", "shell", "bash" ] def check(self, text: str) -> Dict: for d in self.dangerous: if d in text.lower(): return {"safe": False, "reason": f"含有关键词: {d}"} return {"safe": True} class DocumentFilter: def __init__(self): self.suspicious = [ "恶意", "病毒", "攻击代码", "hack", "exploit" ] def filter(self, docs: List[str]) -> List[str]: return [d for d in docs if not self._is_suspicious(d)] def _is_suspicious(self, doc: str) -> bool: return any(s in doc.lower() for s in self.suspicious) class OutputAuditor: def audit(self, text: str) -> str: patterns = [ (r"\b\d{17}[\dXx]\b", "***"), (r"1[3-9]\d{9}", "***"), ] for pat, mask in patterns: text = re.sub(pat, mask, text) return text security = RAGSecuritySystem() result = security.process_query("搜索天气", retriever, llm) print(result)

五、避坑指南与最佳实践

💡 **技巧：检索结果也要过滤
不只是输入要过滤，检索到的文档也可能有毒。

⚠️ **警告：不要信任检索结果
向量相似度不等于内容安全。

✅ **推荐：输出审核兜底
敏感信息脱敏，防止泄露。

六、综合实战演示

企业级 RAG 安全网关：

from typing import Dict, List import json class RAGSecurityGateway: def __init__(self): self.rules = self._load_rules() self.blocked_count = 0 def _load_rules(self): return { "input_rules": [ {"type": "keyword", "patterns": ["ignore", "override"]}, {"type": "length", "max": 2000}, ], "document_rules": [ {"type": "keyword", "patterns": ["恶意", "hack"]}, {"type": "size", "max": 10000}, ] } def check_request(self, query: str) -> Dict: if len(query) > self.rules["input_rules"][1]["max"]: return {"allowed": False, "reason": "输入过长"} return {"allowed": True} def check_documents(self, docs: List[str]) -> List[str]: safe = [] for doc in docs: if len(doc) <= 10000: safe.append(doc) return safe def sanitize_response(self, response: str) -> str: return response[:5000] gateway = RAGSecurityGateway() check = gateway.check_request("搜索今天天气") print(check)