Spring AI + Xinference + Milvus实战：5步搭建本地问答系统（附避坑指南）-编程实验室

Spring AI + Xinference + Milvus实战：5步搭建高隐私本地问答系统

在数据隐私日益重要的今天，企业越来越需要能够完全掌控数据的AI解决方案。本文将展示如何利用Spring AI框架，结合Xinference开源模型和Milvus向量数据库，构建一个完全本地化的智能问答系统。不同于依赖OpenAI等商业API的方案，这个系统所有组件都运行在您的私有环境中，确保数据不出本地。

1. 环境准备与依赖配置

搭建本地问答系统的第一步是准备开发环境。我们需要配置Java开发环境、安装必要的服务，并设置项目依赖。

基础环境要求：

JDK 17或更高版本
Maven 3.6+
Docker（用于运行Milvus）
Python 3.8+（用于Xinference）

首先创建Spring Boot项目并添加关键依赖。在pom.xml中配置以下核心组件：

<dependencies> <!-- Spring AI核心依赖 --> <dependency> <groupId>org.springframework.ai</groupId> <artifactId>spring-ai-alibaba-starter</artifactId> <version>${spring-ai-alibaba.version}</version> </dependency> <!-- Web支持 --> <dependency> <groupId>org.springframework.boot</groupId> <artifactId>spring-boot-starter-web</artifactId> </dependency> <!-- Milvus向量存储 --> <dependency> <groupId>org.springframework.ai</groupId> <artifactId>spring-ai-milvus-store-spring-boot-starter</artifactId> </dependency> <!-- Xinference适配器 --> <dependency> <groupId>org.springframework.ai</groupId> <artifactId>spring-ai-openai-spring-boot-starter</artifactId> <exclusions> <exclusion> <groupId>org.springframework.ai</groupId> <artifactId>spring-ai-openai</artifactId> </exclusion> </exclusions> </dependency> <dependency> <groupId>org.springframework.ai</groupId> <artifactId>spring-ai-openai</artifactId> <version>1.0.0-M6-XIN</version> </dependency> </dependencies>

提示：Xinference提供了与OpenAI兼容的API接口，这使得我们可以无缝替换商业API，同时保持代码结构不变。

2. 服务部署与配置

2.1 启动Milvus向量数据库

使用Docker快速启动Milvus服务：

docker run -d --name milvus \ -p 19530:19530 \ -p 9091:9091 \ milvusdb/milvus:v2.4.0-rc.1

验证服务是否正常运行：

docker logs milvus | grep "Successfully initialized"

2.2 部署Xinference模型服务

安装Xinference并启动本地模型服务：

pip install "xinference[all]" xinference launch -p 9997

部署一个适合问答场景的开源模型，例如Qwen2：

xinference launch --model-name "qwen2-instruct" \ --model-format pytorch \ --size-in-billions 7 \ --endpoint "http://127.0.0.1:9997"

记下返回的model_uid，后续配置会用到。

2.3 应用配置

在application.yml中配置各组件连接：

spring: application: name: local-ai-qa-system ai: openai: api-key: "dummy" # Xinference需要非空值 base-url: http://localhost:9997 chat: options: model: qwen2-instruct # 与部署的模型名称一致 embedding: options: model: bge-m3 dimensions: 1024 vectorstore: milvus: client: host: localhost port: 19530 username: root password: milvus databaseName: default collectionName: qa_store initializeSchema: true embeddingDimension: 1024 indexType: IVF_FLAT metricType: COSINE

3. 核心功能实现

3.1 向量存储初始化

创建Milvus向量存储配置类，设置合适的索引参数：

@Configuration public class VectorStoreConfig { @Bean public MilvusVectorStore vectorStore( MilvusServiceClient milvusClient, EmbeddingModel embeddingModel) { return MilvusVectorStore.builder(milusClient, embeddingModel) .collectionName("qa_store") .databaseName("default") .metricType(MetricType.COSINE) .indexType(IndexType.IVF_FLAT) .indexParameters("{\"nlist\":1024}") .embeddingDimension(1024) .initializeSchema(true) .build(); } }

3.2 知识库数据导入

实现ApplicationRunner在启动时加载初始文档：

@Component public class DataInitializer implements ApplicationRunner { private final MilvusVectorStore vectorStore; public DataInitializer(MilvusVectorStore vectorStore) { this.vectorStore = vectorStore; } @Override public void run(ApplicationArguments args) { List<Document> docs = List.of( new Document("Spring AI支持与多种大模型集成"), new Document("Xinference提供了本地模型部署方案"), new Document("Milvus的IVF_FLAT索引适合高精度搜索"), // 添加更多领域知识文档... ); vectorStore.add(docs); } }

3.3 问答服务实现

创建REST控制器处理用户查询：

@RestController @RequestMapping("/api/qa") public class QAController { private final ChatClient chatClient; private final VectorStore vectorStore; private final List<Message> chatHistory = new ArrayList<>(); public QAController(ChatClient chatClient, VectorStore vectorStore) { this.chatClient = chatClient; this.vectorStore = vectorStore; } @GetMapping public Flux<String> answerQuestion(@RequestParam String question) { // 1. 向量搜索相关文档 List<Document> relevantDocs = vectorStore.similaritySearch( SearchRequest.query(question).withTopK(3)); // 2. 构建增强提示 String context = relevantDocs.stream() .map(Document::getContent) .collect(Collectors.joining("\n")); String enhancedPrompt = String.format(""" 基于以下上下文回答问题： %s 问题：%s 回答：""", context, question); // 3. 调用模型生成回答 return chatClient.prompt() .user(enhancedPrompt) .stream() .content(); } }

4. 高级功能扩展

4.1 对话历史管理

增强问答服务的连续性，添加对话记忆功能：

@Bean public ChatMemory chatMemory() { return new InMemoryChatMemory(20); // 保留最近20轮对话 } @GetMapping("/chat") public Flux<String> chat(@RequestParam String message, @RequestParam String sessionId) { // 获取历史对话 List<Message> history = chatMemory.get(sessionId); // 构建包含上下文的提示 String historyContext = history.stream() .map(m -> m.getRole() + ": " + m.getContent()) .collect(Collectors.joining("\n")); String fullPrompt = "对话历史:\n" + historyContext + "\n用户最新问题: " + message; // 调用模型并更新历史 return chatClient.prompt() .user(fullPrompt) .stream() .content() .doOnNext(response -> { chatMemory.add(sessionId, new UserMessage(message)); chatMemory.add(sessionId, new AssistantMessage(response)); }); }

4.2 混合检索策略

结合关键词和向量搜索提升召回率：

public List<Document> hybridSearch(String query) { // 向量搜索 List<Document> vectorResults = vectorStore.similaritySearch( SearchRequest.query(query).withTopK(5)); // 关键词搜索（需实现） List<Document> keywordResults = keywordSearch(query); // 结果融合与去重 return mergeResults(vectorResults, keywordResults); }

4.3 性能优化配置

调整Milvus索引参数提升查询效率：

spring: ai: vectorstore: milvus: indexParameters: '{"nlist":2048,"nprobe":64}' searchParamsJson: '{"nprobe":128}'

5. 避坑指南与最佳实践

在实际部署过程中，可能会遇到以下典型问题：

常见问题1：Xinference模型加载失败

症状：API返回500错误，日志显示"Model not found"解决方案：

确认模型UID与配置一致

检查模型是否完成下载：

xinference list --endpoint http://localhost:9997

确保显存足够（7B模型约需14GB）

常见问题2：Milvus查询超时

症状：搜索请求长时间无响应优化方案：

减少topK参数值
调整nprobe参数（平衡精度与速度）
考虑使用HNSW索引替代IVF_FLAT

常见问题3：Spring AI版本冲突

症状：启动时报NoSuchMethodError解决方法：

统一使用Spring AI BOM管理版本：

<dependencyManagement> <dependencies> <dependency> <groupId>org.springframework.ai</groupId> <artifactId>spring-ai-bom</artifactId> <version>1.0.0</version> <type>pom</type> <scope>import</scope> </dependency> </dependencies> </dependencyManagement>

排除冲突的依赖项

性能优化建议：

优化方向	具体措施	预期效果
向量索引	使用HNSW替代IVF_FLAT	提升搜索速度20-50%
批处理	设置`batchingStrategy`	减少API调用次数
模型量化	部署4bit量化模型	降低显存占用50%
缓存策略	实现查询结果缓存	减少重复计算

安全加固措施：

为Milvus启用认证：

spring: ai: vectorstore: milvus: client: username: admin password: strongpassword

限制Xinference访问：

xinference launch --port 9997 --auth-token mysecrettoken

启用Spring Security保护API：

@Configuration @EnableWebSecurity public class SecurityConfig { @Bean SecurityFilterChain securityFilterChain(HttpSecurity http) throws Exception { http .authorizeHttpRequests(auth -> auth .requestMatchers("/api/**").authenticated() ) .httpBasic(); return http.build(); } }

经过以上步骤，您已经构建了一个完整的企业级本地问答系统。这个方案不仅解决了数据隐私问题，还提供了与商业API相当的功能体验。根据实际需求，您可以进一步扩展以下功能：