Qwen3-4B Instruct-2507 Hands-On Tutorial: Wrapping Qwen3 with LangChain to Build a Structured Q&A Agent

1. Project Overview

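Qwen3-4B Instruct-2507 is a text-only large language model in Alibaba's Tongyi Qianwen (Qwen) series. Unlike the multimodal models in the family, it carries no vision modules, which keeps it focused on text-processing efficiency and response speed. This tutorial shows how to wrap the model with the LangChain framework and build a structured Q&A agent.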

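The main strengths of this project are:

Text-only focus: dedicated to text tasks, with faster inference
Streaming output: generated content can be displayed token by token in real time
Efficient deployment: adapts to available GPU resources automatically and works out of the box
Flexible tuning: generation length and creativity (temperature) are adjustable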

2. Environment Setup and Installation

2.1 Basic Requirements

Before you start, make sure your system meets the following requirements:

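Python 3.9 or later (recent transformers releases, which Qwen3 needs, have dropped Python 3.8)
CUDA 11.7+ (for GPU acceleration)
At least 16 GB of RAM (32 GB recommended)
An NVIDIA GPU (if running on GPU)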
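
To put the memory figures above in context, the weights alone can be estimated as parameter count times bytes per parameter. The sketch below assumes a nominal 4 billion parameters (taken from the model name, not an exact count) and ignores the KV cache, activations, and framework overhead, so real usage will be higher. The function name and the dtype table are illustrative, not part of any library.

```python
# Bytes needed to store one parameter in each numeric format
BYTES_PER_PARAM = {
    "float32": 4.0,
    "bfloat16": 2.0,
    "float16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def estimate_weight_memory_gib(num_params: float, dtype: str = "bfloat16") -> float:
    """Approximate memory for the model weights only, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 2**30


if __name__ == "__main__":
    # Assumption: 4e9 is a nominal parameter count for a "4B" model
    for dtype in BYTES_PER_PARAM:
        print(f"{dtype:>8}: {estimate_weight_memory_gib(4e9, dtype):.1f} GiB")
```

Under these assumptions, half-precision weights take roughly 7.5 GiB, which is consistent with recommending 16 GB or more once runtime overhead is included.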

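2.2 Installing Dependencies

Install the required Python packages with the following command. accelerate is needed for device_map="auto", Qwen3 needs transformers 4.51.0 or later, and the chain and memory classes used in this tutorial belong to LangChain's pre-1.0 API:

pip install "langchain<1.0" "transformers>=4.51.0" accelerate torch streamlit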

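For GPU acceleration, install a PyTorch build that matches your CUDA version. The cu117 index shown here hosts older PyTorch releases; if your CUDA toolkit is newer, substitute the matching index (for example cu121):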

pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu117
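
After installation, a quick check can confirm that the interpreter and packages are in place. The following is a minimal sketch: `check_environment` is a hypothetical helper, the package list mirrors the install command in this section, and the (3, 9) minimum is an assumption you can change through the `min_python` parameter.

```python
import importlib.util
import sys

# Packages this tutorial imports directly
REQUIRED_PACKAGES = ["langchain", "transformers", "torch", "streamlit"]


def check_environment(min_python=(3, 9)):
    """Report the Python version, importable packages, and CUDA availability."""
    report = {
        "python_version": ".".join(str(part) for part in sys.version_info[:3]),
        "python_ok": sys.version_info[:2] >= tuple(min_python),
        "packages": {
            name: importlib.util.find_spec(name) is not None
            for name in REQUIRED_PACKAGES
        },
        "cuda_available": None,  # stays None when torch cannot be imported
    }
    if report["packages"]["torch"]:
        try:
            import torch

            report["cuda_available"] = torch.cuda.is_available()
        except Exception:
            # A broken torch install should not crash the check itself
            pass
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

A value of False under "packages" points to a missing install, and "cuda_available: False" means the model will fall back to CPU.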

3. Model Loading and Basic Wrapping

3.1 Loading the Qwen3-4B Model

First, create a Python script that loads the base model:

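from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-4B-Instruct-2507"

# Download (or load from the local cache) the tokenizer and model weights
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",   # place layers on the available GPU(s), else CPU
    torch_dtype="auto",  # use the dtype stored in the checkpoint
    trust_remote_code=True,
)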

3.2 Creating a Basic Q&A Function

Wrap generation in a simple question-answering function:

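def qwen_qa(question, chat_history=None, max_length=512, temperature=0.7):
    """Answer `question`, optionally continuing a prior conversation."""
    messages = list(chat_history or []) + [{"role": "user", "content": question}]

    # Render the chat template and tokenize; return_dict=True yields
    # input_ids and attention_mask together
    inputs = tokenizer.apply_chat_template(
        messages,
        add_generation_prompt=True,
        return_dict=True,
        return_tensors="pt",
    ).to(model.device)

    # Sample only when temperature > 0; otherwise decode greedily
    gen_kwargs = {"max_new_tokens": max_length, "pad_token_id": tokenizer.eos_token_id}
    if temperature > 0:
        gen_kwargs.update(do_sample=True, temperature=temperature)
    else:
        gen_kwargs["do_sample"] = False

    outputs = model.generate(**inputs, **gen_kwargs)

    # Drop the prompt tokens so only the newly generated reply is decoded
    prompt_len = inputs["input_ids"].shape[1]
    return tokenizer.decode(outputs[0][prompt_len:], skip_special_tokens=True)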
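
Because the whole chat history is sent on every call, long conversations will eventually exceed the context window or slow generation down. One simple mitigation is to keep only the most recent turns before calling the function. The sketch below is an assumption-laden helper, not part of the original tutorial: `build_messages`, `max_turns`, and `system_prompt` are hypothetical names, and it assumes history entries are dicts with "role" and "content" keys as used above.

```python
from typing import Dict, List, Optional

Message = Dict[str, str]


def build_messages(
    question: str,
    chat_history: Optional[List[Message]] = None,
    max_turns: int = 4,
    system_prompt: Optional[str] = None,
) -> List[Message]:
    """Build the message list for one request, keeping the last `max_turns` exchanges."""
    history = list(chat_history or [])  # copy so the caller's list is untouched

    # One exchange is a user message plus an assistant reply (two messages)
    history = history[-2 * max_turns:] if max_turns > 0 else []

    # Make sure the trimmed window does not start with an assistant reply
    while history and history[0]["role"] != "user":
        history.pop(0)

    messages: List[Message] = []
    if system_prompt:
        messages.append({"role": "system", "content": system_prompt})
    messages.extend(history)
    messages.append({"role": "user", "content": question})
    return messages
```

The returned list, minus its final user message, could be passed as `chat_history` to the Q&A function, or the tokenizer's chat template could be applied to it directly.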

4. Building a Structured Agent with LangChain

4.1 Creating the LangChain Interface

We wrap Qwen3 behind LangChain's LLM interface:

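from typing import Any, Dict, List, Optional

from langchain_core.language_models.llms import LLM


class Qwen3LangChain(LLM):
    """Expose qwen_qa through LangChain's LLM interface."""

    @property
    def _llm_type(self) -> str:
        return "qwen3-4b"

    def _call(self, prompt: str, stop: Optional[List[str]] = None,
              run_manager: Optional[Any] = None, **kwargs: Any) -> str:
        # Forward only the options qwen_qa understands; `stop` is accepted
        # for interface compatibility but is not applied here
        options = {k: v for k, v in kwargs.items() if k in ("max_length", "temperature")}
        return qwen_qa(prompt, **options)

    @property
    def _identifying_params(self) -> Dict[str, Any]:
        return {"model_name": "Qwen3-4B-Instruct-2507"}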
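
LangChain passes a `stop` list to `_call`, but the underlying Q&A function has no notion of stop sequences. One simple way to honor them is to cut the generated text at the earliest stop string. The helper below is a sketch of that idea; `truncate_at_stop` is a hypothetical name, and it trims text after generation rather than halting generation early.

```python
from typing import List, Optional


def truncate_at_stop(text: str, stop: Optional[List[str]] = None) -> str:
    """Return `text` cut off just before the earliest stop sequence, if any."""
    if not stop:
        return text
    cut = len(text)
    for sequence in stop:
        if not sequence:
            continue  # an empty stop string would match at position 0
        index = text.find(sequence)
        if index != -1:
            cut = min(cut, index)
    return text[:cut]
```

Inside `_call`, the generated string could be passed through this helper before being returned, for example `truncate_at_stop(generated_text, stop)`.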

4.2 Building the Q&A Chain

Create a simple question-answering chain:

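from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate

template = """You are a professional AI assistant. Please answer the following question:

Question: {question}

Answer:"""

prompt = PromptTemplate(template=template, input_variables=["question"])
llm = Qwen3LangChain()
qa_chain = LLMChain(llm=llm, prompt=prompt)

# Example: qa_chain.invoke({"question": "What is a Python generator?"})["text"]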

5. Advanced Features

5.1 Multi-Turn Conversation Memory

Manage the conversation history with a memory object:

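from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

# The prompt must contain {history}; otherwise the memory is stored
# but never shown to the model
chat_prompt = PromptTemplate(
    template="""You are a professional AI assistant.

Conversation so far:
{history}

Question: {question}

Answer:""",
    input_variables=["history", "question"],
)

memory = ConversationBufferMemory(memory_key="history")
conversation = LLMChain(llm=llm, prompt=chat_prompt, memory=memory, verbose=True)

# Usage example
response = conversation.invoke({"question": "Describe the main features of Python"})
print(response["text"])

# The previous exchange is injected through {history}
response = conversation.invoke({"question": "Can you say more about its dynamic typing?"})
print(response["text"])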

5.2 Structured Output Parsing

Use LangChain's output parser to obtain structured results:

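from langchain.chains import LLMChain
from langchain.output_parsers import ResponseSchema, StructuredOutputParser
from langchain.prompts import PromptTemplate

response_schemas = [
    ResponseSchema(name="answer", description="A direct answer to the question"),
    ResponseSchema(name="explanation", description="A detailed explanation"),
    ResponseSchema(name="sources", description="Reference sources", type="list"),
]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

# A plain PromptTemplate suits a text-completion LLM wrapper; the format
# instructions are bound once via partial_variables
structured_prompt = PromptTemplate(
    template="""Answer the following question and return the result in the specified format.

Question: {question}

{format_instructions}""",
    input_variables=["question"],
    partial_variables={"format_instructions": output_parser.get_format_instructions()},
)

chain = LLMChain(llm=llm, prompt=structured_prompt)
output = chain.invoke({"question": "What is a decorator in Python?"})["text"]
parsed = output_parser.parse(output)  # dict with answer, explanation, sources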
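
A small model does not always follow format instructions exactly: it may omit the code fence around the JSON or add commentary around it, in which case strict parsing can fail. The sketch below shows one possible fallback using only the standard library. `extract_json_object` is a hypothetical helper, not a LangChain API, and the key names in the usage note are the ones defined in the schemas above.

```python
import json
import re
from typing import Any, Dict, Iterable

FENCE = "`" * 3  # the three-backtick marker that delimits a fenced block

# Matches a JSON object inside a fenced block, with or without a "json" tag
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(\{.*?\})\s*" + FENCE, re.DOTALL)


def extract_json_object(text: str, required_keys: Iterable[str] = ()) -> Dict[str, Any]:
    """Pull the first JSON object out of model output and check its keys."""
    match = FENCED_JSON.search(text)
    if match:
        candidate = match.group(1)
    else:
        # Fallback: take everything from the first "{" to the last "}"
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model output")
        candidate = text[start:end + 1]

    data = json.loads(candidate)  # raises a ValueError subclass on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")

    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

A caller could try the LangChain parser first and, if it raises, fall back to `extract_json_object(output, ["answer", "explanation", "sources"])`.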

6. Deploying as a Web Service

6.1 Creating the Interface with Streamlit

Create a simple web interface:

import streamlit as st

# Assumes qwen_qa from section 3.2 is defined in, or imported into, this script

st.title("Qwen3-4B Q&A System")

# Sidebar controls for the generation parameters
st.sidebar.header("Parameters")
max_length = st.sidebar.slider("Max length", 128, 2048, 512)
temperature = st.sidebar.slider("Creativity", 0.0, 1.5, 0.7)

# Conversation history persists across Streamlit reruns
if "messages" not in st.session_state:
    st.session_state.messages = []

# Replay the stored conversation
for message in st.session_state.messages:
    with st.chat_message(message["role"]):
        st.markdown(message["content"])

if prompt := st.chat_input("Enter your question"):
    st.session_state.messages.append({"role": "user", "content": prompt})
    with st.chat_message("user"):
        st.markdown(prompt)

    with st.chat_message("assistant"):
        response = qwen_qa(
            prompt,
            chat_history=st.session_state.messages[:-1],
            max_length=max_length,
            temperature=temperature,
        )
        st.markdown(response)
    st.session_state.messages.append({"role": "assistant", "content": response})
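
The overview mentions streaming output, while the interface above displays the reply only after generation finishes. One common way to stream is to run generation in a background thread and consume text chunks from a queue as they arrive; the TextIteratorStreamer class in transformers follows this general pattern. The sketch below shows the pattern with the standard library only. `stream_from_worker` and `fake_generate` are hypothetical names, and `fake_generate` stands in for a real call to model.generate.

```python
import queue
import threading
from typing import Callable, Iterator

_DONE = object()  # sentinel marking the end of the stream


def stream_from_worker(produce: Callable[[Callable[[str], None]], None]) -> Iterator[str]:
    """Run `produce` in a background thread and yield the chunks it emits."""
    chunks: queue.Queue = queue.Queue()

    def worker() -> None:
        try:
            produce(chunks.put)  # the producer calls emit(chunk) for each piece
        finally:
            chunks.put(_DONE)  # always unblock the consumer, even on error

    threading.Thread(target=worker, daemon=True).start()

    while True:
        item = chunks.get()  # blocks until the next chunk is ready
        if item is _DONE:
            break
        yield item


def fake_generate(emit: Callable[[str], None]) -> None:
    """Stand-in for model generation: emits a fixed reply piece by piece."""
    for piece in ["Hello", ", ", "world"]:
        emit(piece)


if __name__ == "__main__":
    for chunk in stream_from_worker(fake_generate):
        print(chunk, end="", flush=True)
    print()
```

In a Streamlit app, a generator like this could be handed to st.write_stream inside the assistant chat message, assuming a Streamlit version that provides it.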