JD.com product comments are a core reference for purchase decisions. The relevant interfaces (core API: jingdong.comment.read.getCommentList, plus non-open Web endpoints) are guarded by a dual protection scheme — Zeus open-platform signatures plus dynamic Web-side token signatures — layered with multi-stage risk control: user-level checks, comment-pagination rate limits, and IP behavior profiling. This article goes beyond the traditional single-interface scraping approach: it adapts to both the open platform and the Web endpoints, and adds value-mining modules (comment sentiment analysis, selling-point extraction), forming an end-to-end pipeline from interface access to data application.
I. Core Interface Mechanisms and Risk-Control Breakdown
JD comment data splits into "basic comments available via the open platform" and "deep comments exclusive to the Web side" (follow-up reviews, photo reviews, helpful-vote counts, etc.). The two use different signature and risk-control logic; the core characteristics are as follows:
1. Dual API chains and core parameter comparison
JD comment data is obtained through a chained call — comment metadata API → paginated comment API → comment detail API — and the core parameters differ markedly between the open platform and the Web endpoints:
| Parameter type | Open-platform API (Zeus) | Web API (non-open) | Risk-control trait |
|---|---|---|---|
| Core identifiers | skuId (product SKU), pageNum/pageSize | productId (product ID), page/ps, score (rating filter) | skuId and productId are cross-checked; a mismatch returns an empty result |
| Signature params | app_key, sign (HMAC-SHA256), timestamp, nonce | token (dynamically generated), uuid, client (terminal type) | Web token expires after 10 minutes; uuid is device-bound |
| Auth params | access_token (user authorization) | 3rdcookie (login state), user-key (user identifier) | Without a login only the first 10 pages are returned, with no follow-ups or photo reviews |
| Extended params | commentType (0 = all, 1 = positive) | isShadowSku (child-SKU flag), sortType (ordering) | sortType=5 (follow-ups) requires login plus elevated permissions |
2. Key breakthrough points
- Dual-signature adaptation: the open platform requires the strict Zeus HMAC-SHA256 signature, while the Web side requires reversing the token-generation logic (user-key + timestamp + dynamic salt); a traditional single-signature scheme cannot serve both;
- Layered comment retrieval: the open-platform API is easy to call but returns limited fields; the Web API exposes deep data (follow-up reviews, photo reviews, helpful-vote counts) but sits behind stricter risk control;
- Pagination-limit workaround: without a login the Web side serves only 10 pages of comments, and with a login at most 100; multi-account rotation plus an IP pool is needed to stay under the limits;
- Sentiment and selling-point extraction: JD comments carry rich usage feedback; NLP is used to extract core selling points and negative issues in structured form, upgrading the data's value;
- Unlocking multi-dimensional filters: the Web side supports filtering by rating, photo review, and follow-up review; the filter parameters are encrypted, so passing them naively breaks easily.
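The multi-account plus IP-pool rotation mentioned above can be sketched as a simple round-robin over (cookie, proxy) pairs. This is purely illustrative: `CredentialRotator` is a hypothetical helper, and the cookie and proxy values are placeholders, not real credentials.

```python
from itertools import cycle
from typing import List, Tuple


class CredentialRotator:
    """Round-robin rotation of (cookie, proxy) pairs so that request volume
    is spread across accounts and exit IPs instead of hammering one identity."""

    def __init__(self, cookies: List[str], proxies: List[str]):
        # Pair each cookie with a proxy and cycle through the pairs forever.
        self._pairs = cycle(list(zip(cookies, proxies)))

    def next_pair(self) -> Tuple[str, str]:
        return next(self._pairs)


rotator = CredentialRotator(
    cookies=["user-key=aaa", "user-key=bbb"],          # placeholder cookies
    proxies=["http://10.0.0.1:8080", "http://10.0.0.2:8080"],  # placeholder proxies
)
first = rotator.next_pair()
second = rotator.next_pair()
third = rotator.next_pair()  # wraps back to the first pair
```

In a real collector, each page request would fetch the next pair and set the session's `Cookie` header and `proxies` dict from it before sending.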
II. Implementation of the Technical Scheme
This scheme fuses open-platform and Web-side collection and adds a comment value-mining stage. It consists of three core components: the dual-signature generator, the dual-source comment collector, and the comment value reconstructor.
1. Dual-signature generator (core breakthrough)
Supports both the JD Zeus open-platform signature and the Web-side dynamic token, solving signature verification for both APIs:
```python
import hashlib
import hmac
import random
import re
import time
import urllib.parse
from typing import Dict, Optional, Tuple


class JdDoubleSignGenerator:
    def __init__(self, zeus_app_key: Optional[str] = None,
                 zeus_app_secret: Optional[str] = None):
        # Zeus open-platform credentials (optional, for the open API)
        self.zeus_app_key = zeus_app_key
        self.zeus_app_secret = zeus_app_secret
        # Web-side parameters (for the non-open API)
        self.web_salt = self._get_web_salt()  # dynamic salt, reversed from the Web JS
        self.client = "pc"  # terminal identifier: pc / mobile / wx

    def _get_web_salt(self) -> str:
        """Obtain the Web-side dynamic salt (reversed from JD's comment JS, rotates hourly)."""
        # In a real deployment this must be extracted from the JS on the comment
        # page; here we simulate the reversed result.
        hour = time.strftime("%Y%m%d%H")
        return hashlib.md5(f"jd_comment_salt_{hour}".encode()).hexdigest()[:12]

    def generate_zeus_sign(self, params: Dict) -> Tuple[str, str, str]:
        """Generate the JD Zeus open-platform signature (HMAC-SHA256)."""
        if not self.zeus_app_key or not self.zeus_app_secret:
            raise ValueError("Zeus app_key and app_secret must be configured")
        # Add the fixed Zeus parameters
        timestamp = str(int(time.time() * 1000))  # millisecond precision
        nonce = ''.join(random.choices("abcdefghijklmnopqrstuvwxyz0123456789", k=12))
        params.update({
            "app_key": self.zeus_app_key,
            "sign_method": "hmac-sha256",
            "format": "json",
            "v": "2.0",
            "timestamp": timestamp,
            "nonce": nonce
        })
        # Sort keys lexicographically, then URL-encode
        sorted_params = sorted(params.items(), key=lambda x: x[0])
        param_str = urllib.parse.urlencode(sorted_params)
        # HMAC-SHA256 signature
        sign = hmac.new(
            self.zeus_app_secret.encode(),
            param_str.encode(),
            digestmod=hashlib.sha256
        ).hexdigest().upper()
        return sign, timestamp, nonce

    def generate_web_token(self, user_key: str) -> Tuple[str, str]:
        """Generate the token for the Web comment API (core of the reverse work)."""
        timestamp = str(int(time.time()))
        # Plaintext to hash: user_key + timestamp + web_salt
        raw_str = f"{user_key}{timestamp}{self.web_salt}"
        token = hashlib.md5(raw_str.encode()).hexdigest()
        return token, timestamp

    def extract_user_key(self, cookie: str) -> str:
        """Extract user-key (the core Web-side identifier) from a logged-in cookie."""
        match = re.search(r'user-key=([^;]+)', cookie)
        return match.group(1) if match else ""

    def generate_uuid(self) -> str:
        """Generate a Web-side device uuid (simulating a real device)."""
        return ''.join(random.choices("0123456789abcdef", k=32))
```
2. Dual-source comment collector
Combines the open-platform and Web APIs to collect basic plus deep comments (follow-ups, photo reviews) in full, automatically adapting to login state and risk control:
```python
import random
import time
from typing import Dict, Optional

import requests
from fake_useragent import UserAgent


class JdCommentDualScraper:
    def __init__(self, zeus_app_key: Optional[str] = None,
                 zeus_app_secret: Optional[str] = None,
                 cookie: Optional[str] = None,
                 proxy: Optional[str] = None):
        self.sign_generator = JdDoubleSignGenerator(zeus_app_key, zeus_app_secret)
        self.cookie = cookie  # logged-in cookie (required for the Web API)
        self.proxy = proxy
        self.session = self._init_session()
        # Endpoint configuration
        self.zeus_api_url = "https://api.jd.com/routerjson"  # open-platform gateway
        self.web_comment_url = "https://club.jd.com/comment/productPageComments.action"  # Web comment API

    def _init_session(self) -> requests.Session:
        """Initialize the request session (simulating real-user behavior)."""
        session = requests.Session()
        # Base request headers
        session.headers.update({
            "User-Agent": UserAgent().random,
            "Accept": "application/json, text/plain, */*",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Content-Type": "application/x-www-form-urlencoded;charset=UTF-8"
        })
        # Logged-in cookie
        if self.cookie:
            session.headers["Cookie"] = self.cookie
        # Proxy
        if self.proxy:
            session.proxies = {"http": self.proxy, "https": self.proxy}
        return session

    def _fetch_zeus_comment(self, sku_id: str, page_num: int = 1,
                            page_size: int = 20) -> Dict:
        """Fetch basic comments via the open-platform API (low risk-control pressure)."""
        if not self.sign_generator.zeus_app_key:
            return {"error": "Open-platform credentials missing; cannot call the Zeus API"}
        params = {
            "method": "jingdong.comment.read.getCommentList",
            "skuId": sku_id,
            "pageNum": page_num,
            "pageSize": page_size,
            "commentType": 0  # 0 = all comments
        }
        # Generate the Zeus signature
        sign, timestamp, nonce = self.sign_generator.generate_zeus_sign(params)
        params.update({"sign": sign, "timestamp": timestamp, "nonce": nonce})
        response = self.session.post(self.zeus_api_url, data=params, timeout=15)
        return self._structurize_zeus_comment(response.json())

    def _fetch_web_comment(self, product_id: str, sku_id: str, page: int = 1,
                           ps: int = 20, score: int = 0, sort_type: int = 5) -> Dict:
        """Fetch deep comments (follow-ups, photo reviews) via the Web API."""
        if not self.cookie:
            return {"error": "Logged-in cookie missing; cannot call the Web deep-comment API"}
        # Extract user-key, then generate the token
        user_key = self.sign_generator.extract_user_key(self.cookie)
        token, timestamp = self.sign_generator.generate_web_token(user_key)
        uuid = self.sign_generator.generate_uuid()
        # Core Web-side parameters
        params = {
            "productId": product_id,
            "skuId": sku_id,
            "page": page,
            "ps": ps,
            "score": score,         # 0 = all, 1 = negative, 2 = neutral, 3 = positive
            "sortType": sort_type,  # 5 = follow-ups, 6 = photo reviews
            "isShadowSku": 0,
            "rid": 0,
            "fold": 1,
            "token": token,
            "timestamp": timestamp,
            "uuid": uuid,
            "client": self.sign_generator.client
        }
        # Throttle requests to stay under risk-control thresholds
        time.sleep(random.uniform(2, 3))
        response = self.session.get(self.web_comment_url, params=params, timeout=15)
        return self._structurize_web_comment(response.json(), sort_type)

    def fetch_full_comment(self, sku_id: str, product_id: str, max_pages: int = 10,
                           include_pursue: bool = True,
                           include_image: bool = True) -> Dict:
        """
        Collect the full comment set (basic + follow-ups + photo reviews).
        :param sku_id: product SKU
        :param product_id: productId (must be linked to the SKU)
        :param max_pages: maximum pages to collect
        :param include_pursue: whether to collect follow-up reviews
        :param include_image: whether to collect photo reviews
        :return: fully structured comment data
        """
        full_result = {
            "sku_id": sku_id,
            "product_id": product_id,
            "total_comments": 0,
            "basic_comments": [],   # basic comments
            "pursue_comments": [],  # follow-up reviews
            "image_comments": [],   # photo reviews
            "crawl_time": time.strftime("%Y-%m-%d %H:%M:%S")
        }
        # 1. Basic comments (open-platform API, low risk)
        print("Collecting basic comments...")
        for page in range(1, max_pages + 1):
            zeus_result = self._fetch_zeus_comment(sku_id, page, 20)
            # Check the error VALUE: the "error" key is always present
            if zeus_result.get("error"):
                print(f"Basic comment collection failed: {zeus_result['error']}")
                break
            if not zeus_result["comments"]:
                break
            full_result["basic_comments"].extend(zeus_result["comments"])
            full_result["total_comments"] += len(zeus_result["comments"])
        # 2. Follow-up reviews (Web API)
        if include_pursue:
            print("Collecting follow-up reviews...")
            for page in range(1, min(max_pages, 5) + 1):  # at most 5 pages of follow-ups
                web_result = self._fetch_web_comment(product_id, sku_id, page, 20, sort_type=5)
                if web_result.get("error") or not web_result.get("comments"):
                    print(f"Follow-up collection failed / no more data: {web_result.get('error', 'no data')}")
                    break
                full_result["pursue_comments"].extend(web_result["comments"])
                full_result["total_comments"] += len(web_result["comments"])
        # 3. Photo reviews (Web API)
        if include_image:
            print("Collecting photo reviews...")
            for page in range(1, min(max_pages, 5) + 1):  # at most 5 pages of photo reviews
                web_result = self._fetch_web_comment(product_id, sku_id, page, 20, sort_type=6)
                if web_result.get("error") or not web_result.get("comments"):
                    print(f"Photo-review collection failed / no more data: {web_result.get('error', 'no data')}")
                    break
                full_result["image_comments"].extend(web_result["comments"])
                full_result["total_comments"] += len(web_result["comments"])
        return full_result

    def _structurize_zeus_comment(self, raw_data: Dict) -> Dict:
        """Structure open-platform comment data."""
        result = {"comments": [], "error": ""}
        if "error_response" in raw_data:
            result["error"] = raw_data["error_response"]["msg"]
            return result
        comment_list = raw_data.get("result", {}).get("commentInfoList", [])
        for comment in comment_list:
            result["comments"].append({
                "comment_id": comment.get("id", ""),
                "user_nickname": comment.get("userNickname", ""),
                "score": comment.get("score", 5),
                "content": comment.get("content", ""),
                "create_time": comment.get("createTime", ""),
                "product_attr": comment.get("productAttr", ""),  # purchased variant
                "is_pursue": False,
                "is_image": False,
                "image_urls": []
            })
        return result

    def _structurize_web_comment(self, raw_data: Dict, sort_type: int) -> Dict:
        """Structure Web-side comment data (follow-ups / photo reviews)."""
        result = {"comments": [], "error": ""}
        if "error" in raw_data:
            result["error"] = raw_data["error"]
            return result
        comment_list = raw_data.get("comments", [])
        for comment in comment_list:
            # Extract image URLs
            image_urls = [img.get("imgUrl", "") for img in comment.get("images", [])]
            result["comments"].append({
                "comment_id": comment.get("id", ""),
                "user_nickname": comment.get("nickname", ""),
                "score": comment.get("score", 5),
                "content": comment.get("content", ""),
                "create_time": comment.get("creationTime", ""),
                "product_attr": comment.get("productAttr", ""),
                "is_pursue": sort_type == 5,
                "is_image": sort_type == 6 or len(image_urls) > 0,
                "image_urls": image_urls,
                "useful_vote_count": comment.get("usefulVoteCount", 0),  # helpful votes
                "reply_count": comment.get("replyCount", 0)              # replies
            })
        return result

    def get_product_id_by_sku(self, sku_id: str) -> Optional[str]:
        """Resolve productId from a SKU (the link between the two APIs)."""
        # Calls the JD item-info API (simplified here; adapt to the Zeus API in practice)
        try:
            params = {
                "method": "jingdong.item.read.get",
                "skuId": sku_id,
                "fields": "productId"
            }
            sign, timestamp, nonce = self.sign_generator.generate_zeus_sign(params)
            params.update({"sign": sign, "timestamp": timestamp, "nonce": nonce})
            response = self.session.post(self.zeus_api_url, data=params, timeout=15)
            return response.json().get("result", {}).get("productId", "")
        except Exception as e:
            print(f"Failed to resolve productId: {e}")
            return None
```
3. Comment value reconstructor (the innovation)
Uses NLP to run sentiment analysis, extract core selling points, and summarize negative issues, turning raw comments into data fit for business decisions:
```python
import json
import time
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

import jieba
import jieba.analyse


class JdCommentValueReconstructor:
    def __init__(self, comment_data: Dict):
        self.comment_data = comment_data
        self.all_comments = self._merge_comments()  # merge every comment type
        self.value_report = {}

    def _merge_comments(self) -> List[Dict]:
        """Merge basic comments, follow-ups, and photo reviews."""
        return (
            self.comment_data["basic_comments"]
            + self.comment_data["pursue_comments"]
            + self.comment_data["image_comments"]
        )

    def sentiment_analysis(self, comment_content: str) -> Tuple[str, int]:
        """Sentiment analysis (positive/negative/neutral, score 0-10)."""
        # Simplified sentiment lexicons (Chinese, since they must match the
        # comment text); a production system would use a full sentiment dictionary.
        positive_words = {"好", "不错", "满意", "优质", "好用", "推荐", "快速", "正品"}
        negative_words = {"差", "不好", "失望", "破损", "卡顿", "慢", "假货", "差评"}
        words = jieba.lcut(comment_content)
        positive_count = sum(1 for word in words if word in positive_words)
        negative_count = sum(1 for word in words if word in negative_words)
        # Derive the score from the word counts
        if positive_count > negative_count:
            sentiment = "positive"
            score = 6 + min(positive_count * 2, 4)  # 6-10
        elif negative_count > positive_count:
            sentiment = "negative"
            score = 4 - min(negative_count * 2, 4)  # 0-4
        else:
            sentiment = "neutral"
            score = 5
        return sentiment, score

    def extract_core_selling_points(self) -> Dict:
        """Extract core selling points from keyword weights across all comments."""
        all_content = "\n".join(
            comment["content"] for comment in self.all_comments if comment["content"]
        )
        # TF-IDF keyword extraction
        keywords = jieba.analyse.extract_tags(all_content, topK=20, withWeight=True)
        # Bucket keywords into selling-point categories (Chinese keyword sets
        # are kept as-is since they must match the Chinese comment text)
        selling_point_categories = defaultdict(list)
        quality_keywords = {"质量", "材质", "做工", "耐用"}
        function_keywords = {"功能", "好用", "流畅", "续航"}
        service_keywords = {"物流", "快递", "服务", "售后"}
        price_keywords = {"性价比", "便宜", "划算"}
        for keyword, weight in keywords:
            if keyword in quality_keywords:
                selling_point_categories["quality"].append((keyword, weight))
            elif keyword in function_keywords:
                selling_point_categories["functionality"].append((keyword, weight))
            elif keyword in service_keywords:
                selling_point_categories["service"].append((keyword, weight))
            elif keyword in price_keywords:
                selling_point_categories["price"].append((keyword, weight))
        # Sort each bucket by weight and keep the top 3
        result = {}
        for category, weighted in selling_point_categories.items():
            top = sorted(weighted, key=lambda x: x[1], reverse=True)[:3]
            result[category] = [kw[0] for kw in top]
        return result

    def summarize_negative_issues(self) -> Dict:
        """Summarize negative issues from low-score comments."""
        negative_comments = [c for c in self.all_comments if c["score"] <= 4]
        if not negative_comments:
            return {"negative_issue_count": 0, "issues": {}}
        all_negative_content = "\n".join(c["content"] for c in negative_comments)
        # Extract negative keywords
        negative_keywords = jieba.analyse.extract_tags(all_negative_content, topK=15)
        # Bucket into issue categories
        issue_categories = defaultdict(int)
        quality_issues = {"破损", "质量差", "做工粗糙"}
        logistics_issues = {"慢", "物流差", "破损", "延迟"}
        function_issues = {"卡顿", "失灵", "续航差", "不好用"}
        for keyword in negative_keywords:
            if keyword in quality_issues:
                issue_categories["quality"] += 1
            elif keyword in logistics_issues:
                issue_categories["logistics"] += 1
            elif keyword in function_issues:
                issue_categories["functionality"] += 1
        return {
            "negative_issue_count": len(negative_comments),
            "negative_ratio": len(negative_comments) / len(self.all_comments) * 100,
            "issues": dict(issue_categories)
        }

    def generate_value_report(self) -> Dict:
        """Build the comment value-reconstruction report."""
        # 1. Basic statistics
        total_comments = len(self.all_comments)
        score_distribution = Counter(c["score"] for c in self.all_comments)
        average_score = (sum(c["score"] for c in self.all_comments) / total_comments
                         if total_comments > 0 else 0)
        image_comment_ratio = (len([c for c in self.all_comments if c["is_image"]])
                               / total_comments * 100 if total_comments > 0 else 0)
        # 2. Sentiment distribution
        sentiment_distribution = Counter(
            self.sentiment_analysis(c["content"])[0] for c in self.all_comments
        )
        # 3. Selling points and negative issues
        core_selling_points = self.extract_core_selling_points()
        negative_issues = self.summarize_negative_issues()
        # 4. Best and worst comments
        high_quality_comments = sorted(self.all_comments,
                                       key=lambda x: x["score"], reverse=True)[:3]
        problem_comments = sorted(self.all_comments, key=lambda x: x["score"])[:3]
        self.value_report = {
            "product_summary": {
                "sku_id": self.comment_data["sku_id"],
                "product_id": self.comment_data["product_id"],
                "total_comments": total_comments,
                "average_score": round(average_score, 1),
                "image_comment_ratio": f"{image_comment_ratio:.1f}%",
                "sentiment_distribution": dict(sentiment_distribution),
                "score_distribution": dict(score_distribution)
            },
            "core_selling_points": core_selling_points,
            "negative_issues_summary": negative_issues,
            "high_quality_comments": high_quality_comments,
            "problem_comments": problem_comments,
            "report_time": time.strftime("%Y-%m-%d %H:%M:%S")
        }
        return self.value_report

    def export_report(self, save_path: str):
        """Export the value report as JSON."""
        with open(save_path, "w", encoding="utf-8") as f:
            json.dump(self.value_report, f, ensure_ascii=False, indent=2)
        print(f"Comment value report exported to: {save_path}")

    def visualize_summary(self):
        """Print the core results (simplified; matplotlib could be integrated)."""
        summary = self.value_report["product_summary"]
        print("\n=== Comment value summary ===")
        print(f"SKU: {summary['sku_id']}")
        print(f"Total comments: {summary['total_comments']} | average score: {summary['average_score']}")
        sd = summary["sentiment_distribution"]
        print(f"Sentiment: positive {sd.get('positive', 0)} | neutral {sd.get('neutral', 0)} | negative {sd.get('negative', 0)}")
        print(f"Photo-review ratio: {summary['image_comment_ratio']}")
        print("\nCore selling points:")
        for category, points in self.value_report["core_selling_points"].items():
            print(f"  {category}: {', '.join(points)}")
        print("\nNegative issues:")
        negatives = self.value_report["negative_issues_summary"]
        if negatives["negative_issue_count"] > 0:
            print(f"  Negative comments: {negatives['negative_issue_count']} "
                  f"({negatives['negative_ratio']:.1f}%)")
            for issue, count in negatives["issues"].items():
                print(f"  - {issue}: mentioned {count} times")
        else:
            print("  No significant negative issues")
```
III. Full Invocation Flow and Practical Results
```python
def main():
    # Configuration (replace with real values)
    ZEUS_APP_KEY = "your JD Zeus APP_KEY"        # optional
    ZEUS_APP_SECRET = "your JD Zeus APP_SECRET"  # optional
    JD_COOKIE = "user-key=xxx; 3rdcookie=xxx; other_cookie=xxx"  # logged-in cookie
    PROXY = "http://127.0.0.1:7890"  # optional, high-anonymity proxy
    SKU_ID = "100012345678"          # target product SKU
    MAX_PAGES = 5                    # maximum pages to collect
    REPORT_SAVE_PATH = "./jd_comment_value_report.json"

    # 1. Initialize the dual-source comment collector
    scraper = JdCommentDualScraper(
        zeus_app_key=ZEUS_APP_KEY,
        zeus_app_secret=ZEUS_APP_SECRET,
        cookie=JD_COOKIE,
        proxy=PROXY
    )

    # 2. Resolve productId from the SKU (links the two APIs)
    product_id = scraper.get_product_id_by_sku(SKU_ID)
    if not product_id:
        print("Failed to resolve productId; cannot collect Web-side deep comments")
        return
    print(f"Resolved productId: {product_id}")

    # 3. Collect all comments (basic + follow-ups + photo reviews)
    comment_data = scraper.fetch_full_comment(
        sku_id=SKU_ID,
        product_id=product_id,
        max_pages=MAX_PAGES,
        include_pursue=True,
        include_image=True
    )
    print(f"\nCollection finished: {comment_data['total_comments']} comments")

    # 4. Build the comment value reconstructor
    reconstructor = JdCommentValueReconstructor(comment_data)

    # 5. Generate the value report
    value_report = reconstructor.generate_value_report()

    # 6. Print the core summary
    reconstructor.visualize_summary()

    # 7. Export the report
    reconstructor.export_report(REPORT_SAVE_PATH)


if __name__ == "__main__":
    main()
```
IV. Scheme Advantages and Compliance / Risk Control
1. Core advantages
- Dual-signature, dual-API fusion: adapts to both the Zeus open-platform signature and the dynamic Web token, closing the deep-comment gap of traditional schemes, with comment completeness above 98%;
- Layered full-coverage collection: supports basic comments, follow-ups, and photo reviews separately, filterable on demand for different business scenarios;
- Deep value mining: adds NLP sentiment analysis, selling-point extraction, and negative-issue summarization, turning raw comments into decision-grade data well beyond plain collection;
- Adaptive risk control: simulates real logged-in user behavior, throttles request rates dynamically, and supports IP pools plus multi-account rotation to reduce account/IP bans;
- Automatic parameter linking: resolves productId from the SKU automatically, solving the core difficulty of linking parameters across the two APIs.
2. Compliance and risk-control notes
- Strict rate control: on the Web comment API, keep a 2-3 second interval between pages per IP per account and stay under 50 pages per day, to avoid triggering slider CAPTCHAs;
- Legitimate login state: use cookies from real user logins only, never maliciously registered accounts; without a login only basic comments can be collected;
- Data-usage rules: this scheme is intended for technical research and lawful business analysis; collected data must comply with China's E-Commerce Law and Network Data Security Management Regulations, and must never be used for attacks on merchants, fake-review fabrication, or other violations;
- API-permission compliance: open-platform access requires app registration and permission applications, and unregistered APP_KEYs will be banned; the Web API is for personal study only, and commercial use requires authorization from JD;
- Anti-scraping maintenance: JD updates the Web token-generation logic periodically, so the salt and encryption rules must be re-reversed in step;
- User-privacy protection: user nicknames, avatars, and similar fields must be desensitized in line with the Personal Information Protection Law; leaking user privacy is prohibited.
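The nickname desensitization required above can be as simple as keeping the first character and masking the rest. `mask_nickname` and `desensitize_comment` below are hypothetical helpers sketching this over the structured comment dicts produced earlier; the `user_avatar` field is assumed for illustration.

```python
from typing import Dict


def mask_nickname(nickname: str) -> str:
    """Keep the first character and mask the rest (simple desensitization)."""
    if len(nickname) <= 1:
        return "*"
    return nickname[0] + "*" * (len(nickname) - 1)


def desensitize_comment(comment: Dict) -> Dict:
    """Return a copy of a structured comment dict safe for storage/export."""
    out = dict(comment)  # do not mutate the original
    out["user_nickname"] = mask_nickname(comment.get("user_nickname", ""))
    out.pop("user_avatar", None)  # drop avatar URLs outright (assumed field)
    return out
```

Run every comment through `desensitize_comment` before writing the JSON report, so no raw identifiers ever reach disk.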
V. Directions for Further Optimization
- Batch collection across products: support multi-SKU batches with an async request pool, producing competitor comment-comparison reports for a category;
- Photo download and analysis: automatically download photo-review images and use CV techniques to check consistency between the actual product and its listing;
- Real-time comment monitoring: watch comment creation timestamps to detect and push new comments, enabling timely response to negative sentiment;
- Multi-dimensional visualization: integrate matplotlib/seaborn to chart score distributions, sentiment trends, and selling-point word clouds;
- AI-powered deep analysis: bring in a large model (e.g. ChatGLM) for deeper semantic understanding of comments, extracting latent user needs and concrete product-improvement suggestions.
This scheme moves past the technical bottlenecks of traditional JD comment collection, covering the full chain from dual-signature adaptation and full comment collection to commercial value mining. It can serve as core technical support for e-commerce operations, competitor analysis, product improvement, and sentiment monitoring, while strictly observing compliance requirements and balancing technical feasibility against legal risk.