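Crawler Reverse Engineering: RPC Techniques
Learning objectives:
- Understand the WebSocket protocol
- Become familiar with how WebSocket works
- Master how to enable and inject RPC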
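In crawler circles, RPC is jokingly spelled out as "RangPaCong" (pinyin for "let the crawler through"): it clears the road for crawlers so that they can pass unobstructed. The term actually stands for Remote Procedure Call, which is covered in the RPC section below.
The arrival of WebSocket gave browsers the ability to communicate in real time in both directions.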
Reference: https://blog.csdn.net/ningmengjing_/article/details/131693721?spm=1011.2415.3001.5331
Reference: https://blog.csdn.net/ningmengjing_/article/details/131693687?spm=1011.2415.3001.5331
Reference: https://www.cnblogs.com/chyingp/p/websocket-deep-in.html
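I. WebSocket
1. Introduction to WebSocket and how it works
WebSocket is a full-duplex communication protocol built on TCP. It was introduced alongside HTML5 and standardized as RFC 6455. It keeps a persistent connection open between browser and server, which saves server resources and bandwidth compared with repeated polling and makes real-time communication possible.
WebSocket establishes the connection through a dedicated handshake, after which client and server can exchange data in both directions in real time, much like a raw TCP connection. Before WebSocket, web applications relied on short-lived or long-lived (keep-alive) HTTP connections. WebSocket is a separate protocol from HTTP and borrows HTTP only for the opening handshake. Unlike HTTP it is stateful, and its URL schemes are "ws" and, over TLS, "wss".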
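The connection is established roughly as follows:
- The client opens a TCP connection (three-way handshake) and sends an HTTP request over it. The request headers include Upgrade, Connection, Sec-WebSocket-Key and Sec-WebSocket-Version, signalling that the client wants to upgrade to the WebSocket protocol;
- If the server supports WebSocket, it completes the handshake by returning an HTTP 101 Switching Protocols response that carries a Sec-WebSocket-Accept header;
- Once the handshake succeeds, client and server communicate in full duplex over the TCP connection that is already open.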
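To make the handshake concrete, here is a small sketch of the one computation the server performs on the upgrade request. RFC 6455 has the server append a fixed GUID to the Sec-WebSocket-Key header sent by the client, hash the result with SHA-1, and return the Base64 form of the digest in the Sec-WebSocket-Accept header. The function name compute_accept is our own, and the key used is the sample value from the RFC.

```python
import base64
import hashlib

# Fixed GUID that RFC 6455 defines for the opening handshake
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def compute_accept(sec_websocket_key: str) -> str:
    """Return the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key."""
    raw = (sec_websocket_key + WS_GUID).encode("ascii")
    digest = hashlib.sha1(raw).digest()
    return base64.b64encode(digest).decode("ascii")


if __name__ == "__main__":
    # Sample key taken from RFC 6455
    print(compute_accept("dGhlIHNhbXBsZSBub25jZQ=="))
```

The browser checks this value before it treats the connection as open, which is why a plain HTTP server cannot pass for a WebSocket server by accident.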
2. Implementing WebSocket
(1) Client
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<input type="text" id="box">
<button onclick="ps()">Send</button>
<script>
    // Create the connection
    var ws = new WebSocket('ws://127.0.0.1:8000')

    // Callback that runs when the connection fails
    ws.onerror = function () {
        console.log('connection failed')
    }

    // Callback that runs when the connection is opened
    // ws.onopen

    // Callback that receives messages
    ws.onmessage = function (event) {
        console.log(event.data)
    }

    // Callback that runs when the connection is closed
    // ws.onclose

    function ps() {
        var text = document.getElementById('box').value
        // Send the data
        ws.send(text)
    }
</script>
</body>
</html>
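(2) Server
import asyncio
import websockets


async def echo(ws):
    # Push a greeting to the client as soon as it connects
    await ws.send('hello')
    return True


async def recv_msg(ws):
    # Keep printing every message the client sends
    while True:
        data = await ws.recv()
        print(data)


async def handler(ws, path=None):
    # Only older versions of websockets pass path, so it is optional here
    await echo(ws)
    await recv_msg(ws)


async def main():
    async with websockets.serve(handler, '127.0.0.1', 8000):
        await asyncio.Future()  # run until the process is stopped


if __name__ == '__main__':
    asyncio.run(main())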
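(3) A real-world case
Case target
- URL: https://jzsc.mohurd.gov.cn/data/company
- Requirement: decrypt the encrypted response data through websocket
- Code actually injected into the site:
!(function () {
    if (window.flag) {
    } else {
        const websocket = new WebSocket('ws://127.0.0.1:8000')
        // Set a flag so that the socket is only created once
        window.flag = true;

        // Receive the message sent by the server
        websocket.onmessage = function (event) {
            var data = event.data
            // Call the site's own JS decryption function
            var res = b(data)
            console.log(res)
            // Send the decrypted data back to the server
            websocket.send(res)
        }
    }
}())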
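Solution analysis
- Locate the encryption/decryption code
- Inject our websocket code into the site's code (by replacing the original script)
- After injecting, refresh the page so that the JS actually runs
- Run the Python code: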
# encoding: utf-8
import asyncio
import json

import requests
import websockets


def get_data(page):
    headers = {
        "v": "231012",
        "Referer": "https://jzsc.mohurd.gov.cn/data/company",
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/118.0.0.0 Safari/537.36",
    }
    url = "https://jzsc.mohurd.gov.cn/APi/webApi/dataservice/query/comp/list"
    params = {
        "pg": page,
        "pgsz": "15",
        "total": "450"
    }
    response = requests.get(url, headers=headers, params=params)
    print(response.text)
    return response.text


async def echo(websocket):
    # Send the encrypted responses of pages 1 to 3 to the browser
    for i in range(1, 4):
        data = get_data(i)
        await websocket.send(data)


async def recv_msg(websocket):
    while True:
        # Receive the decrypted data sent back by the browser
        recv_text = await websocket.recv()
        print(json.loads(recv_text))


async def main_logic(websocket, path=None):
    await echo(websocket)
    await recv_msg(websocket)


async def main():
    # The port must match the one used by the injected script (8000)
    async with websockets.serve(main_logic, '127.0.0.1', 8000):
        # Once the server exists it has to keep listening for returned data,
        # so the connection is simply kept alive
        await asyncio.Future()


if __name__ == '__main__':
    asyncio.run(main())
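II. RPC
1. About RPC
RPC (Remote Procedure Call) is a technique for requesting a service from a program on a remote computer over the network without having to know the underlying network details. When communicating over a protocol such as WebSocket, you normally have to open the connection, maintain the session and handle data transfer by hand, which is tedious. RPC wraps these low-level operations so that you can call the interfaces a remote service exposes directly, which simplifies development considerably.
At its core, RPC is inter-process communication: it lets one process call a method or function in another process, which often lives on a different machine or in a different environment. Because it suits microservices and distributed systems, RPC is widely used in architecture design. RPC itself is a fairly complex technology, but for crawling and reverse engineering we do not need all of its details. What matters is knowing how to use RPC for efficient data exchange and function calls.
In reverse engineering, a typical use of RPC is to treat the browser environment and the local code as server and client and to set up an RPC channel between them over a protocol such as WebSocket. Key functions, such as encryption functions, can then be exposed in the browser and called remotely from local code. This avoids restoring complex algorithms, extracting code and patching the runtime environment, and so saves a great deal of analysis time.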
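The idea of "exposing a function and calling it by name" can be sketched without any network at all. The snippet below is a minimal illustration, not the protocol of any particular framework: the names register_action and handle_request, the seq field and the reversed-string "encryption" are all assumptions made for the example. A request is a JSON message that names an action, and the reply carries the same sequence number so that the caller can match it to its request.

```python
import json

# Hypothetical registry: action name -> handler function
handlers = {}


def register_action(name, handler):
    """Expose a function under a name that remote callers can use."""
    handlers[name] = handler


def handle_request(request_json: str) -> str:
    """Dispatch one JSON request to its handler and wrap the result."""
    request = json.loads(request_json)
    seq = request.get("seq")
    handler = handlers.get(request.get("action"))
    if handler is None:
        return json.dumps({"seq": seq, "status": -1, "message": "no such action"})
    try:
        result = handler(request)
    except Exception as exc:  # report handler errors to the caller
        return json.dumps({"seq": seq, "status": -1, "message": str(exc)})
    return json.dumps({"seq": seq, "status": 0, "data": result})


# Stand-in for a function that only exists inside the browser
register_action("encrypt", lambda req: req["text"][::-1])

if __name__ == "__main__":
    print(handle_request(json.dumps({"seq": 1, "action": "encrypt", "text": "abc"})))
```

In the browser scenario, handle_request would run inside the page and the JSON messages would travel over a WebSocket connection.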
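2. Sekiro-RPC
Sekiro-RPC is a lightweight RPC framework built on WebSocket, used mainly for remote calls between a browser environment and a local service. Official documentation: https://sekiro.iinti.cn/sekiro-doc/
(1) Usage
1. Start the server (locally)
A Java runtime (JRE) must be installed first. If it is not, follow a JDK 8 installation and configuration guide.
JDK download (Huawei mirror): https://repo.huaweicloud.com/java/jdk/8u201-b09/
How to start:
- Linux & Mac: run bin/sekiro.sh
- Windows: double-click bin/sekiro.bat
2. Client environment
Load the Sekiro client script in the front end. It sets up the communication with the Sekiro server:
https://file.virjar.com/sekiro_web_client.js?_=123
3. How it works and what the parameters mean
The core mechanism is a client script (SekiroClient) injected into the browser environment. It connects to the locally started Sekiro server, which makes RPC calls to methods inside the browser possible. The official SekiroClient sample code is a quick way to get remote calls working.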
// Generate a unique uuid-style identifier
function guid() {
    function S4() {
        return (((1 + Math.random()) * 0x10000) | 0).toString(16).substring(1)
    }

    return (S4() + S4() + "-" + S4() + "-" + S4() + "-" + S4() + "-" + S4() + S4() + S4());
}

// Connect to the server
var client = new SekiroClient("ws://127.0.0.1:5620/business-demo/register?group=ws-group&clientId=" + guid());
// Business interface
client.registerAction("登陆", function (request, resolve, reject) {
    resolve("" + new Date());
})
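- Group (business group): separates different business types or sets of interfaces. Several clients (SekiroClient) can register under one group and several actions (interfaces) can be attached to it, which isolates and organizes the logic of different business functions.
- ClientId (device identifier): uniquely identifies one client device. Several devices can register at the same time, which gives load balancing and group control and suits scenarios where several machines provide the API together.
- SekiroClient (service provider): the client that runs in a browser, on a phone or in a similar environment and actually provides the service. Each client needs a unique ClientId. The Sekiro server forwards every call to a matching SekiroClient for execution.
- registerAction (interface registration): several actions can be registered under the same group. Each action corresponds to one concrete interface, so different business operations stay separate and can be called individually.
- request (request object): the call received from the server, including the call parameters. Parameters can be read from it as key/value pairs and used in the business logic.
- resolve (returning the result): the function that sends the result back to the server so that the caller receives the response.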
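How group, clientId and action fit together is easiest to see in the URLs themselves. The sketch below builds the register URL used by the sample above and the invoke URL that appears later in this article. The helper names register_url and invoke_url are hypothetical, and the host, port and the business-demo path are simply copied from this article's examples.

```python
import uuid
from urllib.parse import urlencode

# Assumed defaults, copied from the examples in this article
SEKIRO_HOST = "127.0.0.1:5620"


def register_url(group: str, client_id: str = "") -> str:
    """WebSocket URL that a SekiroClient connects to."""
    client_id = client_id or str(uuid.uuid4())  # every client needs its own id
    query = urlencode({"group": group, "clientId": client_id})
    return f"ws://{SEKIRO_HOST}/business-demo/register?{query}"


def invoke_url(group: str, action: str, **params: str) -> str:
    """HTTP URL a caller requests to run an action on a client of the group."""
    query = urlencode({"group": group, "action": action, **params})
    return f"http://{SEKIRO_HOST}/business-demo/invoke?{query}"


if __name__ == "__main__":
    print(register_url("ws-group"))
    print(invoke_url("rpc-test", "clientTime"))
```

The group used when registering has to be the same group used when invoking, otherwise the server has no client to forward the call to.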
(2) Trying it out
1. Front-end code:
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Title</title>
</head>
<body>
<script>
(function () {
    function SekiroClient(wsURL) {
        this.wsURL = wsURL;
        this.handlers = {};
        this.socket = {};
        this.base64 = false;
        // check
        if (!wsURL) {
            throw new Error('wsURL can not be empty!!')
        }
        this.webSocketFactory = this.resolveWebSocketFactory();
        this.connect()
    }

    SekiroClient.prototype.resolveWebSocketFactory = function () {
        if (typeof window === 'object') {
            var theWebSocket = window.WebSocket ? window.WebSocket : window.MozWebSocket;
            return function (wsURL) {

                function WindowWebSocketWrapper(wsURL) {
                    this.mSocket = new theWebSocket(wsURL);
                }

                WindowWebSocketWrapper.prototype.close = function () {
                    this.mSocket.close();
                };

                WindowWebSocketWrapper.prototype.onmessage = function (onMessageFunction) {
                    this.mSocket.onmessage = onMessageFunction;
                };

                WindowWebSocketWrapper.prototype.onopen = function (onOpenFunction) {
                    this.mSocket.onopen = onOpenFunction;
                };

                WindowWebSocketWrapper.prototype.onclose = function (onCloseFunction) {
                    this.mSocket.onclose = onCloseFunction;
                };

                WindowWebSocketWrapper.prototype.send = function (message) {
                    this.mSocket.send(message);
                };

                return new WindowWebSocketWrapper(wsURL);
            }
        }
        if (typeof weex === 'object') {
            // this is weex env : https://weex.apache.org/zh/docs/modules/websockets.html
            try {
                console.log("test webSocket for weex");
                var ws = weex.requireModule('webSocket');
                console.log("find webSocket for weex:" + ws);
                return function (wsURL) {
                    try {
                        ws.close();
                    } catch (e) {
                    }
                    ws.WebSocket(wsURL, '');
                    return ws;
                }
            } catch (e) {
                console.log(e);
                //ignore
            }
        }
        //TODO support ReactNative
        if (typeof WebSocket === 'object') {
            return function (wsURL) {
                return new theWebSocket(wsURL);
            }
        }
        // The websocket APIs of weex and the PC environment are not fully consistent, hence the abstraction above
        throw new Error("the js environment do not support websocket");
    };

    SekiroClient.prototype.connect = function () {
        console.log('sekiro: begin of connect to wsURL: ' + this.wsURL);
        var _this = this;
        // no close check here
        // if (this.socket && this.socket.readyState === 1) {
        //     this.socket.close();
        // }
        try {
            this.socket = this.webSocketFactory(this.wsURL);
        } catch (e) {
            console.log("sekiro: create connection failed,reconnect after 2s");
            setTimeout(function () {
                _this.connect()
            }, 2000)
        }

        this.socket.onmessage(function (event) {
            _this.handleSekiroRequest(event.data)
        });

        this.socket.onopen(function (event) {
            console.log('sekiro: open a sekiro client connection')
        });

        this.socket.onclose(function (event) {
            console.log('sekiro: disconnected ,reconnection after 2s');
            setTimeout(function () {
                _this.connect()
            }, 2000)
        });
    };

    SekiroClient.prototype.handleSekiroRequest = function (requestJson) {
        console.log("receive sekiro request: " + requestJson);
        var request = JSON.parse(requestJson);
        var seq = request['__sekiro_seq__'];

        if (!request['action']) {
            this.sendFailed(seq, 'need request param {action}');
            return
        }
        var action = request['action'];
        if (!this.handlers[action]) {
            this.sendFailed(seq, 'no action handler: ' + action + ' defined');
            return
        }

        var theHandler = this.handlers[action];
        var _this = this;
        try {
            theHandler(request, function (response) {
                try {
                    _this.sendSuccess(seq, response)
                } catch (e) {
                    _this.sendFailed(seq, "e:" + e);
                }
            }, function (errorMessage) {
                _this.sendFailed(seq, errorMessage)
            })
        } catch (e) {
            console.log("error: " + e);
            _this.sendFailed(seq, ":" + e);
        }
    };

    SekiroClient.prototype.sendSuccess = function (seq, response) {
        var responseJson;
        if (typeof response == 'string') {
            try {
                responseJson = JSON.parse(response);
            } catch (e) {
                responseJson = {};
                responseJson['data'] = response;
            }
        } else if (typeof response == 'object') {
            responseJson = response;
        } else {
            responseJson = {};
            responseJson['data'] = response;
        }

        if (typeof response == 'string') {
            responseJson = {};
            responseJson['data'] = response;
        }

        if (Array.isArray(responseJson)) {
            responseJson = {
                data: responseJson,
                code: 0
            }
        }

        if (responseJson['code']) {
            responseJson['code'] = 0;
        } else if (responseJson['status']) {
            responseJson['status'] = 0;
        } else {
            responseJson['status'] = 0;
        }
        responseJson['__sekiro_seq__'] = seq;
        var responseText = JSON.stringify(responseJson);
        console.log("response :" + responseText);

        if (responseText.length < 1024 * 6) {
            this.socket.send(responseText);
            return;
        }

        if (this.base64) {
            responseText = this.base64Encode(responseText)
        }

        // Large messages are sent in segments
        var segmentSize = 1024 * 5;
        var i = 0, totalFrameIndex = Math.floor(responseText.length / segmentSize) + 1;

        for (; i < totalFrameIndex; i++) {
            var frameData = JSON.stringify({
                __sekiro_frame_total: totalFrameIndex,
                __sekiro_index: i,
                __sekiro_seq__: seq,
                __sekiro_base64: this.base64,
                __sekiro_is_frame: true,
                __sekiro_content: responseText.substring(i * segmentSize, (i + 1) * segmentSize)
            });
            console.log("frame: " + frameData);
            this.socket.send(frameData);
        }
    };

    SekiroClient.prototype.sendFailed = function (seq, errorMessage) {
        if (typeof errorMessage != 'string') {
            errorMessage = JSON.stringify(errorMessage);
        }
        var responseJson = {};
        responseJson['message'] = errorMessage;
        responseJson['status'] = -1;
        responseJson['__sekiro_seq__'] = seq;
        var responseText = JSON.stringify(responseJson);
        console.log("sekiro: response :" + responseText);
        this.socket.send(responseText)
    };

    SekiroClient.prototype.registerAction = function (action, handler) {
        if (typeof action !== 'string') {
            throw new Error("an action must be string");
        }
        if (typeof handler !== 'function') {
            throw new Error("a handler must be function");
        }
        console.log("sekiro: register action: " + action);
        this.handlers[action] = handler;
        return this;
    };

    SekiroClient.prototype.encodeWithBase64 = function () {
        this.base64 = arguments && arguments.length > 0 && arguments[0];
    };

    SekiroClient.prototype.base64Encode = function (s) {
        if (arguments.length !== 1) {
            throw "SyntaxError: exactly one argument required";
        }

        s = String(s);
        if (s.length === 0) {
            return s;
        }

        function _get_chars(ch, y) {
            if (ch < 0x80) y.push(ch);
            else if (ch < 0x800) {
                y.push(0xc0 + ((ch >> 6) & 0x1f));
                y.push(0x80 + (ch & 0x3f));
            } else {
                y.push(0xe0 + ((ch >> 12) & 0xf));
                y.push(0x80 + ((ch >> 6) & 0x3f));
                y.push(0x80 + (ch & 0x3f));
            }
        }

        var _PADCHAR = "=",
            _ALPHA = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",
            _VERSION = "1.1"; //Mr. Ruan fix to 1.1 to support asian char(utf8)

        //s = _encode_utf8(s);
        var i,
            b10,
            y = [],
            x = [],
            len = s.length;
        i = 0;
        while (i < len) {
            _get_chars(s.charCodeAt(i), y);
            while (y.length >= 3) {
                var ch1 = y.shift();
                var ch2 = y.shift();
                var ch3 = y.shift();
                b10 = (ch1 << 16) | (ch2 << 8) | ch3;
                x.push(_ALPHA.charAt(b10 >> 18));
                x.push(_ALPHA.charAt((b10 >> 12) & 0x3F));
                x.push(_ALPHA.charAt((b10 >> 6) & 0x3f));
                x.push(_ALPHA.charAt(b10 & 0x3f));
            }
            i++;
        }

        switch (y.length) {
            case 1:
                var ch = y.shift();
                b10 = ch << 16;
                x.push(_ALPHA.charAt(b10 >> 18) + _ALPHA.charAt((b10 >> 12) & 0x3F) + _PADCHAR + _PADCHAR);
                break;

            case 2:
                var ch1 = y.shift();
                var ch2 = y.shift();
                b10 = (ch1 << 16) | (ch2 << 8);
                x.push(_ALPHA.charAt(b10 >> 18) + _ALPHA.charAt((b10 >> 12) & 0x3F) + _ALPHA.charAt((b10 >> 6) & 0x3f) + _PADCHAR);
                break;
        }

        return x.join("");
    };

    function startRpc() {
        if (window.flag) {
        } else {
            function guid() {
                function S4() {
                    return (((1 + Math.random()) * 0x10000) | 0).toString(16).substring(1);
                }

                return (S4() + S4() + "-" + S4() + "-" + S4() + "-" + S4() + "-" + S4() + S4() + S4());
            }

            // Set a flag so that the socket is only created once
            window.flag = true;
            var client = new SekiroClient("ws://127.0.0.1:5620/business-demo/register?group=rpc-test&clientId=" + guid());
            // Registered inside the closure, because SekiroClient is not visible outside it
            client.registerAction("clientTime", function (request, resolve, reject) {
                resolve("" + new Date());
            })
        }
    }

    setTimeout(startRpc, 1000)
})()
</script>
</body>
</html>
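One detail of the client worth understanding is how sendSuccess handles large responses: anything shorter than 1024 * 6 characters goes out as a single message, and anything longer is cut into frames of 1024 * 5 characters. The sketch below mirrors that splitting logic in Python so that it can be tried offline. The join_frames function is a hypothetical receiving side added for illustration, not Sekiro's server code; the __sekiro_base64 field is left out for brevity, and lengths are counted in Python characters rather than JavaScript UTF-16 units.

```python
import json

SEGMENT_SIZE = 1024 * 5   # size of one frame, as in sendSuccess
SINGLE_LIMIT = 1024 * 6   # shorter responses are sent as one message


def split_frames(seq, response_text):
    """Mimic how sendSuccess splits a large response into frames."""
    if len(response_text) < SINGLE_LIMIT:
        return [response_text]
    total = len(response_text) // SEGMENT_SIZE + 1
    frames = []
    for i in range(total):
        frames.append(json.dumps({
            "__sekiro_frame_total": total,
            "__sekiro_index": i,
            "__sekiro_seq__": seq,
            "__sekiro_is_frame": True,
            "__sekiro_content": response_text[i * SEGMENT_SIZE:(i + 1) * SEGMENT_SIZE],
        }))
    return frames


def join_frames(frames):
    """Hypothetical receiver: put the frames back together by index."""
    parts = sorted((json.loads(f) for f in frames), key=lambda p: p["__sekiro_index"])
    return "".join(p["__sekiro_content"] for p in parts)


if __name__ == "__main__":
    text = "x" * 12000
    frames = split_frames(7, text)
    print(len(frames), join_frames(frames) == text)
```

Because the frame count is computed as floor(length / segment size) + 1, a response whose length is an exact multiple of the segment size ends with one empty frame.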
2. Sekiro (SK) API
Sekiro provides several HTTP APIs:
- List the groups: http://127.0.0.1:5620/business-demo/groupList
- Show the client queue of a group: http://127.0.0.1:5620/business-demo/clientQueue?group=rpc-test
- Invoke (forward a call): http://127.0.0.1:5620/business-demo/invoke?group=rpc-test&action=clientTime
3. Calling from Python
import requests

params = {
    "group": "rpc-test",
    "action": "clientTime",
}
res = requests.get("http://127.0.0.1:5620/business-demo/invoke", params=params)
print(res.text)
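Printing res.text is enough for a first test, but real code should check whether the call succeeded. Judging from the SekiroClient source above, a successful reply carries status (or code) 0 together with a data field, and a failed one carries status -1 together with a message. The helper below is a sketch built on that reading and assumes the server relays these fields unchanged; unwrap is a hypothetical name, and it works on the already parsed JSON so that it can be tested without a running server.

```python
def unwrap(payload: dict):
    """Return the data of a successful reply, raise for a failed one."""
    # sendSuccess sets "status" (or "code") to 0, sendFailed sets "status" to -1
    status = payload.get("status", payload.get("code"))
    if status != 0:
        raise RuntimeError(payload.get("message", "sekiro call failed"))
    return payload.get("data")


if __name__ == "__main__":
    # Made-up replies for illustration
    print(unwrap({"status": 0, "data": "Mon Jan 01 2024"}))
    try:
        unwrap({"status": -1, "message": "no action handler: x defined"})
    except RuntimeError as exc:
        print("failed:", exc)
```

With requests, this would be used as unwrap(res.json()).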
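III. Cases
Case 1:
1. Reverse-engineering target
URL: http://q.10jqka.com.cn/
Encrypted parameter: the v cookie (the Python code below obtains its value through RPC)
2. Reverse-engineering analysis
Use a hook to locate where the cookie is generated:
// hook cookie
(function () {
    var cookie_val = document.cookie
    Object.defineProperty(document, 'cookie', {
        set: function (new_val) {
            // Pause whenever a script writes the cookie
            debugger
            cookie_val = new_val
        },
        get: function () {
            return cookie_val
        },
    })
})()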
3. Implementation
Code to inject:
(function () {
    // SekiroClient definition: paste it here unchanged from the test page in
    // "(2) Trying it out" above, from "function SekiroClient(wsURL)" through
    // the end of "SekiroClient.prototype.base64Encode"

    if (window.flag) {
    } else {
        function guid() {
            function S4() {
                return (((1 + Math.random()) * 0x10000) | 0).toString(16).substring(1);
            }

            return (S4() + S4() + "-" + S4() + "-" + S4() + "-" + S4() + "-" + S4() + S4() + S4());
        }

        // Set a flag so that the socket is only created once
        window.flag = true;
        var client = new SekiroClient("ws://127.0.0.1:5620/business-demo/register?group=ths&clientId=" + guid());
        client.registerAction("get_cookie", function (request, resolve, reject) {
            // qn.update() is the site's own function that produces the cookie value
            resolve(qn.update());
        })
    }
})()
The figure below shows a successful injection; in the console it appears as the log line "sekiro: open a sekiro client connection".
Python code:
import csv
import os
import time

import requests
from lxml import etree


class TongHuaShun():
    def __init__(self):
        self.headers = {
            "accept": "text/html, */*; q=0.01",
            "accept-language": "zh-CN,zh;q=0.9",
            "cache-control": "no-cache",
            "hexin-v": "A0zJmRO2SEen1Fy7XzZobjejHaF7hfEA8isE4KYNWSuaqOKfzpXAv0I51Ij1",
            "pragma": "no-cache",
            "priority": "u=1, i",
            "referer": "https://q.10jqka.com.cn/",
            "sec-ch-ua": "\"Chromium\";v=\"140\", \"Not=A?Brand\";v=\"24\", \"Google Chrome\";v=\"140\"",
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": "\"Windows\"",
            "sec-fetch-dest": "empty",
            "sec-fetch-mode": "cors",
            "sec-fetch-site": "same-origin",
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36",
            "x-requested-with": "XMLHttpRequest"
        }
        self.url = "https://q.10jqka.com.cn/index/index/board/all/field/zdf/order/desc/page/{}/ajax/1/"
        self.filename = 'ths.csv'
        # Column names mirror the headers of the table on the site
        self.header = ['序号', '代码', '名称', '现价', '涨跌幅(%)', '涨跌', '涨速(%)', '换手(%)',
                       '量比', '振幅(%)', '成交额', '流通股', '流通市值', 'HR']

    def get_info(self, page):
        # Get the current value of the changing cookie from the browser
        params = {
            'group': 'ths',
            'action': 'get_cookie',
        }
        res = requests.get('http://127.0.0.1:5620/business-demo/invoke', params=params)
        if res.status_code != 200:
            print("Could not get the cookie from the Sekiro service")
            return ""
        try:
            cookie_data = res.json()['data']
        except KeyError:
            print("The returned data does not contain the expected cookie value")
            return ""
        cookies = {"v": cookie_data}
        response = requests.get(self.url.format(page), headers=self.headers, cookies=cookies)
        return response.text

    def parse_data(self, data):
        if not data:
            # get_info returned nothing, so there is nothing to parse
            return
        html = etree.HTML(data)
        tr_list = html.xpath('//table/tbody/tr')
        for i in tr_list:
            data_list = i.xpath('./td//text()')
            print(data_list)
            if len(data_list) < len(self.header):
                # Skip incomplete rows instead of raising IndexError
                continue
            item = dict(zip(self.header, data_list))
            print(item)
            self.save(item)
            print(f'{data_list[2]}-----saved')

    def save(self, item):
        file_exists = os.path.exists(self.filename)
        with open(self.filename, 'a', encoding='utf-8', newline='') as f:
            f_csv = csv.DictWriter(f, fieldnames=self.header)
            if not file_exists:
                f_csv.writeheader()
            f_csv.writerow(item)

    def main(self):
        for page in range(1, 15):
            print(f'Crawling page {page}')
            data = self.get_info(page)
            self.parse_data(data)
            time.sleep(3)


if __name__ == '__main__':
    ths = TongHuaShun()
    ths.main()
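The row handling in parse_data can be isolated into a small pure function, which makes it easy to test without touching the site. The version below is a sketch under one extra assumption that the article's code does not make: text nodes that contain only whitespace are treated as layout noise and dropped. The name row_to_item and the sample values are made up for illustration.

```python
def row_to_item(cells, header):
    """Map the text cells of one table row onto the column names."""
    # Assumption: whitespace-only text nodes carry no data
    values = [c.strip() for c in cells if c.strip()]
    if len(values) < len(header):
        return None  # incomplete row: skip it instead of raising IndexError
    return dict(zip(header, values))


if __name__ == "__main__":
    header = ["code", "name", "price"]
    print(row_to_item(["\n  ", "300033 ", "demo", "1.00"], header))
    print(row_to_item(["300033"], header))
```

Returning None for short rows lets the caller decide whether to skip the row or to log it.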
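Case 2:
1. Reverse-engineering target
URL: https://jzsc.mohurd.gov.cn/data/company
API: https://jzsc.mohurd.gov.cn/APi/webApi/dataservice/query/comp/list?pg=1&pgsz=15&total=450
Encrypted data: the response body of this API, which the site decrypts in the browser
2. Reverse-engineering analysis
This is the same site as in the WebSocket case above, and the decryption function b located there is reused here.
3. Implementation
This case has to pass an argument to the browser, so pay attention to how the argument is passed.
JavaScript code: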
(function () {
    // SekiroClient definition: paste it here unchanged from the test page in
    // "(2) Trying it out" above, from "function SekiroClient(wsURL)" through
    // the end of "SekiroClient.prototype.base64Encode"

    if (window.flag) {
    } else {
        function guid() {
            function S4() {
                return (((1 + Math.random()) * 0x10000) | 0).toString(16).substring(1);
            }

            return (S4() + S4() + "-" + S4() + "-" + S4() + "-" + S4() + "-" + S4() + S4() + S4());
        }

        // Set a flag so that the socket is only created once
        window.flag = true;
        var client = new SekiroClient("ws://127.0.0.1:5620/business-demo/register?group=jzsc&clientId=" + guid());
        client.registerAction("get_data", function (request, resolve, reject) {
            // The "data" field sent by Python arrives as a key of the request object
            var py_data = request["data"]
            // b is the site's own decryption function
            resolve(b(py_data));
        })
    }
})()
Python code:
import json

import pymysql
import requests


class Jzsc():
    def __init__(self):
        self.db = pymysql.connect(host='localhost', user='root', password='123456',
                                  database='py_spider')  # database name
        # Get a cursor with the cursor() method
        self.cursor = self.db.cursor()
        self.headers = {
            "Accept": "application/json, text/plain, */*",
            "Accept-Language": "zh-CN,zh;q=0.9",
            "Cache-Control": "no-cache",
            "Connection": "keep-alive",
            "Pragma": "no-cache",
            "Referer": "https://jzsc.mohurd.gov.cn/data/company",
            "Sec-Fetch-Dest": "empty",
            "Sec-Fetch-Mode": "cors",
            "Sec-Fetch-Site": "same-origin",
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36",
            "accessToken": "",
            "sec-ch-ua": "\"Chromium\";v=\"140\", \"Not=A?Brand\";v=\"24\", \"Google Chrome\";v=\"140\"",
            "sec-ch-ua-mobile": "?0",
            "sec-ch-ua-platform": "\"Windows\"",
            "timeout": "30000",
            "v": "231012"
        }
        self.cookies = {
            "Hm_lvt_b1b4b9ea61b6f1627192160766a9c55c": "1758382618,1758992952",
            "Hm_lpvt_b1b4b9ea61b6f1627192160766a9c55c": "1758992952",
            "HMACCOUNT": "C844EA1E4B3823E0"
        }
        self.url = "https://jzsc.mohurd.gov.cn/APi/webApi/dataservice/query/comp/list"

    def get_info(self, page):
        params = {
            "pg": str(page),
            "pgsz": "15",
            "total": "450"
        }
        response = requests.get(self.url, headers=self.headers, cookies=self.cookies, params=params)
        return response.text

    def parse_data(self, data1):
        # Send the encrypted text to the browser as the "data" argument of the action
        data = {
            'group': 'jzsc',
            'action': 'get_data',
            'data': data1
        }
        res = requests.post('http://127.0.0.1:5620/business-demo/invoke', data=data)
        # The decrypted result is itself a JSON string, so it is decoded a second time
        for i in json.loads(res.json()['data'])['data']['list']:
            code = i['QY_ORG_CODE']
            Legal_Person = i['QY_FR_NAME']  # legal representative
            company_name = i['QY_NAME']  # company name
            address = i['QY_REGION_NAME']  # registered address of the company
            self.save(code, Legal_Person, company_name, address)

    # Create the table
    def create_table(self):
        sql = """
            create table if not exists jzsc1(
                id int primary key auto_increment,
                code varchar(100),
                Legal_Person varchar(50),
                company_name varchar(100),
                address varchar(150)
            )
        """
        try:
            self.cursor.execute(sql)
            print('Table created')
        except Exception as e:
            print('Failed to create the table', e)

    # Insert one row
    def save(self, code, Legal_Person, company_name, address):
        sql = """
            insert into jzsc1(code,Legal_Person,company_name,address) values(%s,%s,%s,%s)
        """
        try:
            self.cursor.execute(sql, (code, Legal_Person, company_name, address))
            self.db.commit()
            print('Inserted', code)
        except Exception as e:
            print(f'Insert failed: {e}')
            self.db.rollback()

    def main(self):
        self.create_table()
        for page in range(1, 15):
            print(f'Crawling page {page}')
            data = self.get_info(page)
            self.parse_data(data)


if __name__ == '__main__':
    jz = Jzsc()
    jz.main()
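The expression json.loads(res.json()['data'])['data']['list'] decodes JSON twice, which is easy to misread: the outer layer is the reply of the invoke call, and its data field holds the decrypted text, which is again a JSON document. The sketch below separates the two steps in a function that can be tested offline. The name extract_companies is hypothetical and the sample record is invented; only the four field names are taken from the code above.

```python
import json


def extract_companies(invoke_reply: dict):
    """Turn the reply of the invoke call into a list of row tuples."""
    # Layer 1: the reply of the invoke call; "data" holds the decrypted text
    decrypted = json.loads(invoke_reply["data"])
    # Layer 2: the decrypted document with the actual records
    rows = decrypted.get("data", {}).get("list", [])
    return [
        (r.get("QY_ORG_CODE"), r.get("QY_FR_NAME"), r.get("QY_NAME"), r.get("QY_REGION_NAME"))
        for r in rows
    ]


if __name__ == "__main__":
    # Invented sample reply, shaped like the one the code above expects
    inner = {"data": {"list": [{"QY_ORG_CODE": "X1", "QY_FR_NAME": "A",
                                "QY_NAME": "B", "QY_REGION_NAME": "C"}]}}
    print(extract_companies({"status": 0, "data": json.dumps(inner)}))
```

Using .get for the nested keys means a reply without any records yields an empty list instead of a KeyError.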