(Me vs. Web Scrapers) 码上爬 (Mashangpa) Problem 5
Problem URL: https://www.mashangpa.com/problem-detail/5/
This problem sends a POST request, and its xl parameter is obviously encrypted:
xl: "24b0d0d3a84242d3d56a0db40deb64d03485dc6576f682d64eb1b45af53c24f3"
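The sample is 64 hex characters, i.e. 32 bytes, which is exactly two 16-byte blocks. That length pattern is typical of a block cipher such as AES, which hints at what to look for in the page's JavaScript. A quick standard-library check on the captured value:

```python
# Hex-decode the captured xl sample and check whether its length is a whole
# number of 16-byte blocks (the AES block size).
XL_SAMPLE = "24b0d0d3a84242d3d56a0db40deb64d03485dc6576f682d64eb1b45af53c24f3"

raw = bytes.fromhex(XL_SAMPLE)  # raises ValueError if the string is not valid hex
n_bytes = len(raw)
n_blocks, remainder = divmod(n_bytes, 16)

print(f"{n_bytes} bytes -> {n_blocks} blocks of 16, remainder {remainder}")
```

A remainder of 0 does not prove AES by itself, but it is consistent with it and rules out plain hashing to a fixed 20- or 64-byte digest.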
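Searching the page's JavaScript for xl turns up the place where it is built, so from there we just trace it back one layer at a time.
The value assigned to xl turns out to be the output of encrypt(); going further up leads to the function that does the encryption.
That function still depends on two more values, iv and key, so we keep tracing upward.
In the end iv and key turn out to be plain constants, just written in an unfriendly-looking form. Rather than decode them by hand, we can take the blunt approach and hook CryptoJS.AES.encrypt. Paste the following code straight into the browser console: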
(function () {
  // Bail out if the page does not expose CryptoJS globally
  if (!window.CryptoJS || !CryptoJS.AES) { console.warn('No CryptoJS.AES'); return; }
  const orig = CryptoJS.AES.encrypt;
  // WordArray.toString() defaults to hex; fall back to String() for anything else
  const toStr = x => { try { return x && x.toString ? x.toString() : String(x); } catch { return String(x); } };
  // Decode as UTF-8 without ever throwing, so the page's own encrypt call is not broken
  const toUtf8 = x => { try { return CryptoJS.enc.Utf8.stringify(x); } catch { return '(not valid UTF-8)'; } };
  CryptoJS.AES.encrypt = function (msg, key, cfg) {
    console.log('[AES.encrypt]',
      '\n key (hex) =', toStr(key),
      '\n key (utf8)=', toUtf8(key),
      '\n iv  (hex) =', cfg && cfg.iv ? toStr(cfg.iv) : null,
      '\n iv  (utf8)=', cfg && cfg.iv ? toUtf8(cfg.iv) : null);
    // Hand off to the original implementation unchanged
    return orig.apply(this, arguments);
  };
  console.log('✅ hook ok');
})();
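Then switch to another page so the request fires again, and the console prints the concrete key and iv.
With those in hand, we reproduce the encryption directly in Python to generate xl: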
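The hook prints key and iv both as hex (the default of CryptoJS's WordArray.toString()) and as UTF-8. The two forms convert into each other easily in Python. Decoding the hex values used in the script below shows the readable strings behind them and confirms a 16-byte key, i.e. AES-128:

```python
# Convert the hex strings printed by the console hook back to raw bytes / text.
KEY_HEX = "6a6f386a39774777253648627866466e"
IV_HEX = "30313233343536373839414243444546"

key = bytes.fromhex(KEY_HEX)
iv = bytes.fromhex(IV_HEX)

# The key length selects the AES variant: 16 -> AES-128, 24 -> AES-192, 32 -> AES-256
variant = {16: "AES-128", 24: "AES-192", 32: "AES-256"}[len(key)]

print("key:", key.decode("utf-8"), f"({len(key)} bytes, {variant})")
print("iv: ", iv.decode("utf-8"), f"({len(iv)} bytes)")
```

Going the other way, `"0123456789ABCDEF".encode().hex()` gives back the iv hex, which is handy if you copy the UTF-8 line from the console instead.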
import json
import time
from typing import Optional

import requests
from Crypto.Cipher import AES
from Crypto.Util.Padding import pad

KEY_HEX = "6a6f386a39774777253648627866466e"  # 16 bytes => AES-128
IV_HEX = "30313233343536373839414243444546"   # "0123456789ABCDEF"

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36 Edg/139.0.0.0",
    "Referer": "https://www.mashangpa.com/problem-detail/5/",
    "Accept": "*/*",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "Accept-Encoding": "gzip, deflate, br, zstd",
    "Cookie": "your cookies",  # replace with the Cookie header from your own logged-in session
}

KEY = bytes.fromhex(KEY_HEX)
IV = bytes.fromhex(IV_HEX)


def encrypt_to_hex(plain_str: str) -> str:
    """AES-CBC + PKCS7; returns the ciphertext as lowercase hex."""
    cipher = AES.new(KEY, AES.MODE_CBC, IV)
    ct = cipher.encrypt(pad(plain_str.encode("utf-8"), AES.block_size))
    return ct.hex()


def build_xl(page: int, ts_ms: Optional[int] = None):
    """Build xl and the timestamp the same way the front end does."""
    ts = ts_ms if ts_ms is not None else int(time.time() * 1000)
    params = {"page": page, "_ts": ts}
    # Compact separators, matching JSON.stringify output
    plain = json.dumps(params, separators=(",", ":"))
    return encrypt_to_hex(plain), ts


def post_data(problem_id: int, page: int):
    xl, _ts = build_xl(page)
    url = f"https://www.mashangpa.com/api/problem-detail/{problem_id}/data/"
    h = {"Content-Type": "application/json", "User-Agent": "Mozilla/5.0"}
    h.update(headers)  # browser headers above override the defaults
    r = requests.post(url, json={"xl": xl}, headers=h, timeout=15)
    r.raise_for_status()
    return r.json()


if __name__ == "__main__":
    # Optional self-check: set timestamp to the _ts of a request captured in the
    # browser; the output should then equal the xl sent with that request.
    timestamp = int(time.time() * 1000)
    sample_plain = json.dumps({"page": 2, "_ts": timestamp}, separators=(",", ":"))
    sample_cipher_hex = encrypt_to_hex(sample_plain)
    # print("cipher(hex) =", sample_cipher_hex)
    # Expected for the captured _ts: 24b0d0d3a84242d3d56a0db40deb64d057e1befbb07dec59069bd1e070c95ce9

    # Sum current_array over pages 1..20
    sum_all = 0
    for i in range(1, 21):
        data = post_data(5, i)["current_array"]
        sum_all += sum(data)
    print(sum_all)
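Why do the sample xl at the top and the expected value in the script's comment share their first 32 hex characters? The plaintext is compact JSON, and for a single-digit page `{"page":2,"_ts":` is exactly 16 bytes, so with CBC and a fixed IV the first ciphertext block depends only on the page number; only the second block changes with `_ts`. With a 13-digit millisecond timestamp the whole string is 30 bytes, which PKCS7 pads to 32 bytes, matching the 64-hex-character xl. The sketch below checks that byte layout using only the standard library. `pkcs7_pad` is a hypothetical, hand-written illustration of what `Crypto.Util.Padding.pad` does in its default PKCS7 style, and the timestamp is an arbitrary example value, not one captured from the site:

```python
import json

BLOCK = 16  # AES block size in bytes


def pkcs7_pad(data: bytes, block: int = BLOCK) -> bytes:
    """Hypothetical illustrative helper: PKCS7 padding, as in Crypto.Util.Padding.pad."""
    n = block - len(data) % block  # always 1..block; a full extra block if already aligned
    return data + bytes([n]) * n


ts = 1756000000000  # example 13-digit millisecond timestamp (not a captured value)
params = {"page": 2, "_ts": ts}

compact = json.dumps(params, separators=(",", ":")).encode("utf-8")  # like JSON.stringify
spaced = json.dumps(params).encode("utf-8")  # Python default separators: ", " and ": "

padded = pkcs7_pad(compact)
blocks = [padded[i:i + BLOCK] for i in range(0, len(padded), BLOCK)]

print(compact)                                   # b'{"page":2,"_ts":1756000000000}'
print(len(compact), "->", len(padded), "bytes")  # 30 -> 32 bytes
print(blocks[0])                                 # b'{"page":2,"_ts":'
print(spaced == compact)                         # False
```

The last line is the reason the script passes `separators=(",", ":")`: with Python's default separators the plaintext gains spaces, so the ciphertext no longer matches what the browser sends.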
If Python reports that a package is missing, run pip install requests pycryptodome.
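Note that the pip package name and the import name differ here: pycryptodome is imported as `Crypto`. A small standard-library check (the helper name `missing_modules` is hypothetical, chosen for illustration) reports which imports would fail and prints the matching pip command before you run the script:

```python
import importlib.util


def missing_modules(names):
    """Hypothetical helper: return the top-level import names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# import name -> pip package that provides it
REQUIRED = {"requests": "requests", "Crypto": "pycryptodome"}

missing = missing_modules(REQUIRED)
if missing:
    print("pip install " + " ".join(REQUIRED[m] for m in missing))
else:
    print("all dependencies found")
```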