Yuanrenxue (猿人学) Web Crawler Attack and Defense Competition, Problem 7: Dynamic Fonts, Drifting with the Wind
Solution steps
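- Inspect the network traffic.
- Nothing in the request is encrypted: the session cookie carries no encrypted fields, and the request headers carry no encrypted parameters.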
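- Values in the response such as `&#xb428` correspond one-to-one with the digits rendered on the page, so it is tempting to build a dictionary once, fetch every page, and look the values up. It is not that simple: request the same page again and the correspondence has changed.
- So that idea does not work as is. The `woff` field in the response also changes on every request, which suggests the mapping depends on it. A woff file is a font file, which is essentially a mapping table from character codes to glyphs. In `&#xc134`, for example, `&#x` is the character-reference prefix and `c134` is the character's code. Set a breakpoint and trigger a request to see how the page uses the field: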
    ttf = data.woff;
    $('.font').text('').append('<style type="text/css">@font-face { font-family:"fonteditor";src: url(data:font/truetype;charset=utf-8;base64,' + ttf + '); }</style>');
The `src` is a data URI that embeds the base64-encoded font directly, and its MIME type (`font/truetype`) shows that the `woff` field is actually loaded as a ttf font. Decode it and save it with Python, then dump it to XML and print, for every glyph, the sequence of `on` attributes of its points together with the glyph name:

    from fontTools.ttLib import TTFont  # pip install fontTools
    from base64 import b64decode
    from parsel import Selector  # pip install parsel


    def demo(data):
        """data is the JSON content returned by the API."""
        with open('7.ttf', mode='wb') as file:
            file.write(b64decode(data['woff']))  # base64-decode the woff field and write it to a file
        font = TTFont('7.ttf')  # load the font file
        font.saveXML('7.xml')  # save it as an XML file
        # read the XML file back
        with open('7.xml', mode='r', encoding='utf-8') as f:
            xml_data = f.read()
        select = Selector(xml_data)
        glyf = select.css('glyf > TTGlyph')  # all TTGlyph tags under glyf
        for TTGlyph in glyf[1:]:  # tag 0 is not needed, so start from element 1
            # name attribute of the TTGlyph tag, with the 'uni' prefix removed
            name = TTGlyph.css('::attr(name)').get().replace('uni', '')
            pt_tag = TTGlyph.css('pt')  # all pt tags under this TTGlyph
            on_list = []
            for pt in pt_tag:
                on = pt.css('::attr(on)').get()  # on attribute of the pt tag
                on_list.append(on)
            # print one dictionary entry: ''.join(on_list) is the key, name is the value
            print(f"'{''.join(on_list)}': '{name}',")


    resp = {"woff": 
"AAEAAAAKAIAAAwAgT1MvMv/BOMUAAAEoAAAAYGNtYXDlXV9jAAABpAAAAYZnbHlmS99AtgAAA0QAAAQCaGVhZB6SqjgAAACsAAAANmhoZWEG0QEyAAAA5AAAACRobXR4ArwAAAAAAYgAAAAabG9jYQWOBpkAAAMsAAAAGG1heHABGABFAAABCAAAACBuYW1lUGhGMAAAB0gAAAJzcG9zdCjmdk0AAAm8AAAAiAABAAAAAQAA83P19l8PPPUACQPoAAAAANnIUd8AAAAA418UhQAH/+wCRwMDAAAACAACAAAAAAAAAAEAAAQk/qwAfgJYAAAAKwItAAEAAAAAAAAAAAAAAAAAAAACAAEAAAALADkAAwAAAAAAAgAAAAoACgAAAP8AAAAAAAAABAIqAZAABQAIAtED0wAAAMQC0QPTAAACoABEAWkAAAIABQMAAAAAAAAAAAAAEAAAAAAAAAAAAAAAUGZFZABApDXINwQk/qwAfgQkAVQAAAABAAAAAAAAAAAAAAAgAAAAZAAAAlgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwAAAAMAAAAcAAEAAAAAAIAAAwABAAAAHAAEAGQAAAAUABAAAwAEpDWmhaeTtoTCV8KGxCHFkcg3//8AAKQ1poWnk7aEwlfChcQhxZHIN///W89ZhVhwSYQ9sgAAO+A6dTfQAAEAAAAAAAAAAAAAAAoAAAAAAAAAAAACAAUAAAEGAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAALADoAcgDIAQkBHwFiAX8BsAHuAgEAAQAH/+wAIwAHAAIAADczFQccBxsAAAIAI//yAhcDAwAMABkAAAEmBwYQFxYyNzYQJyYHMhcWEAcGIicmEDc2ASR8REFBRP1rBwdrgWUfKSkfvCEfHyEC3iWiU/7EdWtrdQE8U6J4bDn+9DtdXTsBDDlsAAABACr/8gIhAt4AJAAAEwMzNjM2MxYWFRQGIyInJjcjFhcWMzI3NjU0JiMGBwYHIzchNWMVQhwpIi1ZY3RGSBs7BWMURVJCfloyfHIfNisbDCABWALe/lVOKBtIZFVcKRg/PVAyTjZjeIYDFQsu+10AAwAj//ICJQLeAB8ALAA4AAABIgcGFRQXFhc1JgcGFRQWMjc2NTQnJgcVNjc0NTQnJgcyFxYUBwYiJyY0NzYTMhcWFAcGIiY0NzYBJGk4NQEcRTgTQov4OEcnI0BDMidUcVMlU0kjqAZJKSxDXCU2NiifZjQyAt5AJWc4HjkJAwRJMUZhbjk1YUYxSQQDCTkeOGclQFcfD38bJiYbfw8f/tYjOoIkJkqCOiMAAAIAKv/yAhoDAwAbACgAAAEiBwYVFBcWFzY2NCYjIgciByM3NDc2FzYXMyYDFjcWFAcGJyInJjQ2AUFsUllBRZFbe0iNMyNAJQcNSBdVjiArKLtGUgInJ0xaIi9gAt6DS7N7m1QBAX/ich1MJHFSWg8Pd9z+mhsncZ8rPQkqfkd0AAABAHAAAAFoAt4ACQAAAQYGBxU2NxEzEQEjH2oqfChUAt5ALhhUHDv9pQLeAAEAKv/yAhcC3gArAAABIgcGFzM2NjcWFxYUBiMjFTMWFhQHBiMmJyYnIxYXFhcyNjU0JyYHNjU0JgE3XFRHC0kTPlREHkVdTENNPmERTEJfMikFWBRHP31
ceiIVRXRqAt5ALm5CRwMDIw+MMEwCRX0xKxEXQEB8KUEBfGQ7OWMzJ4NBfQAAAgAaAAACRwLeAAoADgAAAQEVIRUzNTM1IxEHMxEhAYP+lwFpSnp6UAb+1gLe/hlilZVDAgaL/oUAAAEANAAAAjAC3gAdAAABIgYHMzY3NhcyFhUUBwYHBgcGFSE1ITY3Njc2NCYBQW2JC0cRJSBaRnJaHltzJk8B5v58P2hwJl2UAt6Wckw3NgRKIV4ZQz1VLjtrQFFfVmMJqYMAAgAq//ICFwLeABwAKAAAASIGFRQXFjMWNjczFxQHBiMiJyMWMzY3NjU0JyYHMhcWFAYjIiY1NDYBJGCaRTqYCG4YEQ9FM1NgOEEEzXlNNS5OhGYgUo9JQ0dPAt6Fc3I6Ogk7NAmDWEhstAF4Wb2hTm5PNjeUVIIqXksAAAEASwAAAg8C3gAGAAATFSEBMwE1SwGG/uBTAQsC3mX9hwKeQAAAAAAAABIA3gABAAAAAAAAABcAAAABAAAAAAABAAwAFwABAAAAAAACAAcAIwABAAAAAAADABQAKgABAAAAAAAEABQAKgABAAAAAAAFAAsAPgABAAAAAAAGABQAKgABAAAAAAAKACsASQABAAAAAAALABMAdAADAAEECQAAAC4AhwADAAEECQABABgAtQADAAEECQACAA4AzQADAAEECQADACgA2wADAAEECQAEACgA2wADAAEECQAFABYBAwADAAEECQAGACgA2wADAAEECQAKAFYBGQADAAEECQALACYBb0NyZWF0ZWQgYnkgZm9udC1jYXJyaWVyLlBpbmdGYW5nIFNDUmVndWxhci5QaW5nRmFuZy1TQy1SZWd1bGFyVmVyc2lvbiAxLjBHZW5lcmF0ZWQgYnkgc3ZnMnR0ZiBmcm9tIEZvbnRlbGxvIHByb2plY3QuaHR0cDovL2ZvbnRlbGxvLmNvbQBDAHIAZQBhAHQAZQBkACAAYgB5ACAAZgBvAG4AdAAtAGMAYQByAHIAaQBlAHIALgBQAGkAbgBnAEYAYQBuAGcAIABTAEMAUgBlAGcAdQBsAGEAcgAuAFAAaQBuAGcARgBhAG4AZwAtAFMAQwAtAFIAZQBnAHUAbABhAHIAVgBlAHIAcwBpAG8AbgAgADEALgAwAEcAZQBuAGUAcgBhAHQAZQBkACAAYgB5ACAAcwB2AGcAMgB0AHQAZgAgAGYAcgBvAG0AIABGAG8AbgB0AGUAbABsAG8AIABwAHIAbwBqAGUAYwB0AC4AaAB0AHQAcAA6AC8ALwBmAG8AbgB0AGUAbABsAG8ALgBjAG8AbQAAAgAAAAAAAAAOAAAAAAAAAAAAAAAAAAAAAAAAAAAACwALAAABCgEEAQkBCAELAQYBBQEDAQIBBwd1bmljMjU3B3VuaWI2ODQHdW5pYzI4NQd1bmljODM3B3VuaWM1OTEHdW5pYTY4NQd1bmlhNDM1B3VuaWE3OTMHdW5pYzQyMQd1bmljMjg2","status": "1","state": "success","data": [{"value": "양 뚄 양 ꐵ "},{"value": "슅 쐡 젷 슆 "},{"value": "양 쉗 슅 ꞓ "},{"value": "ꞓ 슅 슅 쐡 "},{"value": "ꚅ 쐡 양 ꚅ "},{"value": "ꞓ ꞓ 쉗 ꞓ "},{"value": "뚄 슆 쉗 쐡 "},{"value": "ꞓ 젷 쐡 쐡 "},{"value": "젷 슅 쐡 쐡 "},{"value": "ꚅ 젷 ꚅ ꞓ "}] } demo(resp)
Running it prints one mapping entry per glyph.
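The extraction step can also be tried without a real font. The following is a minimal sketch using only the standard library; `SAMPLE_TTX` is a hand-written, hypothetical fragment in the shape of the XML dump (not data from the challenge), and `on_signatures` is an illustrative name. Unlike the code above, it skips glyphs that have no points instead of skipping index 0.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written fragment shaped like the XML dump of a font.
SAMPLE_TTX = """
<ttFont>
  <glyf>
    <TTGlyph name=".notdef"/>
    <TTGlyph name="unic134">
      <contour>
        <pt x="0" y="0" on="1"/>
        <pt x="5" y="10" on="0"/>
        <pt x="10" y="0" on="1"/>
      </contour>
    </TTGlyph>
    <TTGlyph name="unib428">
      <contour>
        <pt x="0" y="0" on="1"/>
        <pt x="0" y="10" on="1"/>
        <pt x="10" y="10" on="1"/>
      </contour>
    </TTGlyph>
  </glyf>
</ttFont>
"""


def on_signatures(ttx_text):
    """Map each glyph's concatenated `on` flags to its name without 'uni'."""
    root = ET.fromstring(ttx_text.strip())
    signatures = {}
    for glyph in root.findall("./glyf/TTGlyph"):
        points = glyph.findall(".//pt")
        if not points:
            continue  # glyphs without an outline (e.g. .notdef here) are skipped
        signature = "".join(pt.get("on") for pt in points)
        signatures[signature] = glyph.get("name").replace("uni", "", 1)
    return signatures


print(on_signatures(SAMPLE_TTX))  # {'101': 'c134', '111': 'b428'}
```

The `on` sequence works as a fingerprint here on the assumption that the glyph names change between requests while each digit's outline keeps the same point flags, which is what the approach in this post relies on.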
- Turn the printed entries into the mapping dictionary: keep each `on` sequence as the key, and replace the glyph name with the digit that glyph renders as.

    on_map = {
        '1001101111': '1',
        '101010101101010001010101101010101010010010010101001000010': '8',
        '10101010100001010111010101101010010101000': '6',
        '10100100100101010010010010': '0',
        '1110101001001010110101010100101011111': '5',
        '10010101001110101011010101010101000100100': '9',
        '100110101001010101011110101000': '2',
        '111111111111111': '4',
        '1111111': '7',
        '10101100101000111100010101011010100101010100': '3',
    }
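Because the dictionary is assembled by hand, a quick sanity check can catch copy mistakes. This is a sketch, not part of the original solution; `check_digit_map` is a hypothetical helper, and the dictionary is the one listed above. Note that a duplicated key in a dict literal silently overwrites the earlier entry, which would show up here as a missing digit.

```python
def check_digit_map(on_map):
    """Raise ValueError unless the map covers each digit 0-9 exactly once."""
    digits = list(on_map.values())
    missing = sorted(set("0123456789") - set(digits))
    duplicated = sorted({d for d in digits if digits.count(d) > 1})
    if missing or duplicated:
        raise ValueError(f"missing digits: {missing}, duplicated digits: {duplicated}")
    return True


on_map = {
    '1001101111': '1',
    '101010101101010001010101101010101010010010010101001000010': '8',
    '10101010100001010111010101101010010101000': '6',
    '10100100100101010010010010': '0',
    '1110101001001010110101010100101011111': '5',
    '10010101001110101011010101010101000100100': '9',
    '100110101001010101011110101000': '2',
    '111111111111111': '4',
    '1111111': '7',
    '10101100101000111100010101011010100101010100': '3',
}

print(check_digit_map(on_map))  # True
```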
- With the mapping dictionary in hand, we can request the API and decode the correct numbers.

    from fontTools.ttLib import TTFont  # pip install fontTools
    from base64 import b64decode
    from parsel import Selector
    import requests


    def save_font(font_data):
        on_map = {
            '1001101111': '1',
            '101010101101010001010101101010101010010010010101001000010': '8',
            '10101010100001010111010101101010010101000': '6',
            '10100100100101010010010010': '0',
            '1110101001001010110101010100101011111': '5',
            '10010101001110101011010101010101000100100': '9',
            '100110101001010101011110101000': '2',
            '111111111111111': '4',
            '1111111': '7',
            '10101100101000111100010101011010100101010100': '3',
        }
        with open('7.ttf', mode='wb') as f:
            f.write(b64decode(font_data['woff']))  # save the font file
        font = TTFont('7.ttf')  # load the font file
        font.saveXML('7.xml')  # save it as an XML file
        # read the XML file back
        with open('7.xml', mode='r', encoding='utf-8') as f:
            xml_data = f.read()
        select = Selector(xml_data)
        # <glyf> --> all TTGlyph tags; tag 0 is not needed, so start from tag 1
        TTGlyph = select.css('glyf > TTGlyph')[1:]
        rep_dist = {}
        for tt in TTGlyph:
            name = tt.css('::attr(name)').get().replace('uni', '')  # TTGlyph tag --> name value
            pt = tt.css('pt')  # pt tags under this TTGlyph tag
            on_list = []
            for pt_tag in pt:
                on_list.append(pt_tag.css('::attr(on)').get())
            rep_dist[name] = on_map[''.join(on_list)]  # map the on sequence to the real digit
        result_dict = []
        for data in font_data['data']:
            num_list = []
            for nums in data['value'].replace('&#x', '').split(' ')[0:-1]:
                num_list.append(rep_dist[nums])
            result_dict.append(int(''.join(num_list)))
            # print(rep_dist[nums], end='')
            # print()
        return result_dict


    headers = {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/127.0.0.0 Safari/537.36"
    }
    cookies = {
        "sessionid": "mhgiqaaxkqpt3ybutbub96ubi9rr5gtk"
    }
    url = "https://match.yuanrenxue.cn/api/match/7?page=1"
    resp = requests.get(url, headers=headers, cookies=cookies)
    print(save_font(resp.json()))
The output matches the data on page 1, so the decoding works.
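The final decoding step can be looked at in isolation. This is a minimal sketch: `decode_value` is an illustrative name, and the `glyph_to_digit` mapping below is made up for the example (in the real flow it is rebuilt from the font on every response). It uses `split()` without an argument, so the trailing space in each value needs no special handling.

```python
def decode_value(value, glyph_to_digit):
    """Decode one 'value' string such as '&#xc591 &#xb684 ' into an int."""
    codes = value.replace("&#x", "").split()  # split() drops the trailing space
    return int("".join(glyph_to_digit[code] for code in codes))


# Hypothetical mapping for a single response; real mappings change per request.
glyph_to_digit = {"c591": "1", "b684": "8", "a435": "0"}

print(decode_value("&#xc591 &#xb684 &#xc591 &#xa435 ", glyph_to_digit))  # 1810
```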
- Next, get the names of all the summoners. The page assembles them in its JS code; the relevant code is:

    let page = 1;
    let name = ['极镀ギ紬荕', '爷灬霸气傀儡', '梦战苍穹', '傲世哥', 'мaη肆風聲', '一刀メ隔世', '横刀メ绝杀',
        'Q不死你R死你', '魔帝殤邪', '封刀不再战', '倾城孤狼', '戎马江湖', '狂得像风', '影之哀伤', '謸氕づ独尊',
        '傲视狂杀', '追风之梦', '枭雄在世', '傲视之巅', '黑夜刺客', '占你心为王', '爷来取你狗命', '御风踏血',
        '凫矢暮城', '孤影メ残刀', '野区霸王', '噬血啸月', '风逝无迹', '帅的睡不着', '血色杀戮者', '冷视天下',
        '帅出新高度', '風狆瑬蒗', '灵魂禁锢', 'ヤ地狱篮枫ゞ', '溅血メ破天', '剑尊メ杀戮', '塞外う飛龍', '哥‘K纯帅',
        '逆風祈雨', '恣意踏江山', '望断、天涯路', '地獄惡灵', '疯狂メ孽杀', '寂月灭影', '骚年霸称帝王', '狂杀メ无赦',
        '死灵的哀伤', '撩妹界扛把子', '霸刀☆藐视天下', '潇洒又能打', '狂卩龙灬巅丷峰', '羁旅天涯.', '南宫沐风',
        '风恋绝尘', '剑下孤魂', '一蓑烟雨', '领域★倾战', '威龙丶断魂神狙', '辉煌战绩', '屎来运赚', '伱、Bu够档次',
        '九音引魂箫', '骨子里的傲气', '霸海断长空', '没枪也很狂', '死魂★之灵'];
    let heroArray = []
    for (let i = 0; i <= 4; i++) {
        let yyq = 1;
        // the ten empty strings stand for the ten rows on one page
        ['', '', '', '', '', '', '', '', '', ''].forEach((index, val) => {
            // console.log(name[yyq + (page - 1) * 10]);
            heroArray.push(name[yyq + (page - 1) * 10])
            yyq += 1
        })
        page += 1;
    }
    console.log(heroArray)
The output matches the names shown on the page.
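For anyone keeping the whole solution in Python, the index arithmetic in the JS loop (`name[yyq + (page - 1) * 10]` with `yyq` running from 1 to 10) can be expressed as a slice. This is a sketch: `names_for_page` is a hypothetical helper, and the placeholder list stands in for the real `name` array.

```python
def names_for_page(names, page, page_size=10):
    """Return the names shown on a 1-based page.

    Mirrors the JS index yyq + (page - 1) * 10 for yyq = 1..10, so page 1
    covers indices 1..10 and index 0 is never used.
    """
    start = 1 + (page - 1) * page_size
    return names[start:start + page_size]


# Placeholder data standing in for the real list of 67 names.
names = [f"name{i}" for i in range(67)]

print(names_for_page(names, 1))      # name1 through name10
print(names_for_page(names, 5)[-1])  # name50
```

One difference from the JS version: an out-of-range index yields `undefined` in JS, while a Python slice simply returns fewer items.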