爬虫基础学习-项目实践:每次请求,跟换不同的user-agent
首先,收集多组user-agent
网址:常见User-Agent - 朴文 - 博客园
然后,每次请求的时候随机选择一个User-Agent
方法:利用random随机模块 choice 函数
使用User-Agent
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0",
"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0",
"Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1"
编写程序:
#!/usr/bin/env python3
import random
import urllib.request
import urllib.parse
import urllib.error# 定义URL
url = 'http://httpbin.org/post'# 定义多组User-Agent
user_agent_list = ["Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36","Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36","Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:120.0) Gecko/20100101 Firefox/120.0","Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36 Edg/120.0.0.0","Mozilla/5.0 (iPhone; CPU iPhone OS 17_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.1 Mobile/15E148 Safari/604.1"
]# 利用 random 函数 每次随机抽取一个User-Agent
rando_user_agent = random.choice(user_agent_list)
header = {"User-Agent": rando_user_agent
}
req = urllib.request.Request(url=url, headers=header,method="POST")
# req.add_header("User-Agent", rando_user_agent) HTTP Error 405: METHOD NOT ALLOWED
response = urllib.request.urlopen(req)
print(response.read().decode('utf-8'))