Web Crawler Basics: POST Requests and a Custom User-Agent
POST method:
Passing the optional data argument changes the request method from GET to POST.
# encode("utf-8") converts str -> bytes
# decode("utf-8") converts bytes -> str
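The str/bytes conversions noted above can be tried without sending any request. The following is a minimal sketch that reuses the form fields from this post; the variable names are only illustrative.

```python
import urllib.parse

form = {"hello": "world", "handsome_man": "wangbo"}

# urlencode() turns the dict into a query string (still a str).
query = urllib.parse.urlencode(form)

# encode("utf-8") converts str -> bytes, the type urlopen() expects for data.
payload = query.encode("utf-8")

# decode("utf-8") converts bytes -> str again.
restored = payload.decode("utf-8")

print(query)                                             # hello=world&handsome_man=wangbo
print(type(payload).__name__, type(restored).__name__)   # bytes str
```

Passing a plain str as data to urlopen() raises a TypeError, which is why the encode step is needed.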
#!/usr/bin/env python3
import urllib.request
import urllib.parse
import urllib.error

# Define the URL
url = 'http://httpbin.org/post'

# Build the form content to send with the POST request
data = {
    'hello': 'world',
    'handsome_man': 'wangbo'
}

# The data must be URL-encoded and converted to bytes
data_encode = urllib.parse.urlencode(data).encode('utf-8')

# response = urllib.request.urlopen(url=url, data=data_encode)
# print(response.read().decode('utf-8'))

# Set a connection timeout (in seconds)
try:
    response = urllib.request.urlopen(url=url, data=data_encode, timeout=0.1)
    print(response.read().decode('utf-8'))
except urllib.error.URLError as e:
    # URLError covers timeouts as well as other connection failures
    print("Request failed or timed out:", e.reason)
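To see that supplying data is what switches the method, the request can be inspected before it is sent. This is a small sketch using urllib.request.Request and its get_method() call; nothing goes over the network, and the single form field is just a placeholder.

```python
import urllib.parse
import urllib.request

url = "http://httpbin.org/post"
data_encode = urllib.parse.urlencode({"hello": "world"}).encode("utf-8")

# No request is sent here; the Request objects are only inspected.
without_data = urllib.request.Request(url)
with_data = urllib.request.Request(url, data=data_encode)

print(without_data.get_method())  # GET
print(with_data.get_method())     # POST
```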
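Real-world crawlers need custom HTTP request headers.
urllib can send both GET and POST requests.
You can also construct the request yourself.
timeout:
Sets the timeout, in seconds.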
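One way to wrap the timeout handling is a small helper. This is only a sketch: the function name fetch and the 5-second default are assumptions made for illustration, not part of urllib.

```python
import socket
import urllib.error
import urllib.request


def fetch(url, data=None, timeout=5.0):
    """Return the decoded response body, or None if the request fails.

    fetch is a hypothetical helper; timeout is given in seconds.
    """
    try:
        with urllib.request.urlopen(url, data=data, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except urllib.error.URLError as e:
        # Connection failures, including timeouts while connecting.
        print("Request failed:", e.reason)
    except socket.timeout:
        # A timeout while waiting for or reading the response can be
        # raised directly instead of being wrapped in URLError.
        print("Timed out while reading the response")
    return None
```

For example, fetch('http://httpbin.org/post', data=b'hello=world', timeout=0.1) would return None on a slow connection instead of raising.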
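The core of a crawler is imitating a real user!
Use the urllib.request module to build the HTTP request headers yourself, starting with the User-Agent header.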
#!/usr/bin/env python3
import urllib.request
import urllib.parse
import urllib.error

# Define the URL
url = 'http://httpbin.org/post'

# Build a custom Request and add a User-Agent header
header = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36"
}
req = urllib.request.Request(url=url, headers=header, method="POST")
response = urllib.request.urlopen(req)
print(response.read().decode('utf-8'))
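The two ideas in this post, form data and a custom header, can be combined in one Request. The sketch below builds the request and inspects it without sending it; the User-Agent string here is a made-up placeholder, not a real browser value.

```python
import urllib.parse
import urllib.request

url = "http://httpbin.org/post"
headers = {"User-Agent": "Mozilla/5.0 (example-crawler)"}  # placeholder value
form = urllib.parse.urlencode({"hello": "world"}).encode("utf-8")

req = urllib.request.Request(url=url, data=form, headers=headers, method="POST")

# urllib stores header names in capitalized form, so the lookup key
# is "User-agent" rather than "User-Agent".
print(req.get_header("User-agent"))  # Mozilla/5.0 (example-crawler)
print(req.get_method())              # POST
print(req.data)                      # b'hello=world'
```

Passing req to urllib.request.urlopen() sends it. httpbin.org should echo the received headers and form fields back in its JSON response, which makes it easy to check that the custom User-Agent was applied.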