Using urllib

news 2025/9/12 14:54:59

urllib is a library built into Python that can be used to send HTTP requests. It consists of four modules:

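request: the most basic module, used to send HTTP requests.
error: the exception handling module.
parse: a utility module for working with URLs.
robotparser: used to parse a website's robots.txt file.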
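As a rough illustration of the supporting modules, the sketch below touches parse, robotparser and error without any network access. The robots.txt rules and the example.com URLs are made up for the demonstration.

```python
from urllib import error, parse, robotparser

# parse: split a URL into components and build a query string.
parts = parse.urlparse('https://www.httpbin.org/get?name=germey')
print(parts.netloc, parts.path, parts.query)
query = parse.urlencode({'name': 'germey', 'page': 2})
print(query)

# robotparser: evaluate robots.txt rules. The rules are passed in directly
# here (illustrative values), so nothing is downloaded.
rules = robotparser.RobotFileParser()
rules.parse(['User-agent: *', 'Disallow: /private/'])
allowed_index = rules.can_fetch('*', 'https://example.com/index.html')
allowed_private = rules.can_fetch('*', 'https://example.com/private/data')
print(allowed_index, allowed_private)

# error: HTTPError is a subclass of URLError, so catching URLError covers both.
print(issubclass(error.HTTPError, error.URLError))
```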

Sending requests

The urlopen() function in urllib.request sends a basic request to the server and returns an HTTPResponse object.

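urllib.request.urlopen(url, data=None, [timeout, ]*, cafile=None, capath=None, cadefault=False, context=None)

Note: cafile, capath and cadefault have been deprecated since Python 3.6 and were removed in Python 3.13; pass an ssl.SSLContext through context instead.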

Basic usage

Example:

import urllib.request

response = urllib.request.urlopen('https://www.baidu.com')
print(response.status)
print(response.getheaders())
print(response.getheader('Server'))

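The data parameter

We can also pass the data parameter. Its value must be of type bytes, so the content has to be converted first, for example with bytes(). When data is supplied, the request is sent as POST instead of GET.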

import urllib.request
import urllib.parse

# urlencode converts a dict (or a sequence of two-element tuples) into a URL-encoded query string
# bytes converts its first argument from a string into a byte sequence
data = bytes(urllib.parse.urlencode({'name': 'germey'}), encoding='utf-8')
response = urllib.request.urlopen('https://www.httpbin.org/post', data=data)
print(response.read().decode('utf-8'))
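The effect of data on the request method can be checked without sending anything. The sketch below builds two Request objects (never opened) and asks each which method it would use; str.encode() is shown as an equivalent alternative to bytes().

```python
import urllib.parse
import urllib.request

# Encode the form fields, then convert the resulting str to bytes.
payload = urllib.parse.urlencode({'name': 'germey'}).encode('utf-8')
print(payload)

# The requests are only constructed here, not sent.
url = 'https://www.httpbin.org/post'
without_data = urllib.request.Request(url)
with_data = urllib.request.Request(url, data=payload)

# Without data the method defaults to GET; with data it defaults to POST.
print(without_data.get_method())
print(with_data.get_method())
```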

The timeout parameter

timeout sets a time limit in seconds for the request. It is usually combined with exception handling:

import urllib.request
import urllib.error
import socket

try:
    response = urllib.request.urlopen('https://www.httpbin.org/get', timeout=0.1)
except urllib.error.URLError as e:  # catch the exception
    if isinstance(e.reason, socket.timeout):  # the underlying cause is socket.timeout
        print('TIME OUT')
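One caveat worth sketching: a timeout during connection setup arrives wrapped in URLError, but a timeout while waiting for or reading the response can be raised directly as socket.timeout. The helper below, whose name fetch_or_none is made up for this example, handles both cases and returns None on a timeout.

```python
import socket
import urllib.error
import urllib.request


def fetch_or_none(url, timeout=5.0):
    """Return the response body as bytes, or None if the request timed out."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.read()
    except urllib.error.URLError as e:
        # Timeouts while connecting or sending arrive wrapped in e.reason.
        if isinstance(e.reason, socket.timeout):
            return None
        raise  # any other failure (DNS error, HTTP error status, ...) propagates
    except socket.timeout:
        # Timeouts while waiting for or reading the response are raised directly.
        return None
```

A call such as fetch_or_none('https://www.httpbin.org/get', timeout=0.1) would then yield either the body or None, instead of an unhandled exception on a slow response.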

The Request class

urlopen can send the most basic requests; the Request class lets us build more complex ones.

import urllib.request

request = urllib.request.Request('https://cn.bing.com')
response = urllib.request.urlopen(request)
print(response.read().decode('utf-8'))

By passing more arguments to the Request constructor, we can build a more complex request:

from urllib import request, parse

url = 'https://www.httpbin.org/post'
headers = {  # request headers
    'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)',
    'Host': 'www.httpbin.org'
}
form = {'name': 'germey'}  # renamed from dict to avoid shadowing the built-in
data = bytes(parse.urlencode(form), encoding='utf-8')
req = request.Request(url=url,data=data,headers=headers,method='POST')
response = request.urlopen(req)
print(response.read().decode('utf-8'))
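A Request object can be inspected before it is sent, which helps when debugging headers. The sketch below only constructs the object; the extra Accept header is an illustrative addition, not part of the example above.

```python
from urllib import parse, request

req = request.Request(
    url='https://www.httpbin.org/post',
    data=parse.urlencode({'name': 'germey'}).encode('utf-8'),
    headers={'User-Agent': 'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT)'},
    method='POST',
)
# Headers can also be attached after construction.
req.add_header('Accept', 'application/json')

print(req.full_url)
print(req.get_method())
print(req.data)
# Request stores header names in str.capitalize() form, hence 'User-agent'.
print(req.get_header('User-agent'))
print(req.header_items())
```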

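Advanced usage

Handlers support more advanced operations, such as cookie handling and proxy configuration. Each Handler is a processor responsible for one kind of task, which lets us implement almost every feature of HTTP requests. The base class of all Handler classes is BaseHandler.

Compared with the Request class and the urlopen function, OpenerDirector offers deeper configuration. The open method of an Opener returns the same type of response as urlopen.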
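As a sketch of the cookie and proxy Handlers mentioned above, the example below combines HTTPCookieProcessor and ProxyHandler in one Opener. To stay runnable offline, it talks to a throwaway local server; CookieDemoHandler and the session=abc123 cookie are invented for the demonstration.

```python
import http.cookiejar
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class CookieDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that sets one cookie on every response."""

    def do_GET(self):
        body = b'hello'
        self.send_response(200)
        self.send_header('Set-Cookie', 'session=abc123')
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(('127.0.0.1', 0), CookieDemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()

cookie_jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(
    urllib.request.HTTPCookieProcessor(cookie_jar),  # stores and resends cookies
    urllib.request.ProxyHandler({}),  # empty mapping: connect directly, no proxy
)

url = 'http://127.0.0.1:%d/' % server.server_port
with opener.open(url, timeout=5) as response:
    status = response.status
    body = response.read()

# The jar now holds the cookie sent by the server.
cookies = {cookie.name: cookie.value for cookie in cookie_jar}
print(status, body, cookies)

server.shutdown()
server.server_close()
```

To route traffic through a real proxy, the empty mapping would be replaced with something like {'http': 'http://host:port'}, where the address is whatever proxy is actually available.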

Handlers are passed to build_opener to construct an Opener. For example, to open a page protected by HTTP Basic authentication:

from urllib.request import HTTPPasswordMgrWithDefaultRealm, HTTPBasicAuthHandler, build_opener
from urllib.error import URLError

username = 'admin'
password = 'admin'
url = 'https://ssr3.scrape.center/'

p = HTTPPasswordMgrWithDefaultRealm()  # manages and stores usernames and passwords for HTTP requests
p.add_password(None, url, username, password)  # None: use the default realm
auth_handler = HTTPBasicAuthHandler(p)  # handler for HTTP Basic authentication
opener = build_opener(auth_handler)

try:
    result = opener.open(url)
    html = result.read().decode('utf-8')
    print(html)
except URLError as e:
    print(e.reason)
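To see what the handler does on the wire, the sketch below runs the same flow against a throwaway local server instead of the remote site. AuthDemoHandler and the realm name demo are invented for this example. The first request is answered with 401, after which HTTPBasicAuthHandler retries with an Authorization header.

```python
import base64
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.request import (HTTPBasicAuthHandler, HTTPPasswordMgrWithDefaultRealm,
                            ProxyHandler, build_opener)

USERNAME, PASSWORD = 'admin', 'admin'
# Basic authentication sends "username:password" encoded as base64.
EXPECTED = 'Basic ' + base64.b64encode(f'{USERNAME}:{PASSWORD}'.encode()).decode()


class AuthDemoHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint protected by HTTP Basic authentication."""

    def do_GET(self):
        if self.headers.get('Authorization') == EXPECTED:
            status, body, challenge = 200, b'welcome', None
        else:
            status, body, challenge = 401, b'denied', 'Basic realm="demo"'
        self.send_response(status)
        if challenge:
            self.send_header('WWW-Authenticate', challenge)
        self.send_header('Content-Length', str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo output quiet


server = HTTPServer(('127.0.0.1', 0), AuthDemoHandler)  # port 0: any free port
threading.Thread(target=server.serve_forever, daemon=True).start()
url = 'http://127.0.0.1:%d/' % server.server_port

manager = HTTPPasswordMgrWithDefaultRealm()
manager.add_password(None, url, USERNAME, PASSWORD)  # None: match any realm
# ProxyHandler({}) forces a direct connection to the local server.
opener = build_opener(HTTPBasicAuthHandler(manager), ProxyHandler({}))

with opener.open(url, timeout=5) as response:
    status, body = response.status, response.read()
print(status, body)

server.shutdown()
server.server_close()
```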