
wzjs 2025/8/23 1:37:05


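The urllib module

Official site: urllib is part of the Python standard library, so nothing needs to be installed.

urllib.request: opens and reads URLs
urllib.error: contains the exceptions raised by urllib.request
urllib.parse: parses URLs
urllib.robotparser: parses robots.txt files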

`urllib.request`

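Official description

The urllib.request module defines functions and classes that help in opening URLs (mostly HTTP) in complex situations: basic authentication, digest authentication, redirects, cookies, and more.

Usage

urlopen

response = urllib.request.urlopen(url, data=None, [timeout, ]*, context=None)

Attributes and methods of response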

import urllib.request

# help(urllib.request.urlopen) shows:
# urlopen(url, data=None, timeout=<object object at 0x000001A2C7D428B0>, *, context=None)
response = urllib.request.urlopen('http://www.baidu.com')
if response.getcode() == 200:
    # .reason (str): status phrase of the HTTP response, such as OK or Not Found.
    print(f'.reason, status phrase: {response.reason}')
    # .status (int): status code of the HTTP response (200 for success, 404 for not found).
    print(f'.status, status code: {response.status}')
    # .url (str): URL actually retrieved (may differ from the original URL after a redirect).
    print(f'.url, URL actually retrieved: {response.url}')
    # .headers (http.client.HTTPMessage): all response headers, with dict-style access.
    # print(f'.headers, all response headers: {response.headers}')
    content_type = response.headers.get('Content-Type')
    print(f'.headers, Content-Type header: {content_type}')
    # .msg (str): for responses returned by urlopen, the same reason phrase as .reason.
    print(f'.msg, reason phrase: {response.msg}')
    # .version (int): HTTP version (11 means HTTP/1.1).
    print(f'.version, HTTP version: {response.version}')
    # .closed (bool): whether the response object has been closed.
    print(f'.closed, whether the response is closed: {response.closed}')
    # .read(size=None) -> bytes: reads the whole body or the given number of bytes; read(5) reads five bytes.
    content = response.read().decode('utf-8')
    # print(f'.read(), body: {content}')
    # .readline() -> bytes: reads one line (delimited by \n).
    # The body was already consumed by read() above, so this returns an empty string here.
    readline = response.readline().decode('utf-8')
    # print(f'.readline(), one line: {readline}')
    # .readlines() -> List[bytes]: reads all remaining lines into a list.
    lines = response.readlines()
    # for line in lines:
    #     print('.readlines(), line: {}'.format(line.decode('utf-8')))
    # .getheader(name) -> str or None: value of the named response header, or None if it is absent.
    header = response.getheader('Content-Type')
    print('.getheader(name), named response header: {}'.format(header))
    # .getheaders() -> List[Tuple[str, str]]: all response headers as (header_name, header_value) tuples.
    headers = response.getheaders()
    print('.getheaders(), all response headers: {}'.format(headers))
    # .close() -> None: closes the response object and releases its resources.
    close = response.close()
    print('.close(), return value: %s' % close)
else:
    print("Request failed")

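Kind	Name	Return type	Description
Attribute	`.status`	`int`	Status code of the HTTP response (for example, `200` for success, `404` for not found).
Attribute	`.reason`	`str`	Status phrase of the HTTP response (for example, `OK` or `Not Found`).
Attribute	`.url`	`str`	URL actually retrieved (may differ from the original URL, for example after a redirect).
Attribute	`.headers`	`http.client.HTTPMessage`	Object holding all response headers; individual headers can be accessed dict-style.
Attribute	`.msg`	`str`	For responses returned by `urlopen`, the same reason phrase as `.reason` (on a plain `http.client.HTTPResponse` it holds the headers instead).
Attribute	`.version`	`int`	HTTP version (for example, `11` means HTTP/1.1).
Attribute	`.closed`	`bool`	Whether the response object has been closed.
Method	`.read(size=None)`	`bytes`	Reads the whole response body, or at most the given number of bytes.
Method	`.readline()`	`bytes`	Reads one line (delimited by `\n`).
Method	`.readlines()`	`List[bytes]`	Reads all remaining lines and returns them as a list.
Method	`.getheader(name)`	`str` or `None`	Returns the named HTTP response header, or `None` if the header is absent.
Method	`.getheaders()`	`List[Tuple[str, str]]`	Returns all HTTP response headers as a list of `(header_name, header_value)` tuples.
Method	`.close()`	`None`	Closes the response object and releases its resources.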
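One detail the table does not make explicit is that the response body is a stream: once read() has consumed it, later reads return empty bytes. The following is a minimal sketch of that behavior. It assumes a data: URL (served by urllib.request's DataHandler) purely so that it runs without network access, and read_in_stages is a hypothetical helper name, not part of urllib.

```python
from urllib.request import urlopen


def read_in_stages(url):
    """Read a response in pieces to show that the body is consumed as it is read."""
    # The with statement closes the response even if reading fails.
    with urlopen(url) as response:
        first = response.read(5)   # the first five bytes
        rest = response.read()     # everything that is left
        after = response.read()    # nothing is left, so this is b''
    return first, rest, after


# Assumption: a data: URL stands in for a real HTTP page so no network is needed.
first, rest, after = read_in_stages("data:text/plain;charset=utf-8,hello%20urllib")
print(first, rest, after)
```

If both the full body and individual lines are needed, it is usually simpler to call read() once and split the decoded text afterwards.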

urlretrieve

urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)

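Downloads web pages, images, video, audio, and so on. The official documentation warns that it may be deprecated in the future, so use it with care.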

# urlretrieve downloads resources, but the official documentation says it may be deprecated later, so check before relying on it.
# From the documentation: "The following functions and classes are ported from the Python 2 module urllib (as opposed to urllib2). They might become deprecated at some point in the future."
from urllib.request import urlretrieve
# urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)

# Download a web page
# urlretrieve('http://www.baidu.com', filename='baidu.html')

# Download an image: find an image on Baidu, right-click it and copy the image address. Images usually use .jpg.
url_image = 'https://img2.baidu.com/it/u=3097253230,865203483&fm=253&fmt=auto&app=138&f=JPEG?w=800&h=1449'
# urlretrieve(url_image,'beautiful_gril.jpg')

# Download a video: press F12 and look up the src address of the video element. Videos usually use .mp4.
url_video = 'https://vdept3.bdstatic.com/mda-qfvaqpnzzj3qf31w/sc/cae_h264/1719738331325131837/mda-qfvaqpnzzj3qf31w.mp4?v_from_s=hkapp-haokan-hbf&auth_key=1742536084-0-0-b0bc588642a1e5555556be80fc4fdcab&bcevod_channel=searchbox_feed&pd=1&cr=2&cd=0&pt=3&logid=2884307607&vid=16648779981624158842&klogid=2884307607&abtest=132219_1'
urlretrieve(url_video,'video.mp4')
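Because urlretrieve is a legacy interface, one possible alternative is to stream the response from urlopen into a file with shutil.copyfileobj. This is only a sketch under assumptions: download is a hypothetical helper name, the 30-second timeout is an arbitrary choice, and a data: URL stands in for a real image or video address so the example runs offline.

```python
import os
import shutil
import tempfile
from urllib.request import urlopen


def download(url, filename, timeout=30):
    """Stream the resource at url into filename without holding it all in memory."""
    with urlopen(url, timeout=timeout) as response, open(filename, "wb") as out:
        # copyfileobj copies in chunks, which matters for large video files.
        shutil.copyfileobj(response, out)
    return filename


# Assumption: a data: URL replaces a real download address for this demo.
target = os.path.join(tempfile.mkdtemp(), "hello.txt")
download("data:text/plain,hello", target)
with open(target, "rb") as f:
    saved = f.read()
print(saved)
```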

Request

Its main purpose is to customize the request headers.

from urllib.request import Request, urlopen

# help(Request)
url = 'http://www.baidu.com'
# Mainly used to set custom request headers and similar options
headers = {'User-Agent': 'Mozilla/5.0'}
req = Request(url, headers=headers)
response = urlopen(req)
print(response.read().decode('utf-8'))
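A Request object can also be inspected before it is sent, which helps when checking that custom headers were set as intended. The sketch below sends nothing over the network; the Accept-Language header is an illustrative assumption that does not appear in the original example.

```python
from urllib.request import Request

req = Request("http://www.baidu.com", headers={"User-Agent": "Mozilla/5.0"})
# Headers can also be added after construction (illustrative header).
req.add_header("Accept-Language", "zh-CN")

# Without a data argument, the method defaults to GET.
method = req.get_method()

# Request stores header names in capitalized form ("User-agent"),
# so lookups must use that spelling.
user_agent = req.get_header("User-agent")
language = req.get_header("Accept-language")

print(method, user_agent, language)
print(req.header_items())
```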

`build_opener`

Creates a custom OpenerDirector object for handling requests. It extends what urllib.request can do, for example by adding support for proxies, cookies, and authentication.

build_opener(handlers)

from urllib.request import build_opener, HTTPHandler

# Create a custom opener
opener = build_opener(HTTPHandler())
response = opener.open('http://www.baidu.com')
print(response.read().decode('utf-8'))

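Handlers that can be passed to build_opener

Handler	Description
`HTTPHandler`	Handles HTTP requests.
`HTTPSHandler`	Handles HTTPS requests.
`ProxyHandler`	Routes requests through a proxy.
`HTTPCookieProcessor`	Handles HTTP cookies.
`HTTPBasicAuthHandler`	Handles HTTP basic authentication.
`HTTPDigestAuthHandler`	Handles HTTP digest authentication.
`FTPHandler`	Handles FTP requests.
`FileHandler`	Handles requests for local files.
`DataHandler`	Handles data URLs (URLs that start with `data:`).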
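Several handlers from the table can be combined in one opener. The sketch below pairs HTTPCookieProcessor with a CookieJar and passes an empty dictionary to ProxyHandler, which disables automatically detected proxies. The data: URL is an assumption used only to keep the example offline; with a real HTTP site, cookies set by the server would be collected in jar.

```python
from http.cookiejar import CookieJar
from urllib.request import HTTPCookieProcessor, ProxyHandler, build_opener

jar = CookieJar()
opener = build_opener(
    HTTPCookieProcessor(jar),  # store and resend cookies for HTTP(S) requests
    ProxyHandler({}),          # empty dict: do not use any system proxy
)
# Headers listed here are added to every request made through this opener.
opener.addheaders = [("User-Agent", "Mozilla/5.0")]

# Assumption: a data: URL replaces a real site so the sketch runs offline.
with opener.open("data:text/plain,ok") as response:
    body = response.read()

print(body, len(jar))
```

Handlers passed to build_opener are added alongside the default ones, so this opener can still open http:, https:, file:, and data: URLs.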

Commonly used functions compared

Function	Description
`urllib.request.urlopen(url, data=None, [timeout, ]*, context=None)`	Sends a request and returns a response object (for HTTP, an `http.client.HTTPResponse`). Supports GET, POST, and other request methods.
`urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None)`	Creates a request object that customizes the URL, data, headers, and method (such as GET or POST).
`urllib.request.build_opener([handler, ...])`	Creates a custom OpenerDirector object for handling requests.
`urllib.request.install_opener(opener)`	Installs an OpenerDirector object globally; later calls to `urlopen` use it.
`urllib.request.pathname2url(path)`	Converts a local file path to URL form.
`urllib.request.url2pathname(url)`	Converts a path in URL form to a local file path.
`urllib.request.getproxies()`	Returns the proxy settings configured on the system.
`urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None)`	Downloads the resource at the URL and saves it to a local file.

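`urllib.error`

Official description

The urllib.error module defines the exception classes for exceptions raised by urllib.request. In other words, it provides the classes used to catch failures of a request.

Usage

from urllib.request import urlopen
from urllib.error import URLError, HTTPError

url = 'http://www.test.com'
# Test error handling. HTTPError is a subclass of URLError, so it must be caught first.
try:
    response = urlopen(url)
except HTTPError as e:
    print(f"HTTP error: {e.code} - {e.reason}")
    print("Headers:", e.headers)
except URLError as e:
    print(f"URL error: {e.reason}")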
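The order of the except clauses matters because HTTPError is a subclass of URLError. The sketch below wraps the pattern in a hypothetical helper, describe_fetch, and triggers a URLError offline by using a made-up URL scheme; the scheme name and the data: URL are assumptions chosen so the example needs no network.

```python
from urllib.error import HTTPError, URLError
from urllib.request import urlopen


def describe_fetch(url, timeout=5):
    """Return a short description of what happened when opening url."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return f"ok: {len(response.read())} bytes"
    except HTTPError as e:
        # The server answered, but with an error status such as 404 or 500.
        return f"HTTP error: {e.code} {e.reason}"
    except URLError as e:
        # No usable answer: DNS failure, refused connection, unknown scheme, ...
        return f"URL error: {e.reason}"


ok_result = describe_fetch("data:text/plain,hi")
bad_result = describe_fetch("nosuchscheme://example")
print(ok_result)
print(bad_result)
```

If the URLError clause came first, it would also swallow every HTTPError, and the status code would never be reported.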

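`urllib.parse`

Official description

This module defines a standard interface to break Uniform Resource Locator (URL) strings up into components (addressing scheme, network location, path, and so on), to combine the components back into a URL string, and to convert a "relative URL" to an absolute URL given a "base URL". The module is designed to match the internet RFC on relative Uniform Resource Locators. It supports the following URL schemes: file, ftp, gopher, hdl, http, https, imap, itms-services, mailto, mms, news, nntp, prospero, rsync, rtsp, rtsps, rtspu, sftp, shttp, sip, sips, snews, svn, svn+ssh, telnet, wais, ws, wss.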
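The three operations named in the description (splitting, recombining, and resolving relative URLs) correspond to urlparse, urlunparse, and urljoin. The following is a minimal sketch; the sample URLs are assumptions made up for illustration.

```python
from urllib.parse import urljoin, urlparse, urlunparse

# Split a URL into its components.
parts = urlparse("https://www.baidu.com/s?wd=python#top")
print(parts.scheme, parts.netloc, parts.path, parts.query, parts.fragment)

# Combine the components back into a URL string.
rebuilt = urlunparse(parts)
print(rebuilt)

# Resolve a relative URL against a base URL.
absolute = urljoin("https://www.baidu.com/a/b.html", "../c.html")
print(absolute)
```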

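Why are encoding and decoding needed? A Baidu search makes a good example.

from urllib.parse import quote, unquote

# Searching Baidu for 周杰伦 produces the second URL below, in which the keyword is percent-encoded.
# quote() and unquote() convert between the two forms.
url = "https://www.baidu.com/s?wd=周杰伦"
keyword = url.split("=")[1]
print(f"quote() output: {quote(keyword)}")
url = "https://www.baidu.com/s?wd=%E5%91%A8%E6%9D%B0%E4%BC%A6"
keyword = url.split("=")[1]
print(f"unquote() output: {unquote(keyword)}")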

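Usage

`quote`

Converts special characters in a string to URL encoding (also called percent-encoding).

Parameters

string: the string to encode.
safe: characters that should not be encoded; defaults to '/'.
encoding: encoding applied to non-ASCII characters; defaults to None, which is treated as 'utf-8'.
errors: how encoding errors are handled; defaults to None, which is treated as 'strict'.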
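The effect of the safe parameter is easiest to see side by side. In this sketch the sample path and query value are assumptions chosen for illustration.

```python
from urllib.parse import quote

path = "/files/my report.pdf"

# Default safe='/': slashes are kept, the space becomes %20.
default_safe = quote(path)

# safe='': slashes are encoded too, which suits a value embedded inside a query.
no_safe = quote(path, safe="")

# Characters with a reserved meaning in URLs, such as & and =, are always encoded.
reserved = quote("a&b=c")

print(default_safe)
print(no_safe)
print(reserved)
```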

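unquote

Decodes a URL-encoded string back to the original string.

Parameters

string: the URL-encoded string to decode.
encoding: encoding used to decode the percent-encoded bytes; defaults to 'utf-8'.
errors: how decoding errors are handled; defaults to 'replace'.

quote_plus

Converts special characters in a string to URL encoding and replaces spaces with +.

Parameters

string: the string to encode.
safe: characters that should not be encoded; defaults to the empty string ''.
encoding: encoding applied to non-ASCII characters; defaults to None, which is treated as 'utf-8'.
errors: how encoding errors are handled; defaults to None, which is treated as 'strict'.

unquote_plus

Decodes a URL-encoded string back to the original string and replaces + with spaces.

Parameters

string: the URL-encoded string to decode.
encoding: encoding used to decode the percent-encoded bytes; defaults to 'utf-8'.
errors: how decoding errors are handled; defaults to 'replace'.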

from urllib.parse import quote, unquote, quote_plus, unquote_plus

# quote(): percent-encodes special characters; a space becomes %20. Used for the path part of a URL.
encoded = quote('Hello World!')
print(f"quote() output: {encoded}")  # prints: Hello%20World%21
# Specify characters that should not be encoded
encoded = quote('Hello World!', safe='!')
print(f"quote() output with safe: {encoded}")  # prints: Hello%20World!
# unquote(): decodes a URL-encoded string; %20 becomes a space. Used for the path part of a URL.
decoded = unquote(encoded)
print(f"unquote() output: {decoded}")  # prints: Hello World!
# quote_plus(): percent-encodes special characters and replaces spaces with +. Used for query parameters.
encoded = quote_plus('Hello World!')
print(f"quote_plus() output: {encoded}")  # prints: Hello+World%21
# unquote_plus(): decodes a URL-encoded string and replaces + with spaces. Used for query parameters.
decoded = unquote_plus(encoded)
print(f"unquote_plus() output: {decoded}")  # prints: Hello World!

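Encoding and decoding functions compared

Function	What it does	Space handling	Typical use
`quote()`	Percent-encodes special characters in a string.	A space becomes `%20`	Encoding the path part of a URL.
`unquote()`	Decodes a URL-encoded string back to the original string.	`%20` becomes a space	Decoding the path part of a URL.
`quote_plus()`	Percent-encodes special characters in a string and replaces spaces with `+`.	A space becomes `+`	Encoding URL query parameters.
`unquote_plus()`	Decodes a URL-encoded string and replaces `+` with spaces.	`+` becomes a space	Decoding URL query parameters.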
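For whole query strings, urllib.parse also provides urlencode and parse_qs, which apply the quote_plus and unquote_plus rules to every key and value. The sketch below shows a round trip; the parameter names wd and pn and their values are assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Build a query string from a dict; spaces become + as with quote_plus().
query = urlencode({"wd": "hello world", "pn": 10})
print(query)

# Parse it back; every value comes back as a list of strings.
params = parse_qs(query)
print(params)

# Non-ASCII text is percent-encoded as UTF-8 and decoded again.
round_trip = parse_qs(urlencode({"wd": "周杰伦"}))
print(round_trip)
```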

urllib.robotparser

Official description

This module provides a single class, RobotFileParser, which answers questions about whether a particular user agent may fetch a given URL on the website that published the robots.txt file.

Usage

from urllib.robotparser import RobotFileParser

# Create a RobotFileParser object
rp = RobotFileParser()
# Set the URL of the robots.txt file
rp.set_url('https://www.baidu.com/robots.txt')
# Read and parse the robots.txt file
rp.read()
# Check access permissions
user_agent = 'MyBot'
urls_to_check = [
    'https://www.baidu.com/',
    'https://www.baidu.com/private/',
    'https://www.baidu.com/public/',
]
for url in urls_to_check:
    if rp.can_fetch(user_agent, url):
        print(f"Allowed: {url}")
    else:
        print(f"Disallowed: {url}")

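Common methods

Method	Description
`set_url(url)`	Sets the URL of the `robots.txt` file.
`read()`	Reads the `robots.txt` file from that URL and parses it.
`parse(lines)`	Parses the contents of a `robots.txt` file, passed in as a list of strings.
`can_fetch(useragent, url)`	Checks whether the given user agent (crawler) is allowed to fetch the given URL.
`mtime()`	Returns the time the `robots.txt` file was last fetched (Unix timestamp).
`modified()`	Sets the time the `robots.txt` file was last fetched to the current time.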
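The parse(lines) method makes it possible to try out rules without fetching anything. The sketch below feeds a hypothetical robots.txt to the parser; the rules, the MyBot user agent, and the example.com URLs are all assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, one rule per list item.
rules = [
    "User-agent: *",
    "Disallow: /private/",
    "Allow: /public/",
]

rp = RobotFileParser()
rp.parse(rules)  # parse the lines directly instead of calling set_url() and read()

private_ok = rp.can_fetch("MyBot", "https://example.com/private/page")
public_ok = rp.can_fetch("MyBot", "https://example.com/public/page")
other_ok = rp.can_fetch("MyBot", "https://example.com/other")

print(private_ok, public_ok, other_ok)
```

Paths that match no rule, such as /other here, are treated as allowed.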

GET requests

Setting the request headers and parameters

import urllib.request
import urllib.parse

url = 'https://www.baidu.com/s?wd='
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36'
}
wd = '林俊杰'
# quote() percent-encodes the GET parameter
name = urllib.parse.quote(wd)
# Build the GET request
request = urllib.request.Request(url=url + name, headers=headers)
response = urllib.request.urlopen(request)
# Read the response data
print(response.code, response.read().decode('utf-8'))
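When a GET request carries more than one parameter, urlencode is usually more convenient than concatenating quote() results by hand. The sketch below only builds the request and inspects it, without sending it; build_search_request is a hypothetical helper, and the pn parameter is an assumption added to show a second key.

```python
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import Request


def build_search_request(keyword, page=0):
    """Build (but do not send) a GET request with percent-encoded query parameters."""
    # urlencode() encodes every key and value and joins them with &.
    query = urlencode({"wd": keyword, "pn": page})
    headers = {"User-Agent": "Mozilla/5.0"}
    return Request("https://www.baidu.com/s?" + query, headers=headers)


request = build_search_request("林俊杰", page=10)
print(request.get_method(), request.full_url)

# Decode the query again to confirm that the round trip preserves the keyword.
decoded = parse_qs(urlparse(request.full_url).query)
print(decoded)
```

Passing the resulting object to urllib.request.urlopen would send it in the same way as in the example above.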

POST requests

Setting the POST request headers and parameters:

import urllib.request
import urllib.parse

url = 'https://fanyi.baidu.com/sug'
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/134.0.0.0 Safari/537.36'
}
data = {
    "kw": 'spider'
}
# POST parameters must be URL-encoded and then converted to bytes with encode("utf-8")
data = urllib.parse.urlencode(data).encode("utf-8")
# Passing data makes this a POST request
request = urllib.request.Request(url=url, data=data, headers=headers)
response = urllib.request.urlopen(request)
# Read the response data
print(response.code, response.read().decode('utf-8'))
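The difference between GET and POST in urllib comes down to whether the Request has a data argument. The sketch below builds both kinds of request against the same URL and inspects them without sending anything; the header value is shortened for readability, which is an assumption of this example.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "https://fanyi.baidu.com/sug"
headers = {"User-Agent": "Mozilla/5.0"}

# urlencode() returns str; the request body must be bytes, hence encode().
body = urlencode({"kw": "spider"}).encode("utf-8")

get_request = Request(url, headers=headers)
post_request = Request(url, data=body, headers=headers)

# The method is inferred from the presence of data unless method= is given.
print(get_request.get_method())
print(post_request.get_method(), post_request.data)
```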

Code path: pythonPractice: Python learning exercises (code)
