
[Advanced Syntax] Matching Groups

news 2025/9/21 15:28:42

1. |: match either the expression on its left or the one on its right (commonly used)

import re
res = re.match(r"abc|def", "abcdefg")
print(res.group())   # abc
res = re.match(r"\s|\d", " 1abc!")   # \s: whitespace, \S: non-whitespace, \d: digit
print(res.group())   # prints a single space (\s matches the leading " ")
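A few details about `|` are easy to miss. The sketch below uses only the standard `re` module and shows three of them: `|` has the lowest precedence, so it splits whole expressions; alternatives are tried left to right and the first success wins; and `re.match` only tries position 0, while `re.search` scans forward. The variable names are illustrative.

```python
import re

# "|" has the lowest precedence: "abc|def" means "abc" OR "def",
# never "ab(c|d)ef".
both = re.findall(r"abc|def", "abcdefg")
print(both)                 # ['abc', 'def']

# Alternatives are tried from left to right; the first one that
# succeeds wins, even when a later one could match more text.
short = re.match(r"a|ab", "abc").group()
long_ = re.match(r"ab|a", "abc").group()
print(short, long_)         # a ab

# re.match() only tries position 0; re.search() scans forward.
print(re.match(r"\d|\s", "x 1"))                   # None
print(repr(re.search(r"\d|\s", "x 1").group()))    # ' '
```

Because of the left-to-right rule, put the longer alternative first when one alternative is a prefix of another.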

2. (ab): treat the characters inside the parentheses as one group (commonly used)

import re
# Requirement: match an email address
# res = re.match(r"\w*@qq\.com", "123@qq.com")
res = re.match(r"\w*@qq\.com|\w*@126\.com|\w*@163\.com", "123@qq.com")
# Shorter form using a group:
res = re.match(r"\w*@(qq|126|163)\.com", "123@qq.com")
print(res.group())   # 123@qq.com
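Parentheses do two jobs: they limit the reach of `|`, and they capture the text they matched so it can be read back with `group(n)` or `groups()`. The sketch below (standard `re` only; the extra capture around `\w*` is added for illustration) shows both, including what goes wrong when the grouping parentheses are left out.

```python
import re

# Each pair of parentheses also captures what it matched.
m = re.match(r"(\w*)@(qq|126|163)\.com", "123@qq.com")
print(m.group())     # 123@qq.com  (whole match, same as m.group(0))
print(m.group(1))    # 123
print(m.group(2))    # qq
print(m.groups())    # ('123', 'qq')

# Without the parentheses, "|" splits the entire pattern into
# "\w*@qq" OR "126" OR "163\.com", so ".com" is never checked for qq:
bad = re.match(r"\w*@qq|126|163\.com", "123@qq.cn")
print(bad.group())   # 123@qq
```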

3. \num: refer back to the string that group num matched; often used to match paired tags

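import re
res = re.match(r'<\w*>\w*</\w*>', '<html>login</html>')
print(res.group())       # <html>login</html>
# res = re.match('<(\w*)>\w*</\1>', '<html>login</html>')
# print(res.group())     # Error: without the r prefix, '\1' becomes the control character \x01, so match() returns None and .group() raises AttributeError; use a raw string
res = re.match(r'<(\w*)>\w*</\1>', '<html>login</html>')
print(res.group())       # <html>login</html>
# res = re.match('<(\w*)>\w*</\\1>', '<html>login</html>')   # escaping the backslash as \\1 also works
# print(res.group())     # <html>login</html>
# Note: the closing tag must contain exactly what the opening tag contained
# res = re.match('<(\w*)>\w*</\\1>', '<html>login</htmler>')
# print(res.group())     # Error: no match, so res is None
res = re.match(r'<\w*><\w*>.*</\w*></\w*>', '<html><body>login page</body></html>')
print(res.group())       # matches, but does not check that the tags pair up
res = re.match(r'<(\w*)><(\w*)>.*</\2></\1>', '<html><body>login page</body></html>')
# Groups are numbered by their opening parentheses from left to right (here, outer to inner), starting at 1:
# the outer tag is group 1 and the inner tag is group 2
print(res.group())       # <html><body>login page</body></html>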
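Two properties of numbered groups are worth seeing on their own: numbering follows the position of each opening `(`, so nested groups get consecutive numbers; and a backreference repeats the exact text a group captured, not its pattern. The sketch below uses the standard `re` module; the doubled-word sentence and the mis-nested tag string are made-up inputs.

```python
import re

# Groups are numbered by the position of their opening "(", left to right.
m = re.match(r"((a)(b))", "ab")
print(m.groups())    # ('ab', 'a', 'b')

tags = re.match(r"<(\w*)><(\w*)>.*</\2></\1>", "<html><body>login page</body></html>")
print(tags.group(1), tags.group(2))   # html body

# A backreference repeats the captured *text*, not the pattern:
# find words that appear twice in a row.
doubled = re.findall(r"\b(\w+) \1\b", "this is is a test test")
print(doubled)       # ['is', 'test']

# Mis-nested tags are rejected because \2 must equal "body" first.
print(re.match(r"<(\w*)><(\w*)>.*</\2></\1>", "<html><body>x</html></body>"))   # None
```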

4. (?P<name>...): give a group a name (advanced)
5. (?P=name): refer back to the string matched by the group named name (advanced)

import re
res = re.match(r'<(?P<l1>\w*)><(?P<l2>\w*)>.*</(?P=l2)></(?P=l1)>', '<html><body>login page</body></html>')
print(res.group())      # <html><body>login page</body></html>

# Requirement: match a web address
# Rule: prefix www, suffix .com/.cn/.net/.org
import re
li = ['www.baidu.com', 'www.taobao.com', 'www.jd.con', 'www.abc.n', "www.python.org"]
res = re.match(r'www(\.)\w*\1(com|cn|net|org)', 'www.baidu.com')   # \1 reuses the "." captured by group 1
print(res.group())      # www.baidu.com
for i in li:
    res = re.match(r"www\.\w*\.(com|cn|net|org)", i)   # the backslash turns "." into a literal dot
    # Check whether the address matched
    if res:
        print(res.group())
    else:
        print(f'{i}: invalid web address')
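Named groups can also be read back from the match object with `group("name")` or `groupdict()`. The sketch below shows that on the tag example, then applies named groups to a hypothetical variant of the URL check; `URL_RE` and `check_url` are illustrative names, not part of the original. The variant makes two assumptions: it uses `re.fullmatch`, so trailing text such as `www.baidu.comx` is rejected (the loop above accepts it, because `re.match` only anchors the start), and it uses `\w+` instead of `\w*`, so an empty site name like `www..com` is rejected too.

```python
import re

# Named groups can be read back by name as well as by number.
m = re.match(r"<(?P<l1>\w*)><(?P<l2>\w*)>.*</(?P=l2)></(?P=l1)>",
             "<html><body>login page</body></html>")
print(m.group("l1"), m.group(1))   # html html
print(m.groupdict())               # {'l1': 'html', 'l2': 'body'}

# Hypothetical variant of the URL check: named parts plus fullmatch.
URL_RE = re.compile(r"www\.(?P<site>\w+)\.(?P<tld>com|cn|net|org)")

def check_url(url):
    """Return the named parts of url as a dict, or None if it is invalid."""
    match = URL_RE.fullmatch(url)   # the whole string must match, not only a prefix
    return match.groupdict() if match else None

for url in ["www.baidu.com", "www.python.org", "www.baidu.comx", "www.jd.con"]:
    print(url, "->", check_url(url))
```

Here `www.baidu.com` yields `{'site': 'baidu', 'tld': 'com'}`, while `www.baidu.comx` and `www.jd.con` yield `None`.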


http://www.dtcms.com/a/393327.html