
Using Regular Expressions

news 2025/9/4 7:56:45

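A regular expression is essentially a string of "super wildcards" for quickly searching, replacing, splitting, and validating text. The examples below use Python's built-in `re` module.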

I. What the Symbols Mean

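1) Literal characters

| Symbol | Meaning | Example |
| --- | --- | --- |
| `abc` | Matches the letters `a`, `b`, `c` literally, in that order | `"abc"` ✅, `"xabc"` ✅ with `search`/`findall` (❌ only with `fullmatch`), `"acb"` ❌ |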

```python
import re

# Literal characters
print(re.findall(r'python', 'hello world! I like python!!'))      # ['python']
print(re.findall(r'java', 'hello world! I like python!!'))        # []
```
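Literal characters are matched case-sensitively. As a small sketch (the sample string is invented for illustration), the standard `re.IGNORECASE` flag relaxes that:

```python
import re

text = 'Python, python, PYTHON'

# Literal characters match exactly, including letter case.
exact = re.findall(r'python', text)
print(exact)     # ['python']

# re.IGNORECASE (alias re.I) lets letters match regardless of case.
any_case = re.findall(r'python', text, flags=re.IGNORECASE)
print(any_case)  # ['Python', 'python', 'PYTHON']
```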

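2) Single-character shorthands

| Symbol | Meaning | Example |
| --- | --- | --- |
| `.` | Any single character except a newline | `a.c` → `"abc"`, `"a1c"` ✅ |
| `\d` | One digit (`[0-9]`; in Python 3 also other Unicode decimal digits unless `re.ASCII` is set) | `\d` → `"5"` ✅, `"a"` ❌ |
| `\D` | One non-digit (the opposite of `\d`) | `\D` → `"a"` ✅, `"5"` ❌ |
| `\w` | One letter, digit, or underscore (`[A-Za-z0-9_]` under `re.ASCII`; by default Unicode letters such as `é` or `中` also count) | `\w` → `"_"`, `"9"` ✅ |
| `\W` | One character not matched by `\w` | `\W` → `"@"` ✅ |
| `\s` | One whitespace character (space, tab, newline, ...) | `\s` → `" "` ✅ |
| `\S` | One non-whitespace character | `\S` → `"a"` ✅ |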

```python
import re

# Single-character shorthands
txt = "Room 101 is 30.5 m², cat cot ccc cbt"
print(re.findall(r'c.t', txt))         # ['cat', 'cot', 'cbt']
print(re.findall(r'\d+', txt))        # ['101', '30', '5']
print(re.findall(r'\D+', txt))        # ['Room ', ' is ', '.', ' m², cat cot ccc cbt']
print(re.findall(r'\s+', txt))        # [' ', ' ', ' ', ' ', ' ', ' ', ' ', ' ']
print(re.findall(r'\S+', txt))        # ['Room', '101', 'is', '30.5', 'm²,', 'cat', 'cot', 'ccc', 'cbt']
print(re.findall(r'\S', txt))         # ['R', 'o', 'o', 'm', '1', '0', '1', 'i', 's', '3', '0', '.', '5', 'm', '²', ',', 'c', 'a', 't', 'c', 'o', 't', 'c', 'c', 'c', 'c', 'b', 't']
```

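`\S+` grabs a whole run of consecutive non-whitespace characters at once (a word, a number, or a cluster of symbols as one item).
`\S` grabs a single non-whitespace character, so the text comes back letter by letter and digit by digit.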
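In Python 3, `\w`, `\d` and `\s` on a `str` pattern are Unicode-aware, which matters for mixed Chinese/English text. A minimal comparison against the `re.ASCII` flag, using a made-up sample string:

```python
import re

text = 'café 北京 v2'

# Default for str patterns: \w is Unicode-aware, so accented and CJK letters count.
unicode_words = re.findall(r'\w+', text)
print(unicode_words)  # ['café', '北京', 'v2']

# re.ASCII restricts \w (and \d, \s, \b) to their ASCII meanings.
ascii_words = re.findall(r'\w+', text, flags=re.ASCII)
print(ascii_words)    # ['caf', 'v2']
```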

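3) Quantifiers

The regex engine scans the text from left to right, trying to match at each position in turn. A quantifier applies only to the item immediately before it, so in `ab*` it is `b` that repeats, not `ab`.

| Symbol | Meaning | Example |
| --- | --- | --- |
| `*` | 0 or more times | `ab*` → `"a"`, `"ab"`, `"abb"` ✅ |
| `+` | 1 or more times | `ab+` → `"ab"`, `"abb"` ✅, `"a"` ❌ |
| `?` | 0 or 1 time | `ab?` → `"a"`, `"ab"` ✅; `"abb"` ❌ as a whole (only its `"ab"` part matches) |
| `{n}` | Exactly n times | `\d{4}` → `"2024"` ✅ |
| `{n,}` | At least n times | `\d{3,}` → `"123"`, `"12345"` ✅ |
| `{n,m}` | Between n and m times (as many as possible) | `\d{2,4}` → `"12"`, `"123"`, `"1234"` ✅ |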

```python
import re

# Quantifiers
print(re.findall(r'ab*', 'a b c ab ba aab abb abbb'))      # ['a', 'ab', 'a', 'a', 'ab', 'abb', 'abbb']
print(re.findall(r'ab+', 'a b c ab ba aab abb abbb'))      # ['ab', 'ab', 'abb', 'abbb']
print(re.findall(r'ab?', 'a b c ab ba aab abb abbb'))      # ['a', 'ab', 'a', 'a', 'ab', 'ab', 'ab']
print(re.findall(r'\d{3}', '11 15 378 666 99 78910'))      # ['378', '666', '789']
print(re.findall(r'\d{3,}', '11 15 378 666 99 78910'))     # ['378', '666', '78910']
print(re.findall(r'\d{2,4}', '11 15 378 666 99 78910'))    # ['11', '15', '378', '666', '99', '7891']
```
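The left-to-right scan has a practical consequence: `findall` resumes right after each match, so matches never overlap and leftover characters are skipped. A short sketch with invented inputs, using `re.finditer` to expose each match's position:

```python
import re

# Scanning resumes after each match, so matches never overlap.
pairs = re.findall(r'\d{2}', '12345')
print(pairs)  # ['12', '34']  ('5' is left over)

# finditer yields match objects; span() gives (start, end) of each match.
spans = [m.span() for m in re.finditer(r'\d{2,4}', '11 78910')]
print(spans)  # [(0, 2), (3, 7)]
```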

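4) Position anchors

| Symbol | Meaning | Example |
| --- | --- | --- |
| `^` | Start of the string | `^\d` → `"1abc"` ✅, `"a1bc"` ❌ |
| `$` | End of the string | `\d$` → `"abc1"` ✅, `"1abc"` ❌ |
| `\b` | Word boundary: a `\w` character on one side, a non-`\w` character or the string edge on the other | `\bcat\b` → `"cat"` ✅, `"catalog"` ❌ |
| `\B` | Not a word boundary | `\Bcat` → `"scat"` ✅ |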

```python
import re

# Position anchors
print(bool(re.search(r'^\d+', '123abc')))   # True
print(bool(re.search(r'^\d+', 'abc123')))   # False
print(bool(re.search(r'\d+$', 'abc123')))   # True
# Words that start with "cat" / end with "cat", using \b
print(re.findall(r'\bcat\w*\b', 'cat scat catch ctcat'))   # ['cat', 'catch']
print(re.findall(r'\b\w*cat\b', 'cat scat catch ctcat'))   # ['cat', 'scat', 'ctcat']
# With ^ and $, only the start and the end of the whole text can match
print(re.findall(r'^cat\w*', 'cat scat catch ctcat'))   # ['cat']
print(re.findall(r'\w*cat$', 'cat scat catch ctcat'))   # ['ctcat']
```

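⚠️ Note: why not use `^` and `$` to find words that start or end with "cat"?
By default, `^` and `$` anchor only to the start and end of the whole text; they know nothing about where each word begins or ends. Use `\b` to mark word boundaries instead.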
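If the goal is per-line rather than per-word anchoring, the standard `re.MULTILINE` flag makes `^` and `$` also match at line breaks. A sketch with a made-up three-line string (word boundaries still need `\b`):

```python
import re

text = 'cat one\ndog two\ncat three'

# Default: ^ matches only at the very start of the string.
whole_text = re.findall(r'^cat\b.*', text)
print(whole_text)  # ['cat one']

# re.MULTILINE: ^ also matches right after each newline.
# '.' does not match '\n', so '.*' stops at the end of each line.
per_line = re.findall(r'^cat\b.*', text, flags=re.MULTILINE)
print(per_line)    # ['cat one', 'cat three']
```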

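5) Custom character sets with square brackets

| Symbol | Meaning | Example |
| --- | --- | --- |
| `[abc]` | `a`, `b`, or `c` | `[abc]` → `"a"` ✅, `"d"` ❌ |
| `[a-z]` | Any lowercase letter from `a` to `z` | `[a-z]+` → `"hello"` ✅ |
| `[^abc]` | Any character except `a`, `b`, `c` | `[^abc]` → `"d"` ✅, `"a"` ❌ |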

```python
import re

print(re.findall(r'[aeiou]', 'hello world'))          # ['e', 'o', 'o']
print(re.findall(r'[^aeiou]', 'hello'))               # ['h', 'l', 'l']
```
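Ranges can be combined inside one set and followed by a quantifier. A sketch, assuming a made-up CSS-like string, that pulls out six-digit hex colors:

```python
import re

css = 'color: #1a2B3c; bg: #FFFFFF; bad: #12345g'

# [0-9a-fA-F] combines three ranges; {6} repeats the whole set six times.
colors = re.findall(r'#[0-9a-fA-F]{6}\b', css)
print(colors)  # ['#1a2B3c', '#FFFFFF']

# Inside a set, '.' is literal, and '-' is literal when placed last.
tokens = re.findall(r'[\w.-]+', 'a.b-c d_e')
print(tokens)  # ['a.b-c', 'd_e']
```

Placing `-` last (or first) in the set keeps it from being read as a range.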

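6) Greedy vs. non-greedy

| Symbol | Meaning | Example |
| --- | --- | --- |
| `.*` | Greedy: matches as much as possible | On `<a><b></b></a>`, `<.*>` consumes the whole string in one match |
| `.*?` | Non-greedy (lazy): matches as little as possible; a `?` after any quantifier (`+?`, `??`, `{2,4}?`) makes it lazy | `<.*?>` splits it into `<a>`, `<b>`, `</b>`, `</a>` |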
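The table's `<a><b></b></a>` example can be checked directly. The last line shows a negated set, a common alternative to a lazy quantifier when the stop character is known:

```python
import re

html = '<a><b></b></a>'

# Greedy: .* runs to the end, then backs off to the last '>'.
greedy = re.findall(r'<.*>', html)
# Lazy: .*? stops at the first '>' that lets the match succeed.
lazy = re.findall(r'<.*?>', html)
print(greedy)   # ['<a><b></b></a>']
print(lazy)     # ['<a>', '<b>', '</b>', '</a>']

# Negated set: "any character except '>'" cannot run past a '>'.
negated = re.findall(r'<[^>]*>', html)
print(negated)  # ['<a>', '<b>', '</b>', '</a>']
```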

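7) Zero-width assertions (lookaround)

A lookaround checks what comes before or after the current position without consuming any characters, so that surrounding text is not part of the match.

| Symbol | Meaning | Example |
| --- | --- | --- |
| `(?=...)` | Positive lookahead | `\d+(?=元)` → digits followed by "元" (yuan) |
| `(?!...)` | Negative lookahead | `\d+(?![\d元])` → digits not followed by "元"; the `\d` in the set stops backtracking from accepting `10` out of `100元` |
| `(?<=...)` | Positive lookbehind | `(?<=￥)\d+` → digits preceded by "￥" |
| `(?<!...)` | Negative lookbehind | `(?<!\d)\d{3}` → three digits not preceded by another digit |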

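```python
import re

text = "价格100元，优惠200元，积分300"   # "price 100 yuan, discount 200 yuan, points 300"
print(re.findall(r'\d+(?=元)', text))        # ['100', '200']
print(re.findall(r'(?<=价格)\d+', text))     # ['100']
print(re.findall(r'\d+(?!元)', text))        # ['10', '20', '300']: backtracking accepts '10' before '0元'
print(re.findall(r'\d+(?![\d元])', text))    # ['300']
```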
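Because lookaheads consume nothing, several can be stacked at the same position, each testing one condition. The password rule below is a hypothetical example, not a recommendation; the second part shows that Python's `re` rejects variable-width lookbehind:

```python
import re

# Hypothetical rule: 8+ characters, at least one digit and one uppercase
# ASCII letter. Each lookahead tests one condition from position 0
# without consuming text; .{8,} then has to cover the whole string.
STRONG = re.compile(r'(?=.*\d)(?=.*[A-Z]).{8,}')

def is_strong(password: str) -> bool:
    """Return True if the whole password satisfies every lookahead."""
    return STRONG.fullmatch(password) is not None

print(is_strong('Secret123'))  # True
print(is_strong('secret123'))  # False: no uppercase letter
print(is_strong('Sec1'))       # False: shorter than 8 characters

# Python's re only accepts fixed-width lookbehind patterns.
try:
    re.compile(r'(?<=\d+)x')
    lookbehind_rejected = False
except re.error:
    lookbehind_rejected = True
print(lookbehind_rejected)     # True
```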

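8) Escaping

| Symbol | Meaning | Example |
| --- | --- | --- |
| `\` | Escape: makes the following metacharacter literal | `\.` matches an actual period `"."` |
| `\\` | A literal backslash | To match the text `C:\path`, the regex is `C:\\path`; write it as `r'C:\\path'`, or as `'C:\\\\path'` without the `r` prefix |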
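Backslashes are escaped twice: once for the regex engine and once for the Python string literal, which is why raw strings (`r'...'`) are the usual choice. When a literal string has to be embedded in a pattern, the standard `re.escape()` does the escaping. A sketch with invented inputs:

```python
import re

path = 'C:\\path'  # the text itself is C:\path (a single backslash)

# Raw string: the regex engine receives C:\\path, meaning "one literal backslash".
raw_hits = re.findall(r'C:\\path', path)
# Without the r prefix, every regex backslash must be doubled again for Python.
plain_hits = re.findall('C:\\\\path', path)
print(raw_hits == plain_hits == [path])  # True

# re.escape() backslash-escapes the regex metacharacters in a literal string.
pattern = re.escape('1.5+2')
print(pattern)                                   # 1\.5\+2
exact_ok = bool(re.fullmatch(pattern, '1.5+2'))
wrong_ok = bool(re.fullmatch(pattern, '1x5+2'))  # '.' is literal now
print(exact_ok, wrong_ok)                        # True False
```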

II. Common `re` Functions

1) Find all matches - findall

```python
re.findall(r'\d+', 'price 123, stock 456')      # ['123', '456']
```
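What `findall` returns depends on the capturing groups in the pattern. A sketch with an invented `key=value` string:

```python
import re

text = 'a=1, b=22'

# No groups: each item is the whole match.
whole = re.findall(r'\w+=\d+', text)
print(whole)   # ['a=1', 'b=22']

# One group: each item is that group's text only.
values = re.findall(r'\w+=(\d+)', text)
print(values)  # ['1', '22']

# Several groups: each item is a tuple of all groups.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)   # [('a', '1'), ('b', '22')]
```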

2) Find the first match only - search, plus group extraction

```python
m = re.search(r'(\d{4})-(\d{2})', '2024-05')
m.groups()                                # ('2024', '05')
```
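`search` returns `None` when nothing matches, so calling `.groups()` on the result without a check can fail. A sketch that checks for `None` and uses named groups (`(?P<name>...)`); the sample sentence is invented:

```python
import re

m = re.search(r'(?P<year>\d{4})-(?P<month>\d{2})', 'Released 2024-05, patched later')
if m is not None:              # search() returns None when nothing matches
    print(m.group(0))          # 2024-05  (the whole match)
    print(m.group('year'))     # 2024
    print(m.groupdict())       # {'year': '2024', 'month': '05'}

missing = re.search(r'\d{4}-\d{2}', 'no date here')
print(missing)                 # None
```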

3) Match at the start of the string - match

```python
bool(re.match(r'1[3-9]\d{9}', '13812345678'))   # True
```
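`match` only tries position 0 and does not anchor the end. A sketch contrasting it with `search` (inputs invented; the phone-style pattern is the one from the example above):

```python
import re

text = 'abc123'

# match() only tries at position 0, so digits later in the string are missed.
at_start = re.match(r'\d+', text)
# search() scans forward and returns the first match anywhere.
anywhere = re.search(r'\d+', text)
print(at_start)           # None
print(anywhere.group())   # 123

# match() anchors only the start: trailing characters are still accepted.
loose = bool(re.match(r'1[3-9]\d{9}', '13812345678abc'))
print(loose)              # True
```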

4) The whole string must fit the format - fullmatch

```python
bool(re.fullmatch(r'\d{6}', '123456'))      # True
```
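For validation, `fullmatch` is stricter than `match` plus `$`, because `$` also matches just before a trailing newline. A sketch with a hypothetical helper `is_six_digits`:

```python
import re

def is_six_digits(s: str) -> bool:
    """Hypothetical validator: exactly six digits, nothing before or after."""
    return re.fullmatch(r'\d{6}', s) is not None

print(is_six_digits('123456'))    # True
print(is_six_digits('1234567'))   # False
print(is_six_digits('123456\n'))  # False

# match() + $ is not equivalent: $ also matches just before a final newline.
trailing_ok = bool(re.match(r'\d{6}$', '123456\n'))
print(trailing_ok)                # True
```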

5) Split by a regex - split

```python
re.split(r'[;,|\s]+', 'a,b; c|d  e')      # ['a', 'b', 'c', 'd', 'e']
```
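Two `split` details worth knowing: a capturing group keeps the separators in the result, and `maxsplit` limits the number of splits. A sketch with invented inputs:

```python
import re

# A capturing group in the pattern keeps the separators in the result.
with_seps = re.split(r'([;,])', 'a,b;c')
print(with_seps)  # ['a', ',', 'b', ';', 'c']

# maxsplit caps the number of splits; the remainder stays in one piece.
limited = re.split(r'[;,|\s]+', 'a,b; c|d  e', maxsplit=2)
print(limited)    # ['a', 'b', 'c|d  e']
```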

6) Replace - sub

```python
re.sub(r'\d+', '#', 'order 123 and 456')        # 'order # and #'
```
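The replacement in `sub` can refer to groups (`\1`, `\2`, ...) or be a function that is called with each match object. A sketch with invented inputs:

```python
import re

# \1, \2, \3 in the replacement refer to the pattern's groups.
reordered = re.sub(r'(\d{4})-(\d{2})-(\d{2})', r'\3/\2/\1', 'due 2024-05-20')
print(reordered)  # due 20/05/2024

# A function replacement receives each match object and returns the new text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'a1 b20')
print(doubled)    # a2 b40
```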

7) Replace and count - subn

```python
re.subn(r'\d', '*', 'a1b2c3')             # ('a*b*c*', 3)
```
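`subn` is handy when the caller needs to know whether, or how often, something was replaced; the optional `count` argument caps the number of replacements. A sketch with invented inputs:

```python
import re

# subn() returns (new_string, number_of_substitutions_made).
masked, n = re.subn(r'\d', '*', 'a1b2c3', count=2)  # count caps the replacements
print(masked, n)      # a*b*c3 2

# Collapse runs of whitespace and report how many runs were replaced.
cleaned, runs = re.subn(r'\s+', ' ', 'too   many    spaces')
print(cleaned, runs)  # too many spaces 2
```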

