(17) 100 Days of Python, from Getting Started to Mastery: Regular Expressions

news 2025/10/13 8:40:10

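Regular Expressions

Regular expressions are one of the most powerful text-processing tools available in Python. They are used to match, search, replace, validate, and extract strings.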

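Contents

Regular Expressions
I. Regular Expression Basics
- 1.1 What is a regular expression?
- 1.2 Using the `re` module in Python
II. Common Regular Expression Syntax
- 2.1 Basic character matching
- 2.2 Repetition
- 2.3 Character sets and groups
- 2.4 Special escape characters
- 2.5 Common flags
III. Common Examples
- 3.1 Case 1: Check whether a mobile number is well-formed
- 3.2 Case 2: Extract the numbers from a string
- 3.3 Case 3: Validate an email address
- 3.4 Case 4: Replace sensitive words in a string
- 3.5 Case 5: Extract hyperlinks from HTML
- 3.6 Case 6: Extract a date
- 3.7 Case 7: Split a string on several delimiters
- 3.8 Case 8: Use `re.compile()` to improve performance
- 3.9 Case 9: Get match positions with `finditer()`
- 3.10 Case 10: Match multi-line text
IV. Handy Regular Expression Templates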

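I. Regular Expression Basics

1.1 What is a regular expression?

A regular expression is a pattern language for describing string-matching rules.
Typical uses:

Checking whether a string conforms to a given format;
Extracting the parts of a string that satisfy a condition;
Replacing matched content;
Splitting a string.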

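1.2 Using the `re` module in Python

Python ships with the built-in `re` module. When a match succeeds, `re.match` and `re.search` return a Match object; otherwise they return None.

The most commonly used functions are:

Function	Purpose
`re.match(pattern, string)`	Tries to match only at the start of the string
`re.search(pattern, string)`	Scans the whole string and returns the first match
`re.findall(pattern, string)`	Returns a list of all matches
`re.finditer(pattern, string)`	Returns an iterator over all matches (as Match objects)
`re.sub(pattern, repl, string)`	Replaces the matched content
`re.split(pattern, string)`	Splits the string at each match
`re.compile(pattern)`	Precompiles the pattern into a reusable pattern object

Note: `match` and `search` return at most one match, whereas `findall` returns all of them.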
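The difference between the three lookup functions is easiest to see on one input. The following is a minimal sketch; the sample sentence is made up for illustration.

```python
import re

text = "Order 66 shipped, order 77 pending"

# re.match only tries at position 0, so a digit pattern finds nothing here.
at_start = re.match(r"\d+", text)
print(at_start)                      # None

# re.search scans forward and stops at the first match.
first = re.search(r"\d+", text)
print(first.group(), first.span())   # 66 (6, 8)

# re.findall collects every non-overlapping match as a list of strings.
every = re.findall(r"\d+", text)
print(every)                         # ['66', '77']
```

Because `match` and `search` may return None, it is safer to test the result before calling `.group()` on it.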

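II. Common Regular Expression Syntax

2.1 Basic character matching

Expression	Meaning	Notes
`.`	Matches any character except a newline	`a.b` → “acb” ✅
`^`	Matches the start of the string	`^Hello` matches text that starts with Hello
`$`	Matches the end of the string	`world$` matches text that ends with world
`\d`	Matches a digit (0-9)	`\d\d` → “23” ✅
`\D`	Matches a non-digit	`\D` → “A” ✅
`\w`	Matches a letter, digit, or underscore	`\w+` → “hello_123” ✅
`\W`	Matches anything that is not a letter, digit, or underscore
`\s`	Matches a whitespace character (space, tab, newline)
`\S`	Matches a non-whitespace character

Write patterns as raw strings, r"pattern", so that backslashes reach the regex engine unchanged. A pattern element such as r'\t' (the same string as '\\t') then matches the corresponding special character, here a tab.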
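Why raw strings matter is clearest with `\b`, which means "word boundary" to the regex engine but "backspace" inside an ordinary Python string literal. A small sketch, with made-up sample text:

```python
import re

# In a normal literal "\b" becomes the backspace character (U+0008);
# in a raw literal it stays as backslash + b, which re reads as a word boundary.
plain = "\bcat\b"
raw = r"\bcat\b"

print(len(plain), len(raw))              # 5 7
print(re.findall(plain, "cat concat"))   # []
print(re.findall(raw, "cat concat"))     # ['cat']
```

The raw pattern finds the standalone word only; the "cat" inside "concat" is not preceded by a word boundary.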

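2.2 Repetition

Expression	Meaning	Notes
`*`	Repeats 0 or more times	`ab*` → “a”, “abbb” ✅
`+`	Repeats 1 or more times	`ab+` → “ab”, “abbbb” ✅
`?`	Repeats 0 or 1 time	`ab?` → “a”, “ab” ✅
`{n}`	Repeats exactly n times	`a{3}` → “aaa” ✅
`{n,}`	Repeats n or more times	`a{2,}` → “aa”, “aaaaa” ✅
`{n,m}`	Repeats between n and m times	`a{2,4}` → “aa”, “aaa”, “aaaa” ✅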
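The quantifiers in the table can be checked directly with `re.fullmatch`, which succeeds only when the whole string fits the pattern. This is a minimal sketch; `samples` and `full_matches` are names invented for the example.

```python
import re

samples = ["a", "ab", "abbb", "aa", "aaaaa"]

def full_matches(pattern):
    """Return the samples that the pattern matches from start to end."""
    return [s for s in samples if re.fullmatch(pattern, s)]

print(full_matches(r"ab*"))      # ['a', 'ab', 'abbb']
print(full_matches(r"ab+"))      # ['ab', 'abbb']
print(full_matches(r"ab?"))      # ['a', 'ab']
print(full_matches(r"a{2,4}"))   # ['aa']  (five a's exceed the upper bound)
```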

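2.3 Character sets and groups

Expression	Meaning
`[abc]`	Matches any one of a, b, or c
`[^abc]`	Matches any character other than a, b, or c
`(abc)`	Capturing group; matches and captures “abc”
`(?:abc)`	Non-capturing group; groups without saving the content
`a|b`	Matches either a or b (alternation)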
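Groups change what `re.findall` returns, which is a common source of surprise. The sketch below, using made-up sample strings, contrasts a capturing group, a non-capturing group, alternation, and a negated set.

```python
import re

text = "2025-10-12 and 2024-01-05"

# No group: findall returns each whole match.
whole = re.findall(r"\d{4}-\d{2}", text)
print(whole)       # ['2025-10', '2024-01']

# One capturing group: findall returns only what the group captured.
captured = re.findall(r"(\d{4})-\d{2}", text)
print(captured)    # ['2025', '2024']

# Non-capturing group: grouping only, so the whole match comes back again.
uncaptured = re.findall(r"(?:\d{4})-\d{2}", text)
print(uncaptured)  # ['2025-10', '2024-01']

# Alternation and a negated character set.
print(re.findall(r"cat|dog", "cat, dog, bird"))   # ['cat', 'dog']
print(re.findall(r"[^abc]", "abcde"))             # ['d', 'e']
```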

2.4 Special escape characters

Expression	Meaning
`\\`	Matches a backslash
`\n`	Matches a newline
`\t`	Matches a tab

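2.5 Common flags

Flag	Description
`re.I` or `re.IGNORECASE`	Ignores case
`re.M` or `re.MULTILINE`	Multi-line mode (`^` and `$` match at the start and end of every line)
`re.S` or `re.DOTALL`	Lets `.` match any character, including a newline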
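Flags are passed as the last argument and can be combined with `|`. A short sketch of how each one changes the result, on a made-up three-line string:

```python
import re

text = "Hello\nhello world\nHELLO"

print(re.findall(r"hello", text))                 # ['hello']
print(re.findall(r"hello", text, re.I))           # ['Hello', 'hello', 'HELLO']

# Without re.M, ^ anchors only at the very start of the string.
print(re.findall(r"^hello", text, re.I))          # ['Hello']
print(re.findall(r"^hello", text, re.I | re.M))   # ['Hello', 'hello', 'HELLO']

# Without re.S, "." stops at the newline right after "Hello".
print(re.findall(r"Hello.+", text))               # []
print(re.findall(r"Hello.+", text, re.S))         # ['Hello\nhello world\nHELLO']
```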

III. Common Examples

3.1 Case 1: Check whether a mobile number is well-formed

import re

text = "138-9999-0000"
pattern = r"^1[3-9]\d{9}$"  # Mainland China mobile number

print(bool(re.match(pattern, "13899990000")))  # ✅ True
print(bool(re.match(pattern, text)))            # ❌ False (because of the '-' in the middle)
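One caveat for validation: `$` also matches just before a trailing newline, so input that ends in "\n" still passes a `^...$` check. A hedged alternative is `re.fullmatch`, sketched below; the constant name `PHONE` is invented for the example.

```python
import re

PHONE = r"1[3-9]\d{9}"

# $ accepts a single trailing newline, which is rarely what a validator wants.
print(bool(re.match(r"^1[3-9]\d{9}$", "13899990000\n")))   # True

# fullmatch requires the pattern to cover the entire string.
print(bool(re.fullmatch(PHONE, "13899990000\n")))          # False
print(bool(re.fullmatch(PHONE, "13899990000")))            # True
```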

3.2 Case 2: Extract the numbers from a string

import re

text = "Price: was 199 yuan, now 149 yuan after discount"
numbers = re.findall(r"\d+", text)
print(numbers)  # ['199', '149']

3.3 Case 3: Validate an email address

import re

email = "hello_world99@163.com"
pattern = r"^[\w.-]+@[\w.-]+\.\w+$"

print(bool(re.match(pattern, email)))  # ✅ True

3.4 Case 4: Replace sensitive words in a string

import re

text = "This person is garbage"
clean_text = re.sub(r"garbage", "**", text)
print(clean_text)  # This person is **
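`re.sub` also accepts a function as the replacement, which helps when several words must be masked and the mask should follow the length of each word. This is only a sketch: the word list `blocked` and the helper `mask` are hypothetical names, not part of the original example.

```python
import re

# Hypothetical word list, for illustration only.
blocked = ["garbage", "stupid"]

# re.escape keeps any special characters in the words literal.
pattern = re.compile("|".join(re.escape(word) for word in blocked), re.IGNORECASE)

def mask(match):
    """Replace the matched word with asterisks of the same length."""
    return "*" * len(match.group())

masked = pattern.sub(mask, "That is Garbage and stupid")
print(masked)  # That is ******* and ******
```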

3.5 Case 5: Extract hyperlinks from HTML

import re

html = '<a href="https://example.com">Example</a>'
pattern = r'href="(.*?)"'

link = re.findall(pattern, html)
print(link)  # ['https://example.com']
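The `?` after `.*` in this pattern makes the quantifier non-greedy, and that matters as soon as the tag has more than one attribute. A small sketch with a made-up tag:

```python
import re

html = '<a href="https://example.com" class="link">Example</a>'

# Greedy: .* runs to the last double quote in the string.
greedy = re.findall(r'href="(.*)"', html)
print(greedy)    # ['https://example.com" class="link']

# Non-greedy: .*? stops at the first closing quote.
lazy = re.findall(r'href="(.*?)"', html)
print(lazy)      # ['https://example.com']

# A negated set gives the same result without relying on backtracking.
negated = re.findall(r'href="([^"]*)"', html)
print(negated)   # ['https://example.com']
```

For real-world HTML, a parser such as the standard library's `html.parser` is usually more robust than a regular expression.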

3.6 Case 6: Extract a date

import re

text = "Meeting time: 2025-10-12 10:30"
pattern = r"(\d{4})-(\d{2})-(\d{2})"

match = re.search(pattern, text)
if match:
    print("Year:", match.group(1))   # Year: 2025
    print("Month:", match.group(2))  # Month: 10
    print("Day:", match.group(3))    # Day: 12
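Numbered groups get hard to read as patterns grow. Named groups, written `(?P<name>...)`, are one alternative; the sketch below reuses the date example, and the group names are chosen for illustration.

```python
import re

text = "Meeting time: 2025-10-12 10:30"
pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"

match = re.search(pattern, text)
if match:
    print(match.group("year"))   # 2025
    print(match.groupdict())     # {'year': '2025', 'month': '10', 'day': '12'}
    print(match.groups())        # ('2025', '10', '12')
```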

3.7 Case 7: Split a string on several delimiters

import re

text = "apple, banana; orange|grape"
fruits = re.split(r"[,;|]\s*", text)
print(fruits)  # ['apple', 'banana', 'orange', 'grape']
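Two further `re.split` behaviours are worth knowing: a capturing group keeps the delimiters in the result, and `maxsplit` limits the number of splits. A brief sketch on a shortened sample string:

```python
import re

text = "apple, banana; orange"

# A capturing group around the delimiter keeps it in the output list.
with_delims = re.split(r"([,;|])\s*", text)
print(with_delims)   # ['apple', ',', 'banana', ';', 'orange']

# maxsplit=1 splits once and leaves the remainder untouched.
first_only = re.split(r"[,;|]\s*", text, maxsplit=1)
print(first_only)    # ['apple', 'banana; orange']
```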

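3.8 Case 8: Use `re.compile()` to improve performance

import re

pattern = re.compile(r"\d{3}-\d{3,8}")
texts = ["010-12345", "021-7654321", "0755-888888"]

for t in texts:
    # match() anchors at position 0, so the four-digit prefix "0755" fails.
    if pattern.match(t):
        print(f"Matched: {t}")  # prints 010-12345 and 021-7654321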

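3.9 Case 9: Get match positions with `finditer()`

import re

text = "Tom is 18, Jerry is 22"
pattern = r"\d+"

for m in re.finditer(pattern, text):
    # span() returns the (start, end) indices of the match.
    print(f"Found number {m.group()}, position: {m.span()}")  # 18 at (7, 9), then 22 at (20, 22)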

3.10 Case 10: Match multi-line text

import re

text = """Hello world
Python is great
Regex is powerful"""

# re.M lets ^ and $ anchor at the start and end of each line.
pattern = re.compile(r"^Python.*$", re.M)
print(pattern.findall(text))  # ['Python is great']

IV. Handy Regular Expression Templates

Scenario	Regular expression
Email address	`^[\w.-]+@[\w.-]+\.\w+$`
Mobile number (mainland China)	`^1[3-9]\d{9}$`
Resident ID number	`^\d{17}[\dXx]$`
URL	`https?://[^\s]+`
IPv4 address	`(?:\d{1,3}\.){3}\d{1,3}`
Date	`\d{4}-\d{2}-\d{2}`
HTML tag	`<[^>]+>`
Chinese characters	`[\u4e00-\u9fa5]+`
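These templates check the shape of the text, not its validity. The IPv4 pattern, for instance, accepts "999.1.1.1". One possible way to tighten it is to combine the shape check with the standard library's `ipaddress` module, as sketched below; `IPV4_SHAPE` and `is_ipv4` are names invented for this example.

```python
import ipaddress
import re

IPV4_SHAPE = re.compile(r"(?:\d{1,3}\.){3}\d{1,3}")

def is_ipv4(candidate):
    """Return True only if candidate has the dotted shape and parses as IPv4."""
    if not IPV4_SHAPE.fullmatch(candidate):
        return False
    try:
        ipaddress.IPv4Address(candidate)  # rejects octets above 255
    except ValueError:
        return False
    return True

print(bool(IPV4_SHAPE.fullmatch("999.1.1.1")))   # True: the shape alone is not enough
print(is_ipv4("999.1.1.1"))                      # False
print(is_ipv4("192.168.0.1"))                    # True
```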
