
Python Regular Expressions: From Basics to Practice

news 2025/7/23 7:43:01

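Regular expressions are a powerful text-processing tool, widely used for string matching, searching, and replacing. Python's re module provides regular expression support and keeps these tasks short and efficient. This article walks through the basic usage of Python regular expressions and some practical techniques.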

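I. Basic Concepts of Regular Expressions

A regular expression (Regular Expression, RE for short) is a pattern that describes combinations of characters in a string. It is built from ordinary characters and special characters (metacharacters), and can express complex string-matching rules.

Common regular expression symbols: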

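Symbol	Description
`.`	Matches any single character except a newline
`^`	Matches the start of the string
`$`	Matches the end of the string (or just before a trailing newline)
`*`	Matches the preceding element 0 or more times
`+`	Matches the preceding element 1 or more times
`?`	Matches the preceding element 0 or 1 time
`{n}`	Matches the preceding element exactly n times
`{n,m}`	Matches the preceding element at least n and at most m times
`[abc]`	Matches the character a, b, or c
`[^abc]`	Matches any character other than a, b, or c
`\d`	Matches any digit; equivalent to `[0-9]` for ASCII text (for `str` patterns it also matches other Unicode digits unless `re.ASCII` is set)
`\w`	Matches any letter, digit, or underscore; equivalent to `[a-zA-Z0-9_]` when `re.ASCII` is set (for `str` patterns it also matches Unicode word characters)
`\s`	Matches any whitespace character, including spaces, tabs, and form feeds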
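To make the table concrete, here is a minimal sketch that exercises several of the symbols with re.findall(). The sample strings and variable names are made up for illustration and are not part of the original article.

```python
import re

# Hypothetical sample strings, chosen only to exercise the symbols above.
dot = re.findall(r'a.c', "abc a-c ac")            # . needs exactly one character
star = re.findall(r'ab*', "a ab abbb")            # * allows zero or more b
plus = re.findall(r'ab+', "a ab abbb")            # + requires at least one b
optional = re.findall(r'colou?r', "color colour") # ? makes the u optional
bounded = re.findall(r'\d{2,3}', "1 22 333 4444") # {n,m} bounds the repeat count
negated = re.findall(r'[^abc\s]', "abc xyz")      # anything except a, b, c, whitespace

print(dot)       # ['abc', 'a-c']
print(star)      # ['a', 'ab', 'abbb']
print(plus)      # ['ab', 'abbb']
print(optional)  # ['color', 'colour']
print(bounded)   # ['22', '333', '444']
print(negated)   # ['x', 'y', 'z']
```

Note that `\d{2,3}` is greedy: on "4444" it takes three digits and leaves the fourth unmatched.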

II. Basic Usage of the `re` Module

1. `re.match()`

Matches the pattern only at the start of the string. Returns a match object on success, otherwise None.

Example code:

import re

pattern = r'^hello'
text = "hello world"
match = re.match(pattern, text)
if match:
    print("Match found:", match.group())  # Match found: hello
else:
    print("No match")

2. `re.search()`

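Scans the string for the first location where the pattern matches. Returns a match object if one is found, otherwise None.

Example code:

import re

pattern = r'world'
text = "hello world"
match = re.search(pattern, text)
if match:
    print("Match found:", match.group())  # Match found: world
else:
    print("No match")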
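The difference between re.match() and re.search() is easiest to see when the target is not at the start of the string. The sketch below uses an invented sample string, and also shows re.fullmatch(), which the article does not cover, for the case where the whole string must match.

```python
import re

text = "say hello"  # hypothetical sample string

# re.match only tries at position 0, so "hello" in the middle is not found.
at_start = re.match(r'hello', text)
# re.search scans forward until it finds a match.
anywhere = re.search(r'hello', text)
# re.fullmatch succeeds only if the pattern covers the entire string.
whole = re.fullmatch(r'\w+ \w+', text)

print(at_start)          # None
print(anywhere.group())  # hello
print(anywhere.span())   # (4, 9)
print(whole is not None) # True
```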

3. `re.findall()`

Finds all non-overlapping matches of the pattern and returns them as a list.

Example code:

import re

pattern = r'\d+'
text = "123 abc 456"
matches = re.findall(pattern, text)
print("Matches found:", matches)  # Matches found: ['123', '456']
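One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. The sketch below, using an invented key=value string, shows that behavior alongside re.finditer(), which yields match objects when positions are also needed.

```python
import re

text = "width=10 height=20"  # hypothetical sample string

# With two capturing groups, findall returns a list of tuples.
pairs = re.findall(r'(\w+)=(\d+)', text)
print(pairs)  # [('width', '10'), ('height', '20')]

# finditer yields match objects, which also carry start/end positions.
spans = [(m.group(), m.span()) for m in re.finditer(r'\d+', text)]
print(spans)  # [('10', (6, 8)), ('20', (16, 18))]
```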

4. `re.sub()`

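Replaces every match of the pattern with a replacement string and returns the new string.

Example code:

import re

pattern = r'\d+'
text = "123 abc 456"
new_text = re.sub(pattern, 'X', text)
print("New text:", new_text)  # New text: X abc X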
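Beyond a fixed replacement string, re.sub() also accepts a count limit, a reference to the matched text, or a function. The following is a small sketch of those three variants on the same sample text; the variable names are illustrative only.

```python
import re

text = "123 abc 456"

# count limits how many matches are replaced.
first_only = re.sub(r'\d+', 'X', text, count=1)
# \g<0> in the replacement refers to the whole match.
bracketed = re.sub(r'\d+', r'[\g<0>]', text)
# A function receives each match object and returns the replacement string.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)

print(first_only)  # X abc 456
print(bracketed)   # [123] abc [456]
print(doubled)     # 246 abc 912
```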

5. `re.split()`

Splits the string at every position where the pattern matches.

Example code:

import re

pattern = r'\s+'
text = "hello world"
parts = re.split(pattern, text)
print("Parts:", parts)  # Parts: ['hello', 'world']
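As a sketch of where re.split() goes beyond str.split(), the example below splits on several delimiter characters at once, limits the number of splits with maxsplit, and keeps the delimiters by wrapping them in a capturing group. The input strings are invented for illustration.

```python
import re

text = "a, b;c  d"  # hypothetical input with mixed delimiters

# Split on any run of commas, semicolons, or whitespace.
parts = re.split(r'[,;\s]+', text)
# maxsplit stops after the first split; the rest is returned untouched.
limited = re.split(r'[,;\s]+', text, maxsplit=1)
# A capturing group in the pattern keeps the delimiters in the result.
with_seps = re.split(r'([,;])', "a,b;c")

print(parts)      # ['a', 'b', 'c', 'd']
print(limited)    # ['a', 'b;c  d']
print(with_seps)  # ['a', ',', 'b', ';', 'c']
```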

III. Advanced Regular Expression Techniques

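1. Grouping

Use a capturing group when other parts of the pattern are needed to locate a match but should not appear in the result. When the pattern has a single group, re.findall() returns only the text matched inside the parentheses.

Example code:

import re

pattern = r'(\w+)\.'
text = "abd. efg. 123sd"
matches = re.findall(pattern, text)
# The trailing dot is required for a match but is not captured.
print("Matches found:", matches)  # Matches found: ['abd', 'efg']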
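On a match object, groups are addressed by number: group(0) is the whole match and group(1), group(2), and so on are the parenthesized parts. A minimal sketch, using an invented file-name string:

```python
import re

# Hypothetical sample: split a file name into stem and extension.
m = re.search(r'(\w+)\.(\w+)', "file: report.txt")

print(m.group(0))  # report.txt  (the whole match)
print(m.group(1))  # report
print(m.group(2))  # txt
print(m.groups())  # ('report', 'txt')
```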

2. Named capturing groups

The (?P<name>...) syntax gives a capturing group a name, so it can be retrieved by name instead of by position.

Example code:

import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
text = "Today is 2023-09-18."
match = re.search(pattern, text)
if match:
    year = match.group("year")
    month = match.group("month")
    day = match.group("day")
    print(f"Year: {year}, Month: {month}, Day: {day}")  # Year: 2023, Month: 09, Day: 18
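Named groups can also be read all at once with groupdict(), and referenced in a re.sub() replacement as \g<name>. The sketch below reuses the date pattern from this section; the reordered output format is just an illustrative choice.

```python
import re

pattern = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
text = "Today is 2023-09-18."

# groupdict() returns every named group as a dictionary.
fields = re.search(pattern, text).groupdict()
print(fields)  # {'year': '2023', 'month': '09', 'day': '18'}

# \g<name> in the replacement inserts the text of a named group.
reordered = re.sub(pattern, r"\g<day>/\g<month>/\g<year>", text)
print(reordered)  # Today is 18/09/2023.
```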

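3. Non-capturing groups

The (?:...) syntax creates a non-capturing group: it groups part of the pattern without capturing the matched text.

Example code:

import re

# The dots are escaped so they match a literal "." rather than any character.
pattern = r"(?:Mr\.|Mrs\.) (\w+)"
text = "Mr. Smith and Mrs. Johnson"
matches = re.findall(pattern, text)
print("Matches found:", matches)  # Matches found: ['Smith', 'Johnson']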
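The effect of (?:...) is clearest next to the same pattern written with an ordinary capturing group. A short comparison, as a sketch on the same sample text:

```python
import re

text = "Mr. Smith and Mrs. Johnson"

# Capturing the title adds it to every result tuple.
capturing = re.findall(r"(Mr\.|Mrs\.) (\w+)", text)
# The non-capturing version keeps only the surname.
non_capturing = re.findall(r"(?:Mr\.|Mrs\.) (\w+)", text)

print(capturing)      # [('Mr.', 'Smith'), ('Mrs.', 'Johnson')]
print(non_capturing)  # ['Smith', 'Johnson']
```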

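4. Precompiling patterns

Compiling a pattern once with re.compile() and reusing the resulting object avoids the per-call compilation overhead. The re module also caches recently used patterns, so the gain is mostly noticeable when the same pattern is applied many times.

Example code:

import re

pattern = re.compile(r'\d+')
matches = pattern.findall("123 abc 456")
print("Matches found:", matches)  # Matches found: ['123', '456']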
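A typical use is to compile once at module level and apply the pattern inside a loop. The sketch below assumes a hypothetical list of input lines; NUMBER is an illustrative name, not something from the original article.

```python
import re

# Compile once, reuse for every line.
NUMBER = re.compile(r'\d+')

lines = ["123 abc 456", "no digits", "7 up"]  # hypothetical input
counts = [len(NUMBER.findall(line)) for line in lines]

print(counts)          # [2, 0, 1]
print(NUMBER.pattern)  # \d+
```

The compiled object exposes the same operations as the module-level functions (search, match, findall, sub, split), minus the pattern argument.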

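5. Matching multi-line text

With the re.M (re.MULTILINE) flag, ^ and $ match at the start and end of every line rather than only at the start and end of the whole string.

Example code:

import re

text = """
this is
a
multiple
line
text
.
"""
# Collect the first character of each line that starts with "a" or "l".
matches = re.findall(r'^[al]', text, re.M)
print("Matches found:", matches)  # Matches found: ['a', 'l']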
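Running the same pattern with and without the flag shows what re.M changes. A minimal sketch with an invented three-line string:

```python
import re

text = "alpha\nbeta\nlast"  # hypothetical three-line string

# Without re.M, ^ matches only at the very start of the string.
without_flag = re.findall(r'^\w+', text)
# With re.M, ^ also matches right after every newline.
with_flag = re.findall(r'^\w+', text, re.M)
# Likewise, $ matches at the end of each line.
line_ends = re.findall(r'\w$', text, re.M)

print(without_flag)  # ['alpha']
print(with_flag)     # ['alpha', 'beta', 'last']
print(line_ends)     # ['a', 'a', 't']
```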

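6. Matching Chinese characters

A Unicode range can be used to match Chinese characters. The range \u4e00-\u9fa5 covers the commonly used CJK unified ideographs; punctuation such as the full-width comma falls outside it.

Example code:

import re

text = "你好，我是 john"
matches = re.findall(r'[\u4e00-\u9fa5]', text)
print("Matches found:", matches)  # Matches found: ['你', '好', '我', '是']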
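Adding a + quantifier to the character class returns contiguous runs instead of single characters, and re.search() gives a quick "contains Chinese" check. This is only a sketch: the helper name has_chinese is hypothetical, and the range is the same approximation used above.

```python
import re

CJK = r'[\u4e00-\u9fa5]'  # same range as above; an approximation, not all Han characters

text = "你好，我是 john"
runs = re.findall(CJK + '+', text)
print(runs)  # ['你好', '我是']

def has_chinese(s):
    """Return True if s contains at least one character in the range above."""
    return re.search(CJK, s) is not None

print(has_chinese(text))    # True
print(has_chinese("john"))  # False
```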

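7. Positive lookahead

A positive lookahead (?=...) asserts that the text immediately after the current position matches the given condition, without including that text in the match. Its negative counterpart, (?!...), requires that the condition does not match.

Example code:

import re

# Match "Windows" only when it is followed by 95, 98, NT, or 2000.
pattern = r"Windows(?=95|98|NT|2000)"
text = "Windows95, Windows98, WindowsXP"
matches = re.findall(pattern, text)
print("Matches found:", matches)  # Matches found: ['Windows', 'Windows']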
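For the negative form, (?!...) succeeds only where the condition does not hold. The sketch below inverts the lookahead on the same sample text and prints what follows each match, to show that only the "XP" occurrence is selected.

```python
import re

text = "Windows95, Windows98, WindowsXP"

# Match "Windows" only when it is NOT followed by 95, 98, NT, or 2000.
pattern = r"Windows(?!95|98|NT|2000)"

# Record the two characters after each match to see which occurrence was hit.
followers = [text[m.end():m.end() + 2] for m in re.finditer(pattern, text)]
print(followers)  # ['XP']
```

In both forms the lookahead text is not consumed, so the match itself is just "Windows".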

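8. Positive lookbehind

A positive lookbehind (?<=...) asserts that the text immediately before the current position matches the given condition, without including that text in the match.

Example code:

import re

# Match the run of word characters that directly follows an "@".
pattern = r"(?<=@)\w+"
text = "Email addresses: alice@example.com, bob@gmail.com"
matches = re.findall(pattern, text)
print("Matches found:", matches)  # Matches found: ['example', 'gmail']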
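Lookbehind also has a negative form, (?<!...). The following sketch contrasts the two on an invented price string. Note that Python's re module requires the lookbehind pattern to have a fixed width.

```python
import re

text = "cost $30, qty 5"  # hypothetical sample string

# Positive lookbehind: digits directly preceded by a dollar sign.
prices = re.findall(r'(?<=\$)\d+', text)
# Negative lookbehind: whole numbers NOT preceded by a dollar sign.
# \b keeps the match from starting in the middle of "30".
others = re.findall(r'(?<!\$)\b\d+', text)

print(prices)  # ['30']
print(others)  # ['5']
```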

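IV. Summary

Regular expressions are a powerful tool for string processing in Python, and the re module covers a wide range of text-processing needs. With the basic symbols and the common re functions, string matching, searching, and replacing become straightforward. The advanced techniques, such as grouping, named capturing groups, non-capturing groups, and precompiled patterns, make it possible to handle more complex text-processing tasks.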
