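The three core text-processing tools: grep (filtering/matching), sed (stream editing), awk (structured analysis)
I. Regular Expressions
A regular expression (RE) is a set of syntax rules for describing character sequences and matching patterns. Linux tools such as sed, gawk and grep use them to filter and process text: data that matches the pattern is kept or processed, and data that does not match is filtered out.
1. Overview
(1) Concept
- A regular expression is a pattern template used to split, match, search for and replace strings. By combining special characters it lets you process text in bulk (deleting, finding or replacing strings on one or more lines).
- Data flow:
Data stream → regular expression → (1) matching data (kept/processed) (2) non-matching data (discarded)
(2) Composition and purpose
- Composition: "ordinary characters" (upper- and lowercase letters, digits, punctuation, etc.) plus "metacharacters" (characters with a reserved meaning, for example specifying how many times the preceding character may occur).
- Purpose: commonly used in conditional statements to check whether a string matches a given format.
(3) Core goals
- Validation: check whether a string satisfies the regex's filtering logic;
- Extraction: pull a specific part out of a string.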
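The two goals can be sketched with grep. This is a minimal illustration assuming GNU grep; the sample strings "2022", "20x2" and "Listen 8080" are made up for the demo, and `grep -E` is equivalent to the `egrep` used later in this document.

```shell
# Validation: is the whole string exactly a 4-digit number?
# -q prints nothing; grep's exit status (0 = matched) drives the if.
if echo "2022" | grep -qE '^[0-9]{4}$'; then
    echo "valid"                       # prints: valid
fi
if ! echo "20x2" | grep -qE '^[0-9]{4}$'; then
    echo "invalid"                     # prints: invalid
fi

# Extraction: -o prints only the part that matched, not the whole line.
echo "Listen 8080" | grep -oE '[0-9]+'  # prints: 8080
```

The anchors `^` and `$` are what turn "contains four digits" into "is exactly four digits".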
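2. Basic regular expressions (BRE)
Supported by: grep, egrep, sed, awk (note: egrep and awk use extended syntax, so {n}, {n,} and {n,m} are written without a \ before the braces)
Metacharacter | Description | Example |
---|---|---|
\ | Escape character: removes the special meaning of the following symbol | \! (matches !), \$ (matches $), \n (newline; used in sed, e.g. after N) |
^ | Matches the start of a line | ^a (lines starting with a), ^# (comment lines starting with #), ^[a-z] (lines starting with a lowercase letter) |
$ | Matches the end of a line | word$ (lines ending with word), ^$ (empty lines) |
. | Matches any single character except \n | lo.k (look, loak, etc.: exactly 1 character between lo and k), l..k (look, lark, etc.: exactly 2 characters between l and k), lo.*k (lok, look, loaaak, etc.: any number of characters between lo and k) |
* | Matches the preceding character 0 or more times | lo*k (lk, lok, look, etc.: o may be absent or repeated), loo*k (lok, look, etc.: at least one o) |
[list] | Matches any one character in list | go[ola]d (gold, goad, good), [a-z0-9] (any lowercase letter or digit), [0-9] (any single digit) |
[^list] | Matches any one character not in list | [^0-9] (a non-digit), [^a-z] (a non-lowercase character), [^A-Z0-9] (neither an uppercase letter nor a digit) |
\{n\} | Matches the preceding character exactly n times | lo\{2\}k (look: exactly 2 o's), [0-9]\{2\} (2 digits, such as 12 or 99) |
\{n,\} | Matches the preceding character at least n times | lo\{2,\}k (look, loook, etc.: 2 or more o's), [0-9]\{2,\} (2 or more digits) |
\{n,m\} | Matches the preceding character n to m times | lo\{2,3\}k (look, loook: 2 to 3 o's), [0-9]\{2,3\} (2 to 3 digits) |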
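The escaping rule in the note above can be checked directly. A minimal sketch assuming GNU grep, with made-up input lines: in basic regex the braces must be escaped, in extended regex they must not be.

```shell
# Basic regex (plain grep): the interval braces are escaped.
printf 'lok\nlook\nloook\n' | grep 'lo\{2\}k'     # prints: look
# Extended regex (grep -E, i.e. egrep): same interval, no backslashes.
printf 'lok\nlook\nloook\n' | grep -E 'lo{2}k'    # prints: look
# In basic regex an unescaped { is an ordinary character, so this
# pattern matches the literal text lo{2}k rather than look.
printf 'look\nlo{2}k\n' | grep 'lo{2}k'           # prints: lo{2}k
```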
Basic regex examples (using files a.txt, b.txt and c.txt)
Example 1: * matching (the preceding character 0 or more times)
[root@hrz2 tmp]# cat a.txt
lk
lok
look
loook
looooook
loooooaaak
looooooook
abbbbcd
abbbbcd666
ooooloooook
oooooolk
aoblck
[root@hrz2 tmp]# grep "lo*k" a.txt
lk
lok
look
loook
looooook
looooooook
ooooloooook
oooooolk
[root@hrz2 tmp]# grep "loo*k" a.txt
lok
look
loook
looooook
looooooook
ooooloooook
Example 2: . matching (any single character)
[root@hrz2 tmp]# grep "lo.*k" a.txt
lok
look
loook
looooook
loooooaaak
looooooook
ooooloooook
[root@hrz2 tmp]# grep "lo.k" a.txt
look
[root@hrz2 tmp]# grep "l..k" a.txt
look
Example 3: \{n\}, \{n,\}, \{n,m\} matching (a specified number of times)
[root@hrz2 tmp]# grep "lo\{2\}k" a.txt
look
[root@hrz2 tmp]# grep "lo\{3\}k" a.txt
loook
[root@hrz2 tmp]# grep "lo\{3,\}k" a.txt
loook
looooook
looooooook
ooooloooook
[root@hrz2 tmp]# grep "lo\{3,5\}k" a.txt
loook
ooooloooook
Example 4: ^, $ and ^$ matching (start of line / end of line / empty line)
[root@hrz2 tmp]# cat b.txt
aaabd
cdd
cdc
cdd
[root@hrz2 tmp]# grep "^c" b.txt
cdd
cdc
cdd
[root@hrz2 tmp]# grep "d$" b.txt
aaabd
cdd
cdd
[root@hrz2 tmp]# grep "^$" b.txt
(prints any empty lines in b.txt)
Example 5: [list] and [^list] matching (characters in / not in a set)
[root@hrz2 tmp]# cat c.txt
lok
lo12k
lo1k
loAk
loBk
look
loak
lodk
abcd
1234
[root@hrz2 tmp]# grep "lo[a-zA-Z0-9]k" c.txt
lo1k
loAk
loBk
look
loak
lodk
[root@hrz2 tmp]# grep "lo[ABo]k" c.txt
loAk
loBk
look
[root@hrz2 tmp]# grep "lo[^a-zA-Z]k" c.txt
lo1k
[root@hrz2 tmp]# grep "[^a-zA-Z]" c.txt
lo12k
lo1k
1234
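3. Extended regular expressions (ERE)
Supported by: egrep (grep -E), sed -r, awk (the braces {} need no escaping). The extended metacharacters add more flexible matching on top of basic regex.
Metacharacter | Description | Example |
---|---|---|
+ | Matches the preceding character 1 or more times (unlike *, at least once) | lo+k (lok, look, etc.: at least one o; excludes lk) |
? | Matches the preceding character 0 or 1 times (at most once) | lo?k (lk, lok: 0 or 1 o; excludes look, etc.) |
() | Treats the enclosed string as a single unit (a group) | l(oo)+k (look, looook, etc.: the unit oo at least once) |
\| | Logical OR: matches any one of several patterns | l(oo\|ab)+k (look, labk, etc.) |
{n} | Same as basic regex \{n\} (no escaping) | lo{2}k (look) |
{n,} | Same as basic regex \{n,\} (no escaping) | lo{2,}k (look, loook, etc.) |
{n,m} | Same as basic regex \{n,m\} (no escaping) | lo{2,3}k (look, loook) |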
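GNU grep also accepts backslashed versions of the extended operators inside basic regex. This is a GNU extension, not POSIX, so treat it as an assumption if you target other greps; the input lines are made up for the demo.

```shell
# GNU extension: in basic regex, \+ \? and \| behave like ERE's + ? and |.
printf 'lk\nlok\nlook\n' | grep 'lo\+k'       # prints lok and look
printf 'lk\nlok\nlook\n' | grep -E 'lo+k'     # same result, written as ERE
printf 'lk\nlok\nlook\n' | grep 'lo\?k'       # prints lk and lok
printf 'cat\ndog\nbird\n' | grep 'cat\|dog'   # prints cat and dog
```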
Extended regex examples (using a.txt)
Example 1: + matching (the preceding character at least once)
[root@hrz2 tmp]# egrep "lo+k" a.txt
lok
look
loook
looooook
looooooook
ooooloooook
Example 2: ? matching (the preceding character 0 or 1 times)
[root@hrz2 tmp]# egrep "lo?k" a.txt
lk
lok
oooooolk
Example 3: () group matching (treat as one unit)
[root@hrz2 tmp]# egrep "l(oo)+k" a.txt
look
looooook
looooooook
Example 4: | logical OR matching
[root@hrz2 tmp]# echo labk >> a.txt
[root@hrz2 tmp]# egrep "l(oo|ab)+k" a.txt
look
looooook
looooooook
labk
Example 5: {n}, {n,}, {n,m} matching (no escaping needed)
[root@hrz2 tmp]# egrep "lo{3}k" a.txt
loook
[root@hrz2 tmp]# egrep "lo{3,}k" a.txt
loook
looooook
looooooook
ooooloooook
[root@hrz2 tmp]# egrep "lo{3,5}k" a.txt
loook
ooooloooook
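4. Special character classes
POSIX character classes simplify writing common character sets; grep, sed and awk all support them inside bracket expressions.
Character class | Description |
---|---|
[[:alpha:]] | Any letter (uppercase A-Z or lowercase a-z) |
[[:alnum:]] | Any letter or digit (0-9, A-Z, a-z) |
[[:blank:]] | A space or Tab |
[[:digit:]] | A digit 0-9 (same as [0-9]) |
[[:lower:]] | A lowercase letter (a-z) |
[[:print:]] | Any printable character (including space) |
[[:punct:]] | A punctuation character (such as !, @, #, $) |
[[:space:]] | Any whitespace character (space, Tab, newline, etc.) |
[[:upper:]] | An uppercase letter (A-Z) |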
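A short sketch of the classes in use, mirroring the lo?k patterns of c.txt with made-up input. Note that the outer brackets belong to the bracket expression: [:digit:] is only a class when written inside [ ], as in [[:digit:]].

```shell
printf 'lo1k\nloAk\nlook\nlo k\n' | grep 'lo[[:digit:]]k'   # prints: lo1k
printf 'lo1k\nloAk\nlook\nlo k\n' | grep 'lo[[:upper:]]k'   # prints: loAk
printf 'lo1k\nloAk\nlook\nlo k\n' | grep 'lo[[:blank:]]k'   # prints: lo k
```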
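II. grep
grep is the most basic text-filtering tool on Linux; its core function is to match text lines against a pattern. Common options:
Option | Description |
---|---|
-v | Invert match: print lines that do not match the pattern |
-i | Ignore case when matching |
-o | Print only the matched part (not the whole line) |
(no option) | Print every whole line that matches the pattern |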
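A minimal sketch of the three options on a hypothetical four-line input (printf '%b' expands the \n escapes in the sample string):

```shell
sample='Root\nbin\nroot\n# comment\n'
printf '%b' "$sample" | grep 'root'      # prints: root
printf '%b' "$sample" | grep -i 'root'   # prints Root and root
printf '%b' "$sample" | grep -v '^#'     # prints Root, bin and root
printf '%b' "$sample" | grep -o 'oo'     # prints oo twice (only the matched text)
```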
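Common grep usage
grep "string" filename      # lines containing string
grep -v "string" filename   # lines not containing string
grep "^string" filename     # lines starting with string
grep "string$" filename     # lines ending with string
grep "^$" filename          # empty lines
grep -i "string" filename   # case-insensitive match
grep -o "string" filename   # only the matched text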
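III. sed
sed (stream editor) processes text line by line: it reads the current line into the "pattern space", applies the sed commands to it, prints the result to the screen, and repeats until the end of the file. By default the original file is not modified; use the -i option to edit it in place.
1. Core sed parameters
(1) Options
Option | Description |
---|---|
-n | Do not print the pattern space automatically (use the p command to print explicitly) |
-e | Add a script/expression; repeat -e to run several (within a single script, commands can also be separated by ;) |
-f | Read sed commands from a file (instead of the command line) |
-i | Edit the original file in place (use with care; back it up first) |
-r | Use extended regular expressions (no need to escape +, ?, (), \|, {}) |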
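Following the "back it up first" advice for -i, GNU sed can keep the backup itself when a suffix is attached directly to -i (no space). A minimal sketch on a throwaway file created with mktemp; the file contents are hypothetical.

```shell
f=$(mktemp)
printf 'blp5 48129/tcp\n' > "$f"

# GNU sed: edit $f in place and keep the untouched original as $f.bak
sed -i.bak 's/blp5/test/' "$f"

cat "$f"        # prints: test 48129/tcp
cat "$f.bak"    # prints: blp5 48129/tcp

rm -f "$f" "$f.bak"
```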
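(2) Common commands
Command | Description |
---|---|
s/regexp/replace/ | Substitute: replace the text in the pattern space that matches regexp with replace (the g flag replaces every match on the line, not just the first) |
p | Print the current pattern space (used with -n) |
P | Print only the first line of the pattern space (up to the first \n), unlike p |
d | Delete the pattern space and start the next cycle |
D | Delete the first line of the pattern space; if text remains, restart the cycle with it without reading a new line (otherwise behaves like d) |
= | Print the current line number |
a \text | Append text after the current line |
i \text | Insert text before the current line |
c \text | Replace the current line with text |
q | Quit: print the current pattern space (unless -n) and stop processing |
r filename | Read the contents of the file and append them after the current line |
w filename | Write the current pattern space to the file |
h | Copy the pattern space to the "hold space" (overwrite) |
H | Append the pattern space to the hold space (separated by \n) |
g | Copy the hold space to the pattern space (overwrite) |
G | Append the hold space to the pattern space (separated by \n) |
x | Exchange the contents of the pattern space and the hold space |
l | Print the pattern space unambiguously, showing control characters (e.g. \t) and marking the end of the line with $ |
n | Print the pattern space (unless -n), then replace it with the next line |
N | Append the next line to the pattern space (keeping the existing content, separated by \n) |
! | Negate the address: run the command on lines that do not match it |
& | In the replacement part of s, refers to the whole text matched by regexp |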
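Three commands from the table that the examples below do not exercise, =, l and q, sketched on seq output and a made-up tab-containing line (GNU sed assumed):

```shell
seq 5 | sed -n '$='              # prints: 5  (line number of the last line, i.e. a line count)
printf 'a\tb\n' | sed -n 'l'     # prints: a\tb$  (tab shown as \t, $ marks the end of line)
seq 5 | sed '2q'                 # prints 1 and 2: q prints line 2, then stops
```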
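(3) Address formats (selecting the lines a sed command applies to; first~step, addr1,+N and addr1,~N are GNU sed extensions)
Address | Description |
---|---|
first~step | Every step-th line starting at line first (e.g. 1~2 = odd lines, 0~2 = even lines) |
$ | The last line |
/regexp/ | Lines that match regexp |
number | The given line number (e.g. 3 = line 3) |
addr1,addr2 | All lines from addr1 through addr2 (e.g. 1,5 = lines 1-5) |
addr1,+N | addr1 and the N lines after it (e.g. 3,+2 = lines 3-5) |
addr1,~N | From addr1 up to the next line whose number is a multiple of N (e.g. 2,~4 = lines 2-4) |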
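The GNU-specific address forms are easiest to see on numbered input from seq:

```shell
seq 10 | sed -n '0~3p'     # every 3rd line: 3 6 9
seq 10 | sed -n '3,+2p'    # line 3 plus the next 2: 3 4 5
seq 10 | sed -n '2,~4p'    # line 2 up to the next multiple of 4: 2 3 4
seq 10 | sed -n '5,~4p'    # line 5 up to the next multiple of 4: 5 6 7 8
```

Each command prints its numbers one per line.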
2. sed examples (using /tmp/services and other files)
Example 1: printing (the p command, with -n)
[root@hrz2 tmp]# cat /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed -n '/^blp5/p' /tmp/services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
[root@hrz2 tmp]# sed -n '1p' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
[root@hrz2 tmp]# sed -n '1,3p' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
[root@hrz2 tmp]# seq 10 | sed -n '1~2p'
1
3
5
7
9
[root@hrz2 tmp]# seq 10 | sed '0~2d'
1
3
5
7
9
[root@hrz2 tmp]# sed -n '/blp5/,+1p' /tmp/services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
[root@hrz2 tmp]# sed -n '$p' /tmp/services
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed -n '$!p' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
[root@hrz2 tmp]# sed -n '/^blp5/,/^com/p' /tmp/services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
[root@hrz2 tmp]# a=1
[root@hrz2 tmp]# sed -n "$a,3p" /tmp/services    # double quotes let the shell expand $a; or: sed -n ''$a',3p'
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
Example 2: deleting (the d command)
[root@hrz2 tmp]# sed '/blp5/d' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '1d' /tmp/services
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '1~2d' /tmp/services
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '1,3d' /tmp/services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '/^#/d;/^$/d' /etc/httpd/conf/httpd.conf | sed -r '/^[[:space:]]+#/d'    # drop comment lines, blank lines and indented comments
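The same cleanup can be done in one sed pass. A sketch on a hypothetical httpd.conf-style sample (so it runs without Apache installed); allowing optional leading whitespace before # and in "empty" lines covers both indented comments and whitespace-only lines.

```shell
printf '%s\n' '# global comment' 'ServerRoot "/etc/httpd"' '' '    # indented comment' 'Listen 80' |
    sed -r '/^[[:space:]]*#/d; /^[[:space:]]*$/d'
# prints:
# ServerRoot "/etc/httpd"
# Listen 80
```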
Example 3: substitution (the s/regexp/replace/ command)
[root@hrz2 tmp]# sed 's/blp5/test/' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
test 48129/tcp # Bloomberg locator
test 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed 's/blp5/test/g' /tmp/services    # same output as above here: each line contains blp5 at most once
[root@hrz2 tmp]# sed -n 's/^blp5/test/p' /tmp/services
test 48129/tcp # Bloomberg locator
test 48129/udp # Bloomberg locator
[root@hrz2 tmp]# sed 's/48049/&.123/' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049.123/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# echo '10.10.10.1 10.10.10.2 10.10.10.3' | sed -r 's/[^ ]+/"&"/g'
"10.10.10.1" "10.10.10.2" "10.10.10.3"
[root@hrz2 tmp]# sed '1,5s/blp5/test/' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
test 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '/48129\/tcp/s/blp5/test/' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
test 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed 's/blp5/test/;s/3g/4g/' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
4gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
test 48129/tcp # Bloomberg locator
test 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed -r 's/(.*)(480.*)(#.*)/\1\2test \3/' /tmp/services
nimgtw 48003/udp test # Nimbus Gateway
3gpp-cbsp 48049/tcp test # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed -r 's/(.*)(\<[0-9]+\>)\/(tcp|udp)(.*)/\1\3\/\2\4/' /tmp/services
nimgtw udp/48003 # Nimbus Gateway
3gpp-cbsp tcp/48049 # 3GPP Cell Broadcast Service Protocol
isnetserv tcp/48128 # Image Systems Network Services
isnetserv udp/48128 # Image Systems Network Services
blp5 tcp/48129 # Bloomberg locator
blp5 udp/48129 # Bloomberg locator
com-bardac-dw tcp/48556 # com-bardac-dw
com-bardac-dw udp/48556 # com-bardac-dw
iqobject tcp/48619 # iqobject
iqobject udp/48619 # iqobject
[root@hrz2 tmp]# echo "abc:cde;123:456" | sed -r 's/([^:]*)(;.*:)([^:]+$)/\3\2\1/'
abc:456;123:cde
[root@hrz2 tmp]# seq 10 | sed '/5/,+3s/^/#/'
1
2
3
4
#5
#6
#7
#8
9
10
[root@hrz2 tmp]# seq 5 | sed -r '/^3|^4/s/^/#/'
1
2
#3
#4
5
[root@hrz2 tmp]# seq 5 | sed -r 's/^3|^4/#&/'    # & refers to the matched 3 or 4
1
2
#3
#4
5
Example 4: multiple edits (separate commands with -e or ;)
[root@hrz2 tmp]# sed -e '1,4d' -e 's/blp5/test/' /tmp/services
test 48129/tcp # Bloomberg locator
test 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '1,4d;s/blp5/test/' /tmp/services    # equivalent to the previous command
Example 5: adding/replacing text (the a, i and c commands)
[root@hrz2 tmp]# sed '/blp5/i \test' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
test
blp5 48129/tcp # Bloomberg locator
test
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '/blp5/a \test' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
test
blp5 48129/udp # Bloomberg locator
test
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '/blp5/c \test' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
test
test
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# sed '2a \test' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
test
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# seq 5 | sed '3s/.*/txt\n&/'    # insert txt before line 3
1
2
txt
3
4
5
[root@hrz2 tmp]# seq 5 | sed '3s/.*/&\ntest/'    # add test after line 3
1
2
3
test
4
5
Example 6: reading an external file (the r command)
[root@hrz2 tmp]# cat a.txt
123
456
[root@hrz2 tmp]# sed '/blp5/r a.txt' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
123
456
blp5 48129/udp # Bloomberg locator
123
456
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
Example 7: writing to a file (the w command)
[root@hrz2 tmp]# sed '/blp5/w b.txt' /tmp/services
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# cat b.txt
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
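Example 8: reading the next line (the n and N commands)
n: print the pattern space (unless -n), then overwrite it with the next line; N: append the next line to the pattern space (separated by \n).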
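Running the same substitution after n and after N makes the difference visible (GNU sed assumed):

```shell
# n: the current line is printed first, then replaced by the next line,
# so s only ever sees lines 2 and 4.
seq 4 | sed 'n;s/^/x/'    # prints 1, x2, 3, x4 (one per line)

# N: the next line is appended, so the pattern space is "1\n2";
# ^ anchors at the start of the whole pattern space, i.e. before 1.
seq 4 | sed 'N;s/^/x/'    # prints x1, 2, x3, 4 (one per line)
```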
[root@hrz2 tmp]# seq 5 | sed -n '/3/{n;p}'
4
[root@hrz2 tmp]# seq 6 | sed -n 'n;p'
2
4
6
[root@hrz2 tmp]# seq 6 | sed 'n;d'
1
3
5
[root@hrz2 tmp]# seq 6 | sed -n 'p;n'
1
3
5
[root@hrz2 tmp]# seq 6 | sed 'n;n;s/^/=/;s/$/=/'
1
2
=3=
4
5
=6=
[root@hrz2 tmp]# seq 6 | sed '3~3{s/^/=/;s/$/=/}'
1
2
=3=
4
5
=6=
[root@hrz2 tmp]# seq 6 | sed 'N;q'
1
2
[root@hrz2 tmp]# seq 6 | sed 'N;s/\n//'
12
34
56
[root@hrz2 tmp]# seq 5 | sed -n '$!N;p'    # $!: run N on every line except the last
1
2
3
4
5
Example 9: operating on the first line of the pattern space (the P and D commands)
[root@hrz2 tmp]# seq 6 | sed -n 'N;P'
1
3
5
[root@hrz2 tmp]# seq 6 | sed 'N;D'
6
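Example 10: hold space operations (the h/H, g/G and x commands)
- Hold space: a temporary storage area, empty by default, used to stash pattern-space content.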
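A classic use of the hold space is reversing the order of lines (like tac), combining G, h, ! and $ from the tables above:

```shell
# 1!G  on every line except the first, append the hold space (the earlier lines)
# h    copy the result back into the hold space
# $p   on the last line, print the accumulated, reversed text
seq 5 | sed -n '1!G;h;$p'    # prints 5 4 3 2 1, one per line
```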
[root@hrz2 tmp]# seq 6 | sed -e '/3/{h;d}' -e '/5/g'
1
2
4
3
6
[root@hrz2 tmp]# seq 6 | sed -e '/3/{h;d}' -e '$G'
1
2
4
5
6
3
[root@hrz2 tmp]# seq 6 | sed -e '/3/{h;d}' -e '/5/x' -e '$G'
1
2
4
3
6
5
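IV. awk
awk is a text-processing tool with a full programming language. It works at the level of records (lines) and fields (columns), which grep and sed are not designed for. By default:
- Record: 1 line = 1 record (separator \n);
- Field: a part of a record separated by spaces/Tabs; $1 is column 1, $2 column 2, ..., and $0 is the whole line.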
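A sketch of records and fields on a made-up line in the /tmp/services format. NF (number of fields) and NR (record number) are standard awk built-in variables not otherwise covered here; $NF is therefore the last field.

```shell
# Runs of spaces count as one separator, so this line has 5 fields:
# blp5, 48129/tcp, #, Bloomberg, locator
echo 'blp5   48129/tcp # Bloomberg locator' | awk '{print NF, $1, $NF}'
# prints: 5 blp5 locator

printf 'a b\nc d\n' | awk '{print NR ": " $0}'
# prints:
# 1: a b
# 2: c d
```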
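1. Core awk parameters
(1) Options
Option | Description |
---|---|
-f scriptfile | Read awk commands from the given script file (instead of the command line) |
-F separator | Set the field separator (default: spaces/Tabs) |
-v var=value | Define an awk variable (the value can come from a shell variable) |
--posix | (gawk) Strict POSIX compatibility mode: GNU extensions are disabled |
--dump-variables[=file] | (gawk) Write global variables to a file (default awkvars.out) |
--profile[=file] | (gawk) Write a pretty-printed version of the program to a file (default awkprof.out) |
(2) Patterns (selecting the lines an awk action applies to)
Pattern | Description |
---|---|
BEGIN{} | Runs before any input is read (initialize variables, print a header, etc.) |
END{} | Runs after all input is processed (print a footer, report totals, etc.) |
/regexp/ | Lines that match regexp |
pattern1 && pattern2 | Logical AND: both patterns match |
pattern1 \|\| pattern2 | Logical OR: either pattern matches |
!pattern | Logical NOT: the pattern does not match |
pattern1,pattern2 | Range: from a line matching pattern1 through the next line matching pattern2 |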
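BEGIN, a regex pattern and END work together naturally for counting. A sketch on hypothetical lines in the /tmp/services format:

```shell
# BEGIN runs once before input, /tcp/ runs per matching line,
# END runs once after the last line.
printf '%s\n' 'blp5 48129/tcp' 'blp5 48129/udp' 'iqobject 48619/tcp' |
    awk 'BEGIN{n=0} /tcp/{n++} END{print "tcp lines:", n}'
# prints: tcp lines: 2

# A pattern with no action prints the matching lines; ! negates it.
printf '%s\n' 'blp5 48129/tcp' 'blp5 48129/udp' 'iqobject 48619/tcp' | awk '!/tcp/'
# prints: blp5 48129/udp
```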
2. awk examples (using /tmp/services, /etc/passwd, etc.)
Example 1: reading commands from a script file (the -f option)
[root@hrz2 tmp]# cat test.awk
{print $2}
[root@hrz2 tmp]# awk -f test.awk /tmp/services
48003/udp
48049/tcp
48128/tcp
48128/udp
48129/tcp
48129/udp
48556/tcp
48556/udp
48619/tcp
48619/udp
Example 2: setting the field separator (the -F option)
[root@hrz2 tmp]# awk -F ':' '{print $1}' /etc/passwd
root
bin
daemon
adm
lp
...
[root@hrz2 tmp]# tail -n3 /tmp/services | awk -F'[/#]' '{print $3}'    # field 3
com-bardac-dw
iqobject
iqobject
[root@hrz2 tmp]# tail -n3 /tmp/services | awk -F'[/#]' '{print $1}'    # field 1
com-bardac-dw 48556
iqobject 48619
iqobject 48619
[root@hrz2 tmp]# tail -n3 /tmp/services | awk -F'[ /]+' '{print $2}'
48556
48619
48619
Example 3: assigning variables (the -v option)
[root@hrz2 tmp]# awk -v a=123 'BEGIN{print a}'
123
[root@hrz2 tmp]# a=123    # shell variable
[root@hrz2 tmp]# awk -v a=$a 'BEGIN{print a}'    # pass it in with -v
123
[root@hrz2 tmp]# awk 'BEGIN{print '$a'}'    # end the single-quoted string so the shell substitutes $a; awk sees print 123
123
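The quote-splicing trick works above only because 123 is a number. A sketch with a string value (the variable content abc is made up) shows why -v is the safer choice:

```shell
a=abc
# -v passes the value in as awk data: prints abc
awk -v a="$a" 'BEGIN{print a}'

# Splicing: awk receives BEGIN{print abc}; abc is read as an unset
# awk variable, so this prints an empty line.
awk 'BEGIN{print '"$a"'}'

# Splicing only works for strings if awk's own double quotes are added:
# awk receives BEGIN{print "abc"} and prints abc.
awk 'BEGIN{print "'"$a"'"}'
```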
Example 4: the BEGIN and END patterns
[root@hrz2 tmp]# tail /tmp/services | awk 'BEGIN{print "Service\t\tPort\t\t\tDescription\n==="}{print $0}'
Service Port Description
===
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
[root@hrz2 tmp]# tail /tmp/services | awk '{print $0}END{print "===\nEND......"}'
nimgtw 48003/udp # Nimbus Gateway
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
isnetserv 48128/udp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
com-bardac-dw 48556/udp # com-bardac-dw
iqobject 48619/tcp # iqobject
iqobject 48619/udp # iqobject
===
END......
Example 5: writing the formatted program to a file (the --profile option)
[root@hrz2 tmp]# tail /tmp/services | awk --profile 'BEGIN{print"Service\t\tPort\t\t\tDescription\n==="}{print $0}END{print "===\nEND......"}'
[root@hrz2 tmp]# cat awkprof.out
# gawk profile, created Thu Sep 15 07:45:12 2022
# BEGIN rule(s)
BEGIN {print "Service\t\tPort\t\t\tDescription\n==="}
# Rule(s)
{print $0}
# END rule(s)
END {print "===\nEND......"}
Example 6: regex matching (the /regexp/ pattern)
[root@hrz2 tmp]# awk '/tcp/{print $0}' /tmp/services
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
iqobject 48619/tcp # iqobject
Example 7: logical operators (&&, ||, !)
[root@hrz2 tmp]# awk '/blp5/ && /tcp/{print $0}' /tmp/services
blp5 48129/tcp # Bloomberg locator
[root@hrz2 tmp]# awk '/blp5/ || /tcp/{print $0}' /tmp/services
3gpp-cbsp 48049/tcp # 3GPP Cell Broadcast Service Protocol
isnetserv 48128/tcp # Image Systems Network Services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw
iqobject 48619/tcp # iqobject
[root@hrz2 tmp]# awk '! /^#/ && ! /^$/{print $0}' /etc/httpd/conf/httpd.conf | awk '! /^[[:space:]]+#/{print $0}'    # drop comment lines, blank lines and indented comments
Example 8: range pattern (pattern1,pattern2)
[root@hrz2 tmp]# awk '/^blp5/,/^com/' /tmp/services
blp5 48129/tcp # Bloomberg locator
blp5 48129/udp # Bloomberg locator
com-bardac-dw 48556/tcp # com-bardac-dw