Data Analysis Notes 11: Data Containers 2

news 2025/11/17 8:59:21

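Tuples

Core Characteristics

A tuple is an immutable data container: once it is created, its elements cannot be reassigned, added, or removed. This is the tuple's defining property.

Use case: storing important core data that should not be modified.

Creation: use parentheses ().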
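To make the immutability claim concrete, here is a minimal sketch. The names location and record and their values are made up for illustration. It also shows that the protection is shallow: a mutable object stored inside a tuple can still be changed.

```python
# Hypothetical data for illustration: a fixed (latitude, longitude) pair.
location = (39.9, 116.4)

try:
    location[0] = 40.0          # tuples do not support item assignment
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)

print(location)                 # the tuple is unchanged

# Immutability is shallow: a list stored inside a tuple can still be mutated.
record = ("scores", [90, 85])
record[1].append(70)
print(record)                   # ('scores', [90, 85, 70])
```

If the data must be fully protected, it is safer to store only immutable values (numbers, strings, other tuples) inside the tuple.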

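Ways to Create a Tuple

A tuple with several elements:

t1 = (14, 15, 12)
print(t1) # (14, 15, 12)
print(type(t1)) # <class 'tuple'>

A tuple with a single element:

# Wrong way
t2 = (12)
print(type(t2)) # <class 'int'> # this is not a tuple!
# Correct way: a trailing comma is required
t2 = (12,)
print(type(t2)) # <class 'tuple'> # this is a tuple

Key reminder: a single-element tuple needs a comma after the element. Without it, the parentheses only group the expression and the result keeps the element's own type (here, int).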
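As a small sketch of why the comma matters: it is the comma, not the parentheses, that builds the tuple. The variable names below are illustrative only.

```python
t_a = 12,               # trailing comma, no parentheses: still a tuple
t_b = (12,)             # the form recommended above
t_c = tuple([12])       # built from an iterable
not_a_tuple = (12)      # grouping parentheses only, so this is an int

print(type(t_a).__name__, type(not_a_tuple).__name__)  # tuple int
print(t_a == t_b == t_c)                               # True
print(len(t_b))                                        # 1
```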

An empty tuple:

t3 = ()

Indexing a Tuple

Tuple elements are looked up by index, following the same rules as lists:

t1 = (14, 15, 12)
print(t1[1]) # 15 (indexes start at 0)
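Because the index rules match those of lists, negative indexes and slices should behave the same way on tuples. A short sketch reusing t1:

```python
t1 = (14, 15, 12)

last = t1[-1]            # negative indexes count from the right
first_two = t1[:2]       # slicing a tuple returns a new tuple
size = len(t1)           # valid indexes run from 0 to size - 1

try:
    t1[3]                # one past the last valid index
    out_of_range = False
except IndexError:
    out_of_range = True

print(last, first_two, size, out_of_range)  # 12 (14, 15) 3 True
```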

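Strings

Core Concept

A string can be thought of as a container that holds individual characters. Strings support indexing and slicing.

Indexing

b = "HIJKLMN"
print(b[1]) # I (the character at index 1)

Slicing

Basic syntax: string[start:end:step]

start: the starting index (inclusive).
end: the ending index (exclusive; the range is closed on the left and open on the right).
step: the step size (default 1).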

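Slicing examples:

b = "HIJKLMN"
# From index 1 up to 5 (5 not included)
print(b[1:5]) # IJKL
# From index 1 to the end
print(b[1:]) # IJKLMN
# From the beginning to the end (everything)
print(b[:]) # HIJKLMN
# Step of 2
print(b[::2]) # HJLN (every second character)
# Negative index: counted from the right
print(b[:-1]) # HIJKLM (everything except the last character)
# Reversed output
print(b[::-1]) # NMLKJIH

Key points to remember:

Empty left side: start from the beginning.
Empty right side: continue to the end.
Negative index: counted from the right, where the last character is -1 (there is no -0).
Negative step: traverse in reverse.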
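The rules above can be combined. The sketch below uses the same string b and adds one point not stated above: a slice that runs past the end of the string is clipped instead of raising an error.

```python
b = "HIJKLMN"

every_other = b[1:6:2]     # indexes 1, 3, 5
last_three = b[-3:]        # from the third-from-last character to the end
backwards = b[5:1:-1]      # indexes 5, 4, 3, 2 (index 1 is excluded)
clipped = b[2:100]         # end beyond the string length is clipped
empty = b[10:]             # start beyond the string length gives ""

print(every_other)   # IKM
print(last_three)    # LMN
print(backwards)     # MLKJ
print(clipped)       # JKLMN
print(repr(empty))   # ''
```

Note that with a negative step, start must be larger than end; otherwise the result is an empty string.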

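Common String Methods

count() - Count

Counts how many times a substring occurs in a string.

mystr = "hello universe and galaxy and cosmos and python"
# Count the occurrences of "and"
print(mystr.count("and")) # 3
# Restrict the search to mystr[0:25]
print(mystr.count("and", 0, 25)) # 1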
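Two details of count() are easy to miss: it counts non-overlapping occurrences, and it is case-sensitive. A minimal sketch with made-up strings:

```python
overlapping = "aaaa"
pairs = overlapping.count("aa")          # matches at 0-1 and 2-3 only
print(pairs)                             # 2

text = "And then and again AND so on"
exact = text.count("and")                # only the lowercase "and" matches
any_case = text.lower().count("and")     # normalize case before counting
print(exact, any_case)                   # 1 3
```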

replace() - Replace

Replaces content within a string.

Syntax: string.replace(old, new, count). The count argument is optional; if it is omitted, every occurrence is replaced.

mystr = "hello universe and galaxy and cosmos and python"
# Replace every "and" with "or"
newstr = mystr.replace("and", "or")
print(newstr) # hello universe or galaxy or cosmos or python
# Replace only the first occurrence
newstr = mystr.replace("and", "or", 1)
print(newstr) # hello universe or galaxy and cosmos and python
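Since strings are immutable, replace() returns a new string and leaves the original untouched, which is why the examples above assign the result to newstr. A short sketch:

```python
mystr = "hello universe and galaxy and cosmos and python"

newstr = mystr.replace("and", "or", 2)   # replace the first two occurrences
print(newstr)   # hello universe or galaxy or cosmos and python
print(mystr)    # hello universe and galaxy and cosmos and python

# When the old substring is not found, the text comes back unchanged.
unchanged = mystr.replace("xyz", "or")
print(unchanged == mystr)                # True
```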

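split() - Split

Splits a string on the given separator and returns a list.

mystr = "hello universe and galaxy and cosmos and python"
# Split on spaces
result = mystr.split(" ")
print(result) # ['hello', 'universe', 'and', 'galaxy', 'and', 'cosmos', 'and', 'python']
# The result can be traversed with a for loop
for word in result:
    print(word)

Practical use: in web scraping and data analysis, a long string of data often has to be split on a consistent delimiter (comma, semicolon, colon, and so on) into individual values.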
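As a sketch of that use case, the record below is hypothetical and its field layout is an assumption. The example also shows the optional maxsplit argument and the difference between split(" ") and split() with no argument.

```python
# Hypothetical comma-separated record; the field order is assumed.
line = "2025-11-17,Xiao Li,login,ok"
fields = line.split(",")
print(fields)            # ['2025-11-17', 'Xiao Li', 'login', 'ok']

# maxsplit=1 splits only at the first separator.
key_value = "time:08:59:21".split(":", 1)
print(key_value)         # ['time', '08:59:21']

# split(" ") keeps empty strings between repeated spaces;
# split() with no argument treats any run of whitespace as one separator.
spaced = "a  b"
by_space = spaced.split(" ")
by_whitespace = spaced.split()
print(by_space, by_whitespace)   # ['a', '', 'b'] ['a', 'b']
```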

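join() - Join

Joins the elements of a list using the given string as the connector. The elements must themselves be strings.

list1 = ["大", "鹏", "教", "育"]
# Join with "-"
result = "-".join(list1)
print(result) # 大-鹏-教-育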
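join() only accepts strings, so a list of numbers has to be converted first. The sketch below uses an illustrative list of integers and also shows that split() reverses the operation.

```python
numbers = [14, 15, 12]

try:
    "-".join(numbers)            # integers are not accepted
    join_failed = False
except TypeError:
    join_failed = True

joined = "-".join(str(n) for n in numbers)   # convert each element to str
round_trip = joined.split("-")               # split() undoes the join

print(join_failed)   # True
print(joined)        # 14-15-12
print(round_trip)    # ['14', '15', '12']
```

After the round trip the elements are strings, not integers; convert them back with int() if numbers are needed.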

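strip() - Remove Whitespace from Both Ends

Removes spaces, newlines (\n), tabs (\t), and other whitespace from both ends of a string.

str1 = " hello universe "
print(str1.strip()) # hello universe

Practical use: when processing scraped data or log data, it is common to call strip() first to clean whitespace from both ends of each value, so that stray whitespace does not cause errors in later steps such as comparisons or lookups.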
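A minimal sketch of that cleaning step, assuming a hypothetical list of raw lines. It also shows the related lstrip() and rstrip() methods and the optional argument that names the characters to remove.

```python
# Hypothetical raw lines, e.g. read from a log file or a scraped page.
raw_lines = ["  alice \n", "\tbob\n", "carol  "]
cleaned = [line.strip() for line in raw_lines]
print(cleaned)                 # ['alice', 'bob', 'carol']

# Only the ends are cleaned; whitespace inside the string is kept.
inner = "  hello   universe  ".strip()
print(repr(inner))             # 'hello   universe'

left_only = "  data  ".lstrip()      # strip the left end only
right_only = "  data  ".rstrip()     # strip the right end only
dashes = "--id-42--".strip("-")      # strip a chosen character instead
print(repr(left_only), repr(right_only), dashes)
```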

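Sets

Core Characteristics

A set cannot contain duplicate elements. This is the set's most important property.

Creation: use braces {} with elements inside, or set().

Creating a Set

# Create with braces
s = {12, 22, 32, 42, 12, 22, 32, 42, 12}
print(s) # e.g. {32, 42, 12, 22} (duplicates removed; element order is not guaranteed)
# Create an empty set (set() is required, because {} creates an empty dict)
s_empty = set()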

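Practical Use Case

Counting how many distinct users log in to a website each day:

# Suppose Xiao Li logged in 3 times today
login_records = ["Xiao Li", "Xiao Wang", "Xiao Li", "Xiao Zhao", "Xiao Li"]
# Deduplicate with a set
unique_users = set(login_records)
print(len(unique_users)) # 3 (the actual number of users)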
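The same idea extends to several days at once. The sketch below is one possible approach, not a prescribed one: the records list and its (day, user) layout are assumptions, and each day gets its own set so that repeated logins are counted once.

```python
# Hypothetical (day, user) login records.
records = [
    ("2025-11-16", "Xiao Li"),
    ("2025-11-16", "Xiao Wang"),
    ("2025-11-16", "Xiao Li"),
    ("2025-11-17", "Xiao Zhao"),
    ("2025-11-17", "Xiao Li"),
    ("2025-11-17", "Xiao Zhao"),
]

users_by_day = {}
for day, user in records:
    # add() ignores a user who is already in that day's set
    users_by_day.setdefault(day, set()).add(user)

counts = {day: len(users) for day, users in users_by_day.items()}
print(counts)                                    # {'2025-11-16': 2, '2025-11-17': 2}

# Sets have no indexes; membership is checked with "in".
print("Xiao Li" in users_by_day["2025-11-17"])   # True
```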

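Summary Comparison of Data Containers

Container type	Syntax	Ordered?	Mutable?	Access	Key trait
List (list)	[]	Ordered	Mutable	Index	Most commonly used, very flexible
Dictionary (dict)	{}	Insertion order kept (Python 3.7+), no positional index	Mutable	Key	Stores key-value pairs
Tuple (tuple)	()	Ordered	Immutable	Index	Protects important data
String (str)	'' or ""	Ordered	Immutable	Index / slice	Container of characters
Set (set)	{} or set()	Unordered	Mutable	-	Removes duplicates automatically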