
Pandas Series

news 2025/7/17 10:14:35

This guide walks through Pandas Series from the basics to advanced use, covering core concepts, common operations, and practical examples:

1. Getting Started: Basic Operations

1.1 Creating a Series

import pandas as pd

# Create from a list
s1 = pd.Series([1, 3, 5, 7, 9])  # default integer index (0..4)
s2 = pd.Series([10, 20, 30], index=['a', 'b', 'c'])  # custom index

# Create from a dict (keys become the index)
data_dict = {'a': 10, 'b': 20, 'c': 30}
s3 = pd.Series(data_dict)
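
When a dict is passed together with an explicit index, pandas selects and orders the values by that index, and labels that are missing from the dict become NaN. A minimal sketch (the names `prices` and `wanted` are illustrative and do not come from the guide):

```python
import pandas as pd

prices = {'a': 10, 'b': 20, 'c': 30}
wanted = ['c', 'a', 'd']          # 'd' is not a key in prices

# The index argument decides which labels appear and in what order
s = pd.Series(prices, index=wanted, name='price')
print(s)
# 'c' and 'a' keep their values; 'd' is NaN, so the dtype becomes float64
```

The `name` argument is optional; it becomes the column name if the Series is later turned into a DataFrame.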

1.2 Inspecting Basic Attributes

print(s2.values)   # value array: [10 20 30]
print(s2.index)    # index: Index(['a', 'b', 'c'], dtype='object') (dtype display can vary by pandas version)
print(s2.dtype)    # data type: int64
print(s2.shape)    # shape: (3,)
print(s2.size)     # number of elements: 3

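1.3 Indexing and Slicing

# Index by position (use .iloc; plain s2[0] on a label index is deprecated since pandas 2.1)
print(s2.iloc[0])    # output: 10

# Index by label
print(s2['b'])       # output: 20

# Slicing: positional slices exclude the end position, label slices include the end label
print(s2.iloc[1:3])  # positions 1 and 2: b=20, c=30
print(s2['a':'c'])   # label slice includes 'c': a=10, b=20, c=30

# Boolean indexing
print(s2[s2 > 15])   # elements greater than 15: b=20, c=30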
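
The difference between label-based and position-based access matters most when the index itself holds integers. The sketch below uses two illustrative Series (`s_int` and `s_str`, not from the guide) to show `.loc` and `.iloc` side by side:

```python
import pandas as pd

# Integer labels that do not match the positions
s_int = pd.Series([10, 20, 30], index=[2, 0, 1])

by_label = s_int.loc[0]      # label 0 -> 20
by_position = s_int.iloc[0]  # position 0 -> 10

# Slices: .loc includes the end label, .iloc excludes the end position
s_str = pd.Series([10, 20, 30], index=['a', 'b', 'c'])
label_slice = s_str.loc['a':'b']   # two elements: a, b
position_slice = s_str.iloc[0:1]   # one element: a

print(by_label, by_position, len(label_slice), len(position_slice))
```

Using `.loc` and `.iloc` explicitly avoids depending on how plain `[]` resolves integer keys.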

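1.4 Modifying the Index and Values

s2.index = ['x', 'y', 'z']  # replace the index labels
s2['x'] = 100               # change a single value
s2.replace(20, 200, inplace=True)  # replace 20 with 200; s2 is now x=100, y=200, z=30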

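2. Intermediate: Data Processing

2.1 Handling Missing Values

import numpy as np

s4 = pd.Series([1, None, 3, np.nan, 5])  # None is stored as NaN; dtype is float64

# Detect missing values
print(s4.isna())   # returns a boolean Series

# Drop missing values
s4_drop = s4.dropna()

# Fill missing values
s4_fill = s4.fillna(0)   # fill with 0
s4_ffill = s4.ffill()    # forward fill (fillna(method='ffill') is deprecated)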
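
To make the effect of each strategy concrete, here is a small sketch on the same data. It also shows linear interpolation, which is not covered above and is included only as one more option:

```python
import numpy as np
import pandas as pd

s = pd.Series([1, None, 3, np.nan, 5])

n_missing = int(s.isna().sum())   # count of missing entries
dropped = s.dropna()              # keeps positions 0, 2, 4
forward = s.ffill()               # each gap takes the previous valid value
interpolated = s.interpolate()    # linear interpolation between neighbours

print(n_missing)
print(forward.tolist())
print(interpolated.tolist())
```

Forward fill suits ordered data where the last observation stays valid; interpolation assumes the values change smoothly between known points.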

2.2 Vectorized Operations

# Direct arithmetic
s5 = s2 * 2              # multiply every element by 2
s6 = s2 + pd.Series([1, 2, 3], index=['x', 'y', 'z'])  # operands are aligned by index label

# NumPy functions
import numpy as np
s7 = np.sqrt(s2)         # element-wise square root

2.3 Statistics

print(s2.mean())         # mean
print(s2.sum())          # sum
print(s2.value_counts()) # frequency of each value (best suited to discrete values)
print(s2.describe())     # summary statistics (count, mean, std, min, quartiles, max)
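
`value_counts` is more informative on data with repeated values. A short sketch on an illustrative Series (`votes` is a made-up name):

```python
import pandas as pd

votes = pd.Series([1, 2, 2, 3, 3, 3])

counts = votes.value_counts()                # most frequent value first
shares = votes.value_counts(normalize=True)  # relative frequencies summing to 1
median = votes.quantile(0.5)                 # 50th percentile (linear interpolation)

print(counts)
print(shares)
print(median)
```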

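2.4 Applying Custom Functions

# apply: call a function on each element
s8 = s2.apply(lambda x: x**2 + 1)

# map: element-wise lookup; values missing from the dict become NaN
# (after the edits in 1.4, s2 holds 100, 200, 30, so only 30 maps to 'high')
s9 = s2.map({10: 'low', 20: 'mid', 30: 'high'})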
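
Because `map` with a dict returns NaN for values that have no key, a common follow-up is to supply a default. A minimal sketch, using an illustrative Series `scores` and the placeholder label `'unknown'`:

```python
import pandas as pd

scores = pd.Series([10, 20, 30, 40], index=list('abcd'))
labels = {10: 'low', 20: 'mid', 30: 'high'}

mapped = scores.map(labels)               # 40 has no key, so it becomes NaN
mapped_filled = mapped.fillna('unknown')  # replace unmatched entries with a default

print(mapped_filled)
```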

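3. Advanced: Further Techniques

3.1 Time Series

# Create a time series with a daily DatetimeIndex
dates = pd.date_range('2023-01-01', periods=5)
s_time = pd.Series([10, 20, 15, 30, 25], index=dates)

# Resample by time
s_resampled = s_time.resample('W').mean()  # weekly mean (weeks end on Sunday by default)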
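
The weekly result can be surprising because `'W'` bins end on Sunday and are labelled by that end date. Since 2023-01-01 is a Sunday, it forms a bin on its own. The sketch below reproduces the example and inspects the bins:

```python
import pandas as pd

dates = pd.date_range('2023-01-01', periods=5)   # Sunday through Thursday
s_time = pd.Series([10, 20, 15, 30, 25], index=dates)

weekly = s_time.resample('W').mean()
print(weekly)
# First bin (label 2023-01-01) holds only the value 10.
# Second bin (label 2023-01-08) holds 20, 15, 30, 25, whose mean is 22.5.
```

Passing `'W-MON'` or another anchored frequency changes the day on which the weekly bins end.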

3.2 Categorical Data

# Store as a categorical dtype (reduces memory when few distinct values repeat many times)
s_cat = pd.Series(['apple', 'banana', 'apple', 'orange'], dtype='category')
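
The memory saving comes from storing each distinct string once and keeping small integer codes per row. A rough sketch for checking this on your own data (the Series `fruits` and its size are illustrative):

```python
import pandas as pd

# Few distinct values repeated many times: the case where category helps
fruits = pd.Series(['apple', 'banana', 'orange'] * 10_000)
as_cat = fruits.astype('category')

plain_bytes = fruits.memory_usage(deep=True)
cat_bytes = as_cat.memory_usage(deep=True)

print(as_cat.cat.categories.tolist())   # the distinct values
print(as_cat.cat.codes.head().tolist()) # integer code per row
print(plain_bytes, cat_bytes)
```

The exact byte counts depend on the pandas version and string storage, so none are quoted here. When most values are unique, converting to category may not save anything.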

3.3 Hierarchical Index (MultiIndex)

arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
multi_index = pd.MultiIndex.from_arrays(arrays, names=('group', 'subgroup'))
s_multi = pd.Series([10, 20, 30, 40], index=multi_index)

# Access by level
print(s_multi.loc['A', 1])  # output: 10
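
Beyond selecting a single element, a MultiIndex Series can be sliced by the outer level, cross-sectioned on an inner level, or pivoted into a table. A short sketch that rebuilds the same data:

```python
import pandas as pd

arrays = [['A', 'A', 'B', 'B'], [1, 2, 1, 2]]
multi_index = pd.MultiIndex.from_arrays(arrays, names=('group', 'subgroup'))
s_multi = pd.Series([10, 20, 30, 40], index=multi_index)

group_a = s_multi.loc['A']               # all rows of group A, indexed by subgroup
sub_1 = s_multi.xs(1, level='subgroup')  # subgroup 1 across groups, indexed by group
wide = s_multi.unstack('subgroup')       # DataFrame: rows = group, columns = subgroup

print(group_a)
print(sub_1)
print(wide)
```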

3.4 Working with DataFrames

# Extract a column from a DataFrame (each column is a Series)
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['x', 'y', 'z']})
s_from_df = df['A']

# Convert a Series to a single-column DataFrame
df_from_s = s2.to_frame(name='values')

4. Practical Tips

4.1 Efficient Filtering

# Combine conditions with & and |, wrapping each condition in parentheses
s_filtered = s2[(s2 > 15) & (s2.index != 'z')]
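
Range and membership tests are often easier to read with `between` and `isin` than with chained comparisons. A sketch using the values that `s2` holds at this point in the guide (rebuilt here as `s` so the block runs on its own):

```python
import pandas as pd

s = pd.Series([100, 200, 30], index=['x', 'y', 'z'])

in_range = s[s.between(50, 150)]          # both bounds are inclusive by default
in_set = s[s.isin([30, 200])]             # keep values found in the list
by_labels = s[s.index.isin(['x', 'z'])]   # filter on index labels instead of values

print(in_range)
print(in_set)
print(by_labels)
```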

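4.2 Performance Optimization

# Avoid Python-level loops; prefer vectorized operations
s_squared = s2 ** 2  # typically much faster than apply on large Series

# pd.eval evaluates an expression string; it mainly helps on large data
# when the optional numexpr package is installed
s_result = pd.eval('s2 * 2 + 5')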
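
The speed difference depends on data size, dtype, and hardware, so it is worth measuring rather than assuming a fixed factor. A minimal timing sketch (the Series `big` and the repeat count are arbitrary choices):

```python
import timeit

import numpy as np
import pandas as pd

big = pd.Series(np.arange(10_000, dtype='int64'))

# Both forms compute the same result
vectorized = big ** 2
applied = big.apply(lambda x: x ** 2)
same_result = vectorized.equals(applied)

# Time each approach; absolute numbers vary by machine
t_vec = timeit.timeit(lambda: big ** 2, number=20)
t_apply = timeit.timeit(lambda: big.apply(lambda x: x ** 2), number=20)
print(f"same result: {same_result}")
print(f"vectorized: {t_vec:.4f}s, apply: {t_apply:.4f}s")
```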

4.3 Concatenating Series

s10 = pd.Series([100, 200], index=['x', 'y'])
combined = pd.concat([s2, s10], axis=0)  # stack end to end; labels 'x' and 'y' now appear twice
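
Since `pd.concat` keeps the original labels, overlapping indexes produce duplicates. The sketch below shows two ways to deal with that: renumbering with `ignore_index`, or asking pandas to raise with `verify_integrity` (the names `s_a` and `s_b` are illustrative):

```python
import pandas as pd

s_a = pd.Series([100, 200, 30], index=['x', 'y', 'z'])
s_b = pd.Series([100, 200], index=['x', 'y'])

combined = pd.concat([s_a, s_b])                       # index: x, y, z, x, y
renumbered = pd.concat([s_a, s_b], ignore_index=True)  # index: 0..4

# verify_integrity=True raises ValueError when labels overlap
try:
    pd.concat([s_a, s_b], verify_integrity=True)
    overlap_detected = False
except ValueError:
    overlap_detected = True

print(combined.index.is_unique, renumbered.index.tolist(), overlap_detected)
```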

5. Common Pitfalls

5.1 Automatic Index Alignment

s11 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s12 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])
s_sum = s11 + s12  # result: a=NaN, b=6, c=8, d=NaN (labels present on only one side give NaN)
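
If the NaN results are unwanted, the method form `add` accepts a `fill_value` that stands in for a label missing on one side. A short sketch on the same two Series:

```python
import pandas as pd

s11 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s12 = pd.Series([4, 5, 6], index=['b', 'c', 'd'])

plain = s11 + s12                     # NaN where a label exists on one side only
filled = s11.add(s12, fill_value=0)   # treat the missing side as 0
common = plain.dropna()               # keep only labels present in both

print(filled)
print(common)
```

Whether 0 is a sensible stand-in depends on the data; dropping unmatched labels is the safer choice when a missing value does not mean zero.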

5.2 Handling Duplicate Index Labels

s_dup = pd.Series([10, 20, 30], index=['a', 'a', 'b'])
s_unique = s_dup[~s_dup.index.duplicated()]  # keep the first occurrence of each label: a=10, b=30
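
Keeping the first occurrence discards data. Depending on the use case, keeping the last occurrence or aggregating the duplicates may fit better. A sketch of both alternatives on the same Series:

```python
import pandas as pd

s_dup = pd.Series([10, 20, 30], index=['a', 'a', 'b'])

has_duplicates = not s_dup.index.is_unique                # quick check before deciding
keep_last = s_dup[~s_dup.index.duplicated(keep='last')]   # a=20, b=30
summed = s_dup.groupby(level=0).sum()                     # a=30, b=30

print(has_duplicates)
print(keep_last)
print(summed)
```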
