
Creating a New Column from the Values of an Existing Column in Pandas

news 2025/10/9 14:17:51

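In Pandas, a new column can be derived from the values of an existing column. Common approaches include:

df['new_col'] = df['col'].map(mapping_dict)
df['new_col'] = df['col'].apply(func)
np.where(condition, value_if_true, value_if_false)
df.loc[condition, 'new_col'] = value
df.assign(new_col=lambda x: expression)
df['col'].str.method() (string processing)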

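Contents

1. Mapping a dictionary with `map()`
2. Applying a function with `apply()`
3. Conditional logic with `np.where()`
- Multiple conditions (`np.select()`)
4. Conditional assignment with `loc[]`
5. Creating a new column with `assign()`
6. String processing (`str.method()`)
Summary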

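1. Mapping a dictionary with `map()`

Suited to mapping discrete values, such as encoding categories.

import pandas as pd

df = pd.DataFrame({'gender': ['男', '女', '男', '女']})  # '男' = male, '女' = female

# Define the mapping dictionary
gender_map = {'男': 'M', '女': 'F'}

# Create the new column
df['gender_code'] = df['gender'].map(gender_map)
print(df)

Output: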

  gender gender_code
0      男           M
1      女           F
2      男           M
3      女           F
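
One detail worth knowing here: when `map()` is given a dictionary, values without a matching key come back as NaN. The following is a minimal sketch of that behavior; the extra '未知' (unknown) row and the 'U' fallback code are assumptions added for illustration and are not part of the original example.

```python
import pandas as pd

# The third value has no entry in the mapping dictionary.
df = pd.DataFrame({'gender': ['男', '女', '未知']})
gender_map = {'男': 'M', '女': 'F'}

# Values missing from the dictionary are mapped to NaN ...
mapped = df['gender'].map(gender_map)

# ... so fill them with a fallback code ('U' is a hypothetical choice).
df['gender_code'] = mapped.fillna('U')
print(df)
```

Whether to fill the gaps or leave them as NaN depends on how the downstream code treats missing values.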

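2. Applying a function with `apply()`

Suited to calculations with more complex logic.

import pandas as pd

def calculate_bonus(salary):
    # 20% bonus above 10000, otherwise 10%
    if salary > 10000:
        return salary * 0.2
    else:
        return salary * 0.1

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'salary': [12000, 8000]})

# Create the new column
df['bonus'] = df['salary'].apply(calculate_bonus)
print(df)

Output: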

    name  salary   bonus
0  Alice   12000  2400.0
1    Bob    8000   800.0
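
If the function needs tunable parameters, `Series.apply()` forwards extra keyword arguments to it on every call. The sketch below assumes a parameterized variant of `calculate_bonus`; the `threshold`, `high_rate` and `low_rate` parameters are hypothetical names introduced for illustration.

```python
import pandas as pd

def calculate_bonus(salary, threshold=10000, high_rate=0.2, low_rate=0.1):
    """Return the bonus for a single salary value."""
    rate = high_rate if salary > threshold else low_rate
    return salary * rate

df = pd.DataFrame({'name': ['Alice', 'Bob'], 'salary': [12000, 8000]})

# threshold=7000 is passed through to calculate_bonus for every element,
# so both salaries now fall in the higher bonus bracket.
df['bonus'] = df['salary'].apply(calculate_bonus, threshold=7000)
print(df)
```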

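3. Conditional logic with `np.where()`

Suited to binary (two-way) assignment; for more than two outcomes, see `np.select()`.

import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [85, 60, 45, 90]})

# Create the new column: '及格' = pass, '不及格' = fail
df['result'] = np.where(df['score'] >= 60, '及格', '不及格')
print(df)

Output: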

   score result
0     85     及格
1     60     及格
2     45   不及格
3     90     及格
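
A point to keep in mind with `np.where()`: a comparison against NaN evaluates to False, so rows with a missing score land in the "false" branch. The sketch below illustrates this with an assumed dataset containing one missing score, and shows one possible way to keep such rows missing instead of labeling them as a fail.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [85, np.nan, 45]})

# NaN >= 60 is False, so the missing score is labeled '不及格' (fail).
naive = np.where(df['score'] >= 60, '及格', '不及格')

# Keep missing scores missing by masking the result afterwards.
df['result'] = pd.Series(naive, index=df.index).where(df['score'].notna())
print(df)
```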

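Multiple conditions (`np.select()`)

# Conditions are checked in order; the first one that matches wins
conditions = [
    df['score'] >= 90,
    (df['score'] >= 60) & (df['score'] < 90),
    df['score'] < 60,
]
choices = ['优秀', '及格', '不及格']  # excellent, pass, fail

# A string default ('未知' = unknown) keeps the result dtype consistent with
# the string choices; the numeric default of 0 can raise a TypeError in recent NumPy
df['grade'] = np.select(conditions, choices, default='未知')
print(df[['score', 'grade']])

Output: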

   score grade
0     85    及格
1     60    及格
2     45  不及格
3     90    优秀
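
When the conditions are consecutive numeric ranges, as they are here, `pd.cut()` is a different technique that can produce the same labels by binning. This is an alternative sketch and not the method used above; note that the resulting column has a categorical dtype.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'score': [85, 60, 45, 90]})

# Bins: [-inf, 60) -> 不及格, [60, 90) -> 及格, [90, inf) -> 优秀.
# right=False closes each interval on the left, matching the >= comparisons.
df['grade'] = pd.cut(
    df['score'],
    bins=[-np.inf, 60, 90, np.inf],
    labels=['不及格', '及格', '优秀'],
    right=False,
)
print(df)
```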

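4. Conditional assignment with `loc[]`

Suited to creating or modifying a column based on conditions.

import pandas as pd

df = pd.DataFrame({'age': [25, 30, 18, 40]})

# Create the new column (is the person an adult?): '是' = yes, '否' = no
df.loc[df['age'] >= 18, 'is_adult'] = '是'
df.loc[df['age'] < 18, 'is_adult'] = '否'
print(df)

Output: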

   age is_adult
0   25        是
1   30        是
2   18        是
3   40        是
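
Rows that match none of the `loc[]` conditions are left as NaN in the new column. A minimal sketch of one way to guard against that is to assign a default first; the dataset with a missing age and the '未知' (unknown) placeholder are assumptions made for illustration.

```python
import pandas as pd

# None becomes NaN, which satisfies neither comparison below.
df = pd.DataFrame({'age': [25, None, 18, 12]})

# Start from a default so that no row is left unassigned.
df['is_adult'] = '未知'  # 'unknown', a hypothetical placeholder
df.loc[df['age'] >= 18, 'is_adult'] = '是'
df.loc[df['age'] < 18, 'is_adult'] = '否'
print(df)
```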

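5. Creating a new column with `assign()`

Suited to method chaining; `assign()` returns a new DataFrame and leaves the original unchanged.

import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300]})

# Create the new column (price including 10% tax)
df = df.assign(taxed_price=lambda x: x['price'] * 1.1)
print(df)

Output: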

   price  taxed_price
0    100        110.0
1    200        220.0
2    300        330.0
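
To make the chaining and the "original is not modified" points concrete, here is a short sketch. The `is_expensive` column and its threshold of 200 are hypothetical additions used only to show a second step in the chain.

```python
import pandas as pd

df = pd.DataFrame({'price': [100, 200, 300]})

result = (
    df
    .assign(taxed_price=lambda x: x['price'] * 1.1)
    # A later assign() call can use columns created earlier in the chain.
    .assign(is_expensive=lambda x: x['taxed_price'] > 200)
)

print(list(df.columns))      # the original DataFrame has no new columns
print(list(result.columns))  # the returned DataFrame has both new columns
```

Because the result is bound to a new name, `df` stays available in its original form for other computations.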

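6. String processing (`str.method()`)

Suited to processing text columns.

import pandas as pd

df = pd.DataFrame({'email': ['alice@example.com', 'bob@test.com']})

# Extract the domain (the part after '@')
df['domain'] = df['email'].str.split('@').str[1]
print(df)

Output: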

               email       domain
0  alice@example.com  example.com
1       bob@test.com     test.com
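
The same extraction can also be sketched with a regular expression through `str.extract()`, which is handy when the pattern is more involved than a single separator. The two extra rows (a string without '@' and a missing value) are assumptions added to show how non-matching input is handled.

```python
import pandas as pd

df = pd.DataFrame(
    {'email': ['alice@example.com', 'bob@test.com', 'no-at-sign', None]}
)

# expand=False returns a Series; rows that do not match the pattern
# (or are missing) become NaN.
df['domain'] = df['email'].str.extract(r'@(.+)$', expand=False)
print(df)
```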

Summary

Method	Use case	Example
`map()`	Mapping discrete values	`df['new_col'] = df['col'].map({'A': 1, 'B': 2})`
`apply()`	Complex logic	`df['new_col'] = df['col'].apply(lambda x: x*2)`
`np.where()`	Binary choice	`df['new_col'] = np.where(df['col'] > 0, 'positive', 'non-positive')`
`np.select()`	Multiple conditions	`df['new_col'] = np.select([cond1, cond2], ['A', 'B'], default='other')`
`loc[]`	Conditional assignment	`df.loc[df['col'] > 0, 'new_col'] = 'positive'`
`assign()`	Method chaining	`df = df.assign(new_col=lambda x: x['col']*2)`
`str.method()`	String processing	`df['new_col'] = df['col'].str.upper()`