Marketing Campaign Effect Analysis and Strategy Optimization

news 2025/8/1 0:51:03

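Project Background and Goals

A Douyin store ran a marketing campaign and pushed campaign notifications to some of its users. This analysis examines the campaign data to answer the following questions:

Did the campaign (notification) significantly increase users' willingness to spend?
If it did, how large was the lift? If the campaign had essentially no effect, which part went wrong, or which groups of users were affected?
What characteristics do the affected users have?
What optimization strategies or recommendations follow for this campaign?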

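Dataset description (source: Alibaba Tianchi platform):

User_ID Unique identifier of each user.
Age Age of the user.
Gender Gender of the user.
Location Area where the user lives: Suburban, Rural, or Urban.
Income Income level of the user.
Interests Interests of the user, such as sports, fashion, or technology.
Last_Login_Days_Ago Number of days since the user last logged in.
Purchase_Frequency How often the user makes purchases.
Average_Order_Value Average value of the user's orders.
Total_Spending Total amount the user has spent.
Product_Category_Preference Product category the user prefers.
Time_Spent_on_Site_Minutes Time the user spent on the site while logged in, in minutes.
Pages_Viewed Number of pages the user viewed during visits.
Newsletter_Subscription Whether the user subscribed to campaign notifications.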

Analysis

1. Data Import and Processing

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

# Use the SimHei font so that any Chinese text in plot labels renders correctly
plt.rcParams['font.sans-serif'] = ['SimHei']

# Load the data
data = pd.read_csv('user_personalized_features.csv')
data.sample(10)

	Unnamed: 0.1	Unnamed: 0	User_ID	Age	Gender	Location	Income	Interests	Last_Login_Days_Ago	Purchase_Frequency	Average_Order_Value	Total_Spending	Product_Category_Preference	Time_Spent_on_Site_Minutes	Pages_Viewed	Newsletter_Subscription
831	831	831	#832	42	Female	Urban	5141	Food	13	15	123.733333	1856	Apparel	84	28	False
321	321	321	#322	51	Female	Suburban	47583	Sports	6	15	215.266667	3229	Electronics	294	12	False
659	659	659	#660	64	Male	Urban	78770	Food	20	2	675.500000	1351	Health & Beauty	47	39	True
157	157	157	#158	46	Female	Suburban	47763	Technology	3	2	1532.000000	3064	Health & Beauty	289	13	False
749	749	749	#750	27	Male	Rural	6296	Fashion	12	2	601.000000	1202	Apparel	564	26	True
813	813	813	#814	22	Male	Suburban	103381	Fashion	29	5	539.000000	2695	Books	512	5	False
860	860	860	#861	64	Male	Suburban	72786	Technology	29	2	2250.500000	4501	Health & Beauty	10	29	False
530	530	530	#531	62	Female	Urban	33101	Travel	24	6	550.000000	3300	Home & Kitchen	571	29	True
7	7	7	#8	36	Male	Urban	46005	Technology	26	2	549.000000	1098	Apparel	558	19	True
391	391	391	#392	31	Female	Urban	143002	Fashion	23	4	1139.000000	4556	Home & Kitchen	382	41	True

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 16 columns):
 #   Column                       Non-Null Count  Dtype  
---  ------                       --------------  -----  
 0   Unnamed: 0.1                 1000 non-null   int64  
 1   Unnamed: 0                   1000 non-null   int64  
 2   User_ID                      1000 non-null   object 
 3   Age                          1000 non-null   int64  
 4   Gender                       1000 non-null   object 
 5   Location                     1000 non-null   object 
 6   Income                       1000 non-null   int64  
 7   Interests                    1000 non-null   object 
 8   Last_Login_Days_Ago          1000 non-null   int64  
 9   Purchase_Frequency           1000 non-null   int64  
 10  Average_Order_Value          1000 non-null   float64
 11  Total_Spending               1000 non-null   int64  
 12  Product_Category_Preference  1000 non-null   object 
 13  Time_Spent_on_Site_Minutes   1000 non-null   int64  
 14  Pages_Viewed                 1000 non-null   int64  
 15  Newsletter_Subscription      1000 non-null   bool   
dtypes: bool(1), float64(1), int64(9), object(5)
memory usage: 118.3+ KB

data = data.iloc[:, 2:]
data.head()

	User_ID	Age	Gender	Location	Income	Interests	Last_Login_Days_Ago	Purchase_Frequency	Average_Order_Value	Total_Spending	Product_Category_Preference	Time_Spent_on_Site_Minutes	Pages_Viewed	Newsletter_Subscription
0	#1	56	Male	Suburban	66266	Sports	5	7	528.285714	3698	Books	584	38	True
1	#2	46	Female	Rural	30581	Technology	15	1	567.000000	567	Electronics	432	40	False
2	#3	32	Female	Suburban	109943	Sports	28	7	475.285714	3327	Apparel	306	1	True
3	#4	60	Female	Suburban	91369	Fashion	18	6	450.500000	2703	Apparel	527	29	False
4	#5	25	Male	Suburban	49255	Travel	2	19	145.105263	2757	Health & Beauty	53	10	True
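
The positional slice `data.iloc[:, 2:]` above works because the two leftover index columns happen to come first. A minimal sketch of a name-based alternative, assuming the unwanted columns all start with `Unnamed` (the tiny frame below is made up for illustration and is not the real dataset):

```python
import pandas as pd

# Toy frame mimicking the leftover index columns (values are illustrative only)
df = pd.DataFrame({
    "Unnamed: 0.1": [0, 1],
    "Unnamed: 0": [0, 1],
    "User_ID": ["#1", "#2"],
    "Age": [56, 46],
})

# Keep every column whose name does not start with "Unnamed"
cleaned = df.loc[:, ~df.columns.str.startswith("Unnamed")]
print(list(cleaned.columns))
```

Selecting by name keeps working if the column order of the CSV changes.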

# Check for missing values
data.isnull().sum()

User_ID                        0
Age                            0
Gender                         0
Location                       0
Income                         0
Interests                      0
Last_Login_Days_Ago            0
Purchase_Frequency             0
Average_Order_Value            0
Total_Spending                 0
Product_Category_Preference    0
Time_Spent_on_Site_Minutes     0
Pages_Viewed                   0
Newsletter_Subscription        0
dtype: int64

data['Newsletter_Subscription'].value_counts()

Newsletter_Subscription
True     507
False    493
Name: count, dtype: int64

2. Significance Testing

# Compare subscribed and non-subscribed users (overall)
subscribed = data[data['Newsletter_Subscription'] == True]
not_subscribed = data[data['Newsletter_Subscription'] == False]

# Mean total spending of each group
subscribed_spending = subscribed['Total_Spending'].mean()
not_subscribed_spending = not_subscribed['Total_Spending'].mean()
print(f"Mean total spending of subscribed users: {subscribed_spending}")
print(f"Mean total spending of non-subscribed users: {not_subscribed_spending}")

# Two-sample t-test (scipy default: equal_var=True, i.e. Student's t-test)
t_spending, p_spending = stats.ttest_ind(subscribed['Total_Spending'], not_subscribed['Total_Spending'])
print(f"t-test result: p = {p_spending:.4f}")

Mean total spending of subscribed users: 2376.8579881656806
Mean total spending of non-subscribed users: 2499.125760649087
t-test result: p = 0.2165
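
For reference, the relative difference between the two group means printed above can be worked out by hand. The snippet below is only a calculation on those two reported numbers, using the same lift formula as the subgroup tests later in this section:

```python
# Group means as reported in the output above
subscribed_mean = 2376.8579881656806
not_subscribed_mean = 2499.125760649087

# Relative lift of subscribed users over non-subscribed users
lift = (subscribed_mean - not_subscribed_mean) / not_subscribed_mean
print(f"Overall lift: {lift:.2%}")
```

This comes out at roughly -4.9%: subscribed users spent slightly less on average, a difference the t-test above does not flag as significant (p = 0.2165).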

# 2.1 Statistical test by age group
data['Age'].describe()

count    1000.000000
mean       40.986000
std        13.497852
min        18.000000
25%        29.000000
50%        42.000000
75%        52.000000
max        64.000000
Name: Age, dtype: float64

bins = [18, 25, 35, 45, 55, 64]
labels = ['18-25岁', '26-35岁', '36-45岁', '46-55岁', '56-64岁']  # "岁" means "years old"
# Note: with right=True and no include_lowest=True the first bin is (18, 25],
# so users aged exactly 18 get NaN and are left out of the groups below.
data['Age_Group'] = pd.cut(data['Age'], bins=bins, labels=labels, right=True)

# Welch's t-test within each age group
for age, sub_data in data.groupby('Age_Group', observed=False):
    subs = sub_data[sub_data['Newsletter_Subscription'] == 1]['Total_Spending']
    nonsubs = sub_data[sub_data['Newsletter_Subscription'] == 0]['Total_Spending']
    if len(subs) > 30 and len(nonsubs) > 30:  # require enough samples in both groups
        t_stat, p_value = stats.ttest_ind(subs, nonsubs, equal_var=False)
        lift = (subs.mean() - nonsubs.mean()) / nonsubs.mean()
        print(f"Age group: {age}")
        print(f"  Subscribed mean: {subs.mean():.2f}, non-subscribed mean: {nonsubs.mean():.2f}, lift: {lift:.2%}, p-value = {p_value:.4f}")

Age group: 18-25岁
  Subscribed mean: 2375.65, non-subscribed mean: 2434.55, lift: -2.42%, p-value = 0.8272
Age group: 26-35岁
  Subscribed mean: 2372.65, non-subscribed mean: 2317.06, lift: 2.40%, p-value = 0.7936
Age group: 36-45岁
  Subscribed mean: 2244.44, non-subscribed mean: 2542.35, lift: -11.72%, p-value = 0.1422
Age group: 46-55岁
  Subscribed mean: 2499.38, non-subscribed mean: 2417.64, lift: 3.38%, p-value = 0.6666
Age group: 56-64岁
  Subscribed mean: 2312.07, non-subscribed mean: 2822.92, lift: -18.10%, p-value = 0.0435

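For users aged 56-64, the difference is statistically significant (p = 0.0435). The lift is negative, at about -18%, which suggests that pushing campaign notifications may have had a negative effect on this group (for example, frequent notifications may degrade the user experience).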
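
One detail of the binning above is worth checking: the minimum age in the data is 18, which is also the leftmost bin edge. A small sketch of how `pd.cut` treats that edge, with and without `include_lowest=True` (the four ages are illustrative values, not rows from the dataset):

```python
import pandas as pd

bins = [18, 25, 35, 45, 55, 64]
ages = pd.Series([18, 25, 26, 64])  # illustrative ages only

# Default behaviour with right=True: intervals are (18, 25], (25, 35], ...
default_cut = pd.cut(ages, bins=bins, right=True)

# include_lowest=True closes the first interval on the left, so 18 is kept
inclusive_cut = pd.cut(ages, bins=bins, right=True, include_lowest=True)

print(default_cut.isna().tolist())
print(inclusive_cut.isna().tolist())
```

If users aged exactly 18 should count toward the youngest group, passing `include_lowest=True` is one option. The subgroup figures reported in this post were produced without it.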

# 2.2 Statistical test by gender
for gender, sub_data in data.groupby('Gender'):
    subs = sub_data[sub_data['Newsletter_Subscription'] == 1]['Total_Spending']
    nonsubs = sub_data[sub_data['Newsletter_Subscription'] == 0]['Total_Spending']
    if len(subs) > 30 and len(nonsubs) > 30:  # require enough samples in both groups
        t_stat, p_value = stats.ttest_ind(subs, nonsubs, equal_var=False)
        lift = (subs.mean() - nonsubs.mean()) / nonsubs.mean()
        print(f"Gender: {gender}")
        print(f"  Subscribed mean: {subs.mean():.2f}, non-subscribed mean: {nonsubs.mean():.2f}, lift: {lift:.2%}, p-value = {p_value:.4f}")

Gender: Female
  Subscribed mean: 2299.85, non-subscribed mean: 2497.30, lift: -7.91%, p-value = 0.1674
Gender: Male
  Subscribed mean: 2446.63, non-subscribed mean: 2500.77, lift: -2.16%, p-value = 0.6940

By gender, there is no significant effect; the observed differences may be due to chance.
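
The age and gender loops above share the same body. One way to avoid the repetition is a small helper that runs Welch's t-test per subgroup and returns a table. This is a sketch, not the author's code: the function name `subgroup_welch_test`, the `min_n` parameter, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from scipy import stats


def subgroup_welch_test(df, group_col, flag_col, value_col, min_n=30):
    """Welch's t-test of value_col between flag_col True/False within each group."""
    rows = []
    for name, sub in df.groupby(group_col, observed=False):
        treated = sub.loc[sub[flag_col], value_col]
        control = sub.loc[~sub[flag_col], value_col]
        if len(treated) <= min_n or len(control) <= min_n:
            continue  # skip groups that are too small, as in the loops above
        _, p_value = stats.ttest_ind(treated, control, equal_var=False)
        lift = (treated.mean() - control.mean()) / control.mean()
        rows.append({group_col: name, "treated_mean": treated.mean(),
                     "control_mean": control.mean(), "lift": lift,
                     "p_value": p_value})
    return pd.DataFrame(rows)


# Synthetic data for demonstration only (not the campaign dataset)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Gender": rng.choice(["Female", "Male"], size=400),
    "Newsletter_Subscription": rng.choice([True, False], size=400),
    "Total_Spending": rng.normal(2400, 800, size=400),
})
result = subgroup_welch_test(demo, "Gender", "Newsletter_Subscription", "Total_Spending")
print(result)
```

On the real data, the same call with `"Location"`, `"Income_Group"`, or `"Interests"` as `group_col` would stand in for the remaining loops in this section.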

# 2.3 Statistical test by location
for location, sub_data in data.groupby('Location'):
    subs = sub_data[sub_data['Newsletter_Subscription'] == 1]['Total_Spending']
    nonsubs = sub_data[sub_data['Newsletter_Subscription'] == 0]['Total_Spending']
    if len(subs) > 30 and len(nonsubs) > 30:  # require enough samples in both groups
        t_stat, p_value = stats.ttest_ind(subs, nonsubs, equal_var=False)
        lift = (subs.mean() - nonsubs.mean()) / nonsubs.mean()
        print(f"Location: {location}")
        print(f"  Subscribed mean: {subs.mean():.2f}, non-subscribed mean: {nonsubs.mean():.2f}, lift: {lift:.2%}, p-value = {p_value:.4f}")

Location: Rural
  Subscribed mean: 1180.18, non-subscribed mean: 1308.11, lift: -9.78%, p-value = 0.1445
Location: Suburban
  Subscribed mean: 2965.03, non-subscribed mean: 3024.80, lift: -1.98%, p-value = 0.4337
Location: Urban
  Subscribed mean: 2868.04, non-subscribed mean: 3005.75, lift: -4.58%, p-value = 0.5348

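All p-values are above 0.05, so the notification push shows no significant effect on spending in any location; the observed differences may be due to chance.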

# 2.4 Statistical test by income
labels = ['Low', 'Medium', 'High']
# Split users into three equal-sized income groups (tertiles)
data['Income_Group'] = pd.qcut(data['Income'], q=3, labels=labels)
for income, sub_data in data.groupby('Income_Group', observed=False):
    subs = sub_data[sub_data['Newsletter_Subscription'] == 1]['Total_Spending']
    nonsubs = sub_data[sub_data['Newsletter_Subscription'] == 0]['Total_Spending']
    if len(subs) > 30 and len(nonsubs) > 30:  # require enough samples in both groups
        t_stat, p_value = stats.ttest_ind(subs, nonsubs, equal_var=False)
        lift = (subs.mean() - nonsubs.mean()) / nonsubs.mean()
        print(f"Income group: {income}")
        print(f"  Subscribed mean: {subs.mean():.2f}, non-subscribed mean: {nonsubs.mean():.2f}, lift: {lift:.2%}, p-value = {p_value:.4f}")

Income group: Low
  Subscribed mean: 1508.66, non-subscribed mean: 1847.53, lift: -18.34%, p-value = 0.0269
Income group: Medium
  Subscribed mean: 2700.47, non-subscribed mean: 2638.92, lift: 2.33%, p-value = 0.7119
Income group: High
  Subscribed mean: 2901.48, non-subscribed mean: 3024.67, lift: -4.07%, p-value = 0.4476

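For users in the low-income group, the difference is statistically significant (p = 0.0269). The lift is negative, at about -18%, which suggests that pushing campaign notifications may have had a negative effect on this group (for example, frequent notifications may degrade the user experience).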
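
A lift such as the -18% above is a point estimate. One way to gauge its uncertainty is a percentile bootstrap confidence interval for the lift. The sketch below runs on synthetic numbers, not the campaign data; `bootstrap_lift_ci`, its parameters, and the simulated groups are hypothetical.

```python
import numpy as np


def bootstrap_lift_ci(treated, control, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for (mean(treated) - mean(control)) / mean(control)."""
    rng = np.random.default_rng(seed)
    treated = np.asarray(treated, dtype=float)
    control = np.asarray(control, dtype=float)
    lifts = np.empty(n_boot)
    for i in range(n_boot):
        # Resample each group independently, with replacement
        t = rng.choice(treated, size=treated.size, replace=True)
        c = rng.choice(control, size=control.size, replace=True)
        lifts[i] = (t.mean() - c.mean()) / c.mean()
    lower, upper = np.quantile(lifts, [alpha / 2, 1 - alpha / 2])
    return lower, upper


# Synthetic spending values for demonstration only
rng = np.random.default_rng(42)
treated = rng.normal(1500, 900, size=170).clip(min=0)
control = rng.normal(1850, 900, size=160).clip(min=0)

lower, upper = bootstrap_lift_ci(treated, control)
print(f"95% bootstrap CI for lift: [{lower:.2%}, {upper:.2%}]")
```

An interval that sits well below zero would support a negative effect more strongly than a p-value just under 0.05 does on its own.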

# 2.5 Statistical test by interest
for interest, sub_data in data.groupby('Interests'):
    subs = sub_data[sub_data['Newsletter_Subscription'] == 1]['Total_Spending']
    nonsubs = sub_data[sub_data['Newsletter_Subscription'] == 0]['Total_Spending']
    if len(subs) > 30 and len(nonsubs) > 30:  # require enough samples in both groups
        t_stat, p_value = stats.ttest_ind(subs, nonsubs, equal_var=False)
        lift = (subs.mean() - nonsubs.mean()) / nonsubs.mean()
        print(f"Interest: {interest}")
        print(f"  Subscribed mean: {subs.mean():.2f}, non-subscribed mean: {nonsubs.mean():.2f}, lift: {lift:.2%}, p-value = {p_value:.4f}")

Interest: Fashion
  Subscribed mean: 1932.10, non-subscribed mean: 2153.94, lift: -10.30%, p-value = 0.1991
Interest: Food
  Subscribed mean: 2272.32, non-subscribed mean: 2368.25, lift: -4.05%, p-value = 0.5836
Interest: Sports
  Subscribed mean: 2429.21, non-subscribed mean: 2537.51, lift: -4.27%, p-value = 0.5407
Interest: Technology
  Subscribed mean: 3476.30, non-subscribed mean: 3669.61, lift: -5.27%, p-value = 0.5607
Interest: Travel
  Subscribed mean: 1981.37, non-subscribed mean: 1678.49, lift: 18.04%, p-value = 0.0305

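For users whose interest is Travel, the p-value is below 0.05 (p = 0.0305), so the notification push shows a significant effect for this group, with a clear positive lift in spending of about 18%.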
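
Sections 2.1 to 2.5 run 18 subgroup tests in total, so a few p-values below 0.05 can appear by chance alone. As a rough sanity check that is not part of the original analysis, the sketch below applies a Bonferroni correction to the p-values reported above (the dictionary keys are shorthand labels chosen here):

```python
# p-values copied from the subgroup outputs above
p_values = {
    "Age 18-25": 0.8272, "Age 26-35": 0.7936, "Age 36-45": 0.1422,
    "Age 46-55": 0.6666, "Age 56-64": 0.0435,
    "Female": 0.1674, "Male": 0.6940,
    "Rural": 0.1445, "Suburban": 0.4337, "Urban": 0.5348,
    "Income Low": 0.0269, "Income Medium": 0.7119, "Income High": 0.4476,
    "Fashion": 0.1991, "Food": 0.5836, "Sports": 0.5407,
    "Technology": 0.5607, "Travel": 0.0305,
}

alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni-adjusted per-test threshold

significant = [name for name, p in p_values.items() if p < threshold]
print(f"Adjusted threshold: {threshold:.4f}")
print(f"Significant after correction: {significant}")
```

With these numbers, none of the three subgroup findings (ages 56-64, low income, Travel) passes the stricter threshold, so they are probably best read as hypotheses to confirm in a follow-up test.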

3. User Profile Analysis

travel_users = data[data['Interests'] == 'Travel']
travel_users_T = travel_users[travel_users['Newsletter_Subscription'] == True]

print('Profile of users interested in Travel:')
plt.figure(figsize=(12, 10))

# Gender distribution
plt.subplot(3, 2, 1)
gender_counts = travel_users_T['Gender'].value_counts().sort_values()
plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%')
plt.title('Gender distribution')

# Location distribution
plt.subplot(3, 2, 2)
location_counts = travel_users_T['Location'].value_counts().sort_values()
plt.pie(location_counts, labels=location_counts.index, autopct='%1.1f%%')
plt.title('Location distribution')

# Income distribution
plt.subplot(3, 2, 3)
income_counts = travel_users_T['Income_Group'].value_counts().sort_values()
plt.pie(income_counts, labels=income_counts.index, autopct='%1.1f%%')
plt.title('Income distribution')

# Age distribution
plt.subplot(3, 2, 4)
age_counts = travel_users_T['Age_Group'].value_counts()
plt.pie(age_counts, labels=age_counts.index, autopct='%1.1f%%')
plt.title('Age distribution')

# Product category preference distribution
plt.subplot(3, 2, 5)
pcp = travel_users_T['Product_Category_Preference'].value_counts().sort_values()
plt.pie(pcp, labels=pcp.index, autopct='%1.1f%%')
plt.title('Product category preference distribution')

plt.tight_layout()
plt.show()

Profile of users interested in Travel:

[Figure: pie charts of gender, location, income, age, and product category preference for subscribed users interested in Travel]

age_users = data[data['Age_Group'] == '56-64岁']
age_users_T = age_users[age_users['Newsletter_Subscription'] == True]

print('Profile of users aged 56-64:')
plt.figure(figsize=(12, 10))

# Gender distribution
plt.subplot(3, 2, 1)
gender_counts = age_users_T['Gender'].value_counts().sort_values()
plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%')
plt.title('Gender distribution')

# Location distribution
plt.subplot(3, 2, 2)
location_counts = age_users_T['Location'].value_counts().sort_values()
plt.pie(location_counts, labels=location_counts.index, autopct='%1.1f%%')
plt.title('Location distribution')

# Income distribution
plt.subplot(3, 2, 3)
income_counts = age_users_T['Income_Group'].value_counts().sort_values()
plt.pie(income_counts, labels=income_counts.index, autopct='%1.1f%%')
plt.title('Income distribution')

# Interest distribution
plt.subplot(3, 2, 4)
interest_counts = age_users_T['Interests'].value_counts()
plt.pie(interest_counts, labels=interest_counts.index, autopct='%1.1f%%')
plt.title('Interest distribution')

# Product category preference distribution
plt.subplot(3, 2, 5)
pcp = age_users_T['Product_Category_Preference'].value_counts().sort_values()
plt.pie(pcp, labels=pcp.index, autopct='%1.1f%%')
plt.title('Product category preference distribution')

plt.tight_layout()
plt.show()

Profile of users aged 56-64:

[Figure: pie charts of gender, location, income, interest, and product category preference for subscribed users aged 56-64]

low_income_users = data[data['Income_Group'] == 'Low']
low_income_users_T = low_income_users[low_income_users['Newsletter_Subscription'] == True]

print('Profile of low-income users:')
plt.figure(figsize=(12, 10))

# Gender distribution
plt.subplot(3, 2, 1)
gender_counts = low_income_users_T['Gender'].value_counts().sort_values()
plt.pie(gender_counts, labels=gender_counts.index, autopct='%1.1f%%')
plt.title('Gender distribution')

# Location distribution
plt.subplot(3, 2, 2)
location_counts = low_income_users_T['Location'].value_counts().sort_values()
plt.pie(location_counts, labels=location_counts.index, autopct='%1.1f%%')
plt.title('Location distribution')

# Interest distribution
plt.subplot(3, 2, 3)
interest_counts = low_income_users_T['Interests'].value_counts().sort_values()
plt.pie(interest_counts, labels=interest_counts.index, autopct='%1.1f%%')
plt.title('Interest distribution')

# Age distribution
plt.subplot(3, 2, 4)
age_counts = low_income_users_T['Age_Group'].value_counts()
plt.pie(age_counts, labels=age_counts.index, autopct='%1.1f%%')
plt.title('Age distribution')

# Product category preference distribution
plt.subplot(3, 2, 5)
pcp = low_income_users_T['Product_Category_Preference'].value_counts().sort_values()
plt.pie(pcp, labels=pcp.index, autopct='%1.1f%%')
plt.title('Product category preference distribution')

plt.tight_layout()
plt.show()

Profile of low-income users:

[Figure: pie charts of gender, location, interest, age, and product category preference for subscribed low-income users]
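
The percentages quoted in the conclusions below are read off the pie charts. As a complement, the same shares can be printed as numbers with `value_counts(normalize=True)`. A minimal sketch, assuming a hypothetical helper `segment_shares` and a small made-up frame in place of the real segments:

```python
import pandas as pd


def segment_shares(df, columns):
    """Return {column: share of each category} for the given segment."""
    return {col: df[col].value_counts(normalize=True).round(3) for col in columns}


# Made-up segment for demonstration only
segment = pd.DataFrame({
    "Gender": ["Male", "Male", "Female", "Male"],
    "Location": ["Urban", "Suburban", "Urban", "Rural"],
})

shares = segment_shares(segment, ["Gender", "Location"])
for col, table in shares.items():
    print(col)
    print(table.to_string())
```

Applied to `travel_users_T`, `age_users_T`, or `low_income_users_T`, this would give the exact figures behind each chart.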

Conclusions

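1. Overall, pushing the campaign notification did not significantly increase users' willingness to spend (the difference is not statistically significant).
2. A closer look shows that the notification had a significant effect on users whose interest is Travel: compared with users who did not subscribe to notifications, their spending was 18% higher.
In addition, the effect was also significant for low-income users and for users aged 56-64, but in the negative direction: the lift in spending was negative (around -18%).
3. Users interested in Travel: most are aged 26-55 (nearly 70%), mostly urban and suburban, more male than female, and prefer Electronics, Apparel, and Home & Kitchen products.
Users aged 56-64: more male than female; most are interested in fashion, food, or travel (about 70%); product preference leans toward health products (Health & Beauty).
Low-income users: mostly rural (about 80%); users aged 36 and over make up 50%; most are interested in sports, travel, or fashion.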