A Detailed Example of Survey Data Analysis in Python

news 2025/10/10 16:59:27

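The following is a detailed example of survey data analysis in Python. It covers a word cloud, sentiment analysis, descriptive statistics, cluster analysis (K-Means), and regression analysis (with simple linear regression as the example).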

1. Install the required libraries

First, make sure the following Python libraries are installed:

pip install pandas numpy matplotlib seaborn wordcloud nltk scikit-learn statsmodels
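
The example code in the next section reads a file named survey_data.csv with the columns 'comments', 'age', and 'income'. If no real survey export is at hand, a synthetic file can stand in for it. The following is a minimal sketch under stated assumptions: the function name make_sample_survey, the sample phrases, and every number in it are invented for illustration and are not part of the original example.

```python
import numpy as np
import pandas as pd


def make_sample_survey(n=200, seed=0):
    """Build a synthetic survey table (hypothetical data, for testing only)."""
    rng = np.random.default_rng(seed)
    # Ages between 18 and 69; stored as float so missing values can be added.
    age = rng.integers(18, 70, size=n).astype(float)
    # Income loosely increases with age, plus noise; the numbers are made up.
    income = 20000 + 800 * age + rng.normal(0, 8000, size=n)
    # English phrases, because the VADER lexicon used later is English.
    phrases = np.array([
        "Great service, I love it",
        "Terrible experience, very disappointed",
        "It was okay, nothing special",
        "Good value and friendly staff",
        "Too expensive and slow delivery",
    ])
    comments = rng.choice(phrases, size=n).astype(object)
    df = pd.DataFrame({"age": age, "income": income, "comments": comments})
    # Blank out 5% of each column so the missing-value handling is exercised.
    for col in ["age", "income", "comments"]:
        missing = rng.choice(n, size=n // 20, replace=False)
        df.loc[missing, col] = np.nan
    return df


sample = make_sample_survey()
print(sample.head())
print(sample.isna().sum())
```

Calling make_sample_survey().to_csv('survey_data.csv', index=False) would write the file that the example script expects. Because the data are random, any pattern the analysis finds in them only reflects how the generator was written.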

2. Example code

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from wordcloud import WordCloud
import nltk
from nltk.sentiment.vader import SentimentIntensityAnalyzer
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler
import statsmodels.api as sm

# Download the lexicon that the VADER sentiment analyzer needs
nltk.download('vader_lexicon')

# Assume the survey responses are stored in a CSV file
# Load the data
data = pd.read_csv('survey_data.csv')

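# Word cloud
# Assume the survey has a free-text column named 'comments'
# Cast to str so that stray non-text values do not break the join
text = ' '.join(data['comments'].dropna().astype(str))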
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Word Cloud of Comments')
plt.show()

# Sentiment analysis
# VADER's lexicon is English, so this assumes the comments are written in English
sia = SentimentIntensityAnalyzer()
data['sentiment_score'] = data['comments'].apply(lambda x: sia.polarity_scores(x)['compound'] if isinstance(x, str) else np.nan)
plt.figure(figsize=(10, 5))
sns.histplot(data['sentiment_score'].dropna(), kde=True)
plt.title('Sentiment Score Distribution')
plt.xlabel('Sentiment Score')
plt.ylabel('Frequency')
plt.show()

# Descriptive statistics
# Assume the survey has numeric columns such as 'age' and 'income'
numeric_columns = ['age', 'income']
description = data[numeric_columns].describe()
print(description)

# Visualize the survey results
# Box plot
plt.figure(figsize=(10, 5))
sns.boxplot(data=data[numeric_columns])
plt.title('Box Plot of Numeric Variables')
plt.show()

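# Cluster analysis (K-Means)
# Assume we cluster on 'age' and 'income'
X = data[numeric_columns].dropna()
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
# Attach labels by index: rows dropped for missing values get NaN
# instead of causing a length mismatch
data['cluster'] = pd.Series(kmeans.fit_predict(X_scaled), index=X.index)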
plt.figure(figsize=(10, 5))
sns.scatterplot(data=data, x='age', y='income', hue='cluster', palette='viridis')
plt.title('K-Means Clustering of Consumers')
plt.xlabel('Age')
plt.ylabel('Income')
plt.show()

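# Regression analysis (simple linear regression)
# Assume we want to analyze the effect of 'age' on 'income'
# Drop incomplete rows from both columns together so X and y stay aligned
reg_data = data[['age', 'income']].dropna()
X = sm.add_constant(reg_data['age'])
y = reg_data['income']
model = sm.OLS(y, X).fit()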
print(model.summary())
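
The clustering step above fixes n_clusters=3 without showing how that value might be chosen. One common check, which is not part of the original example, is to compare the mean silhouette score across several candidate values of k. The sketch below is an illustration only: the helper name silhouette_by_k and the synthetic three-group data are assumptions made here, and the data merely stand in for the 'age' and 'income' columns.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def silhouette_by_k(X, k_values=range(2, 7), random_state=42):
    """Return {k: mean silhouette score} for K-Means fitted on standardized X."""
    X_scaled = StandardScaler().fit_transform(X)
    scores = {}
    for k in k_values:
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state)
        labels = model.fit_predict(X_scaled)
        scores[k] = silhouette_score(X_scaled, labels)
    return scores


# Synthetic stand-in for 'age' and 'income': three made-up groups.
rng = np.random.default_rng(0)
centers = np.array([[25, 30000], [45, 60000], [65, 40000]])
X_demo = np.vstack(
    [c + rng.normal(0, [2, 2000], size=(50, 2)) for c in centers]
)

scores = silhouette_by_k(X_demo)
best_k = max(scores, key=scores.get)
for k, s in scores.items():
    print(f"k={k}: silhouette={s:.3f}")
print("k with the highest silhouette score:", best_k)
```

With the survey data, the same helper could be called as silhouette_by_k(data[numeric_columns].dropna()). A higher score suggests tighter, better separated clusters, but it is only one heuristic and should be weighed against how interpretable the resulting groups are.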