50.情感分析:AI读懂你的心情
🎯 前言:当AI也会察言观色
想象一下,你的朋友发了一条朋友圈:“今天天气真好呢…”,你能瞬间听出这话里的酸味吗?🍋 或者看到一条商品评论"这个产品还行吧",你能感受到那种欲言又止的失望吗?
人类天生就是情感侦探,我们能从只言片语中读出弦外之音,从表情变化中洞察内心波澜。而现在,我们要教会AI这门"读心术"!
情感分析(Sentiment Analysis),就像给AI装上了一副"心情眼镜"👓,让它能够理解文本背后的情感色彩——是开心😊、难过😢、愤怒😠,还是无所谓😐?
这项技术就像是数字世界的"情商测试",它能帮我们:
- 📱 社交媒体监控:了解用户对产品的真实感受
- 🛒 电商评论分析:快速识别好评差评的情感倾向
- 📰 新闻情感分析:分析舆论走向和公众情绪
- 🎬 影评分析:预测电影的市场反响
- 📊 股市情绪分析:从新闻和社交媒体预测市场走向
今天我们就来一起探索这个神奇的领域,让AI也能成为一个善解人意的"暖男"!
📚 目录
- 什么是情感分析?
- 情感分析的技术原理
- Python情感分析工具箱
- 基础情感分析实战
- 进阶:自定义情感分析模型
- 实际应用案例
- 多语言情感分析
- 情感分析的挑战与解决方案
- 行业应用实例
- 未来发展趋势
🧠 什么是情感分析?
情感分析的定义
情感分析,也叫做观点挖掘(Opinion Mining),是自然语言处理的一个重要分支。它就像是一个"情感翻译器",能够:
- 识别情感极性:判断文本是正面、负面还是中性
- 检测情感强度:分析情感的强烈程度
- 提取情感目标:找出情感针对的具体对象
- 分析情感类型:区分喜悦、愤怒、恐惧、惊讶等具体情感
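把这四个任务合在一起,一条评论的分析结果大致可以组织成下面这种结构(字段名只是示意,并不是某个库的固定接口):

# 一条评论的"完整"情感分析结果可能长这样(示意结构,字段名为假设)
analysis_result = {
    "text": "电池续航太差了,但是屏幕显示效果很棒!",
    "polarity": "混合",            # 情感极性:正面 / 负面 / 中性 / 混合
    "intensity": 0.7,              # 情感强度:0(很弱)到 1(很强)
    "aspects": {                   # 情感目标:情感分别针对哪些对象
        "电池续航": "负面",
        "屏幕": "正面",
    },
    "emotions": ["失望", "惊喜"],  # 具体情感类型
}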
生活中的情感分析例子
让我们看几个有趣的例子:
# 情感分析示例
texts = ["这部电影真是太棒了!演员演技超赞,剧情引人入胜!", # 极度正面"这个餐厅的服务还可以,但是菜品一般般。", # 中性偏负面"什么破产品,完全就是垃圾,浪费我的时间!", # 极度负面"今天天气不错,心情也很好。", # 正面"会议推迟了,不过也没什么大不了的。", # 中性
]# 人工标注的情感标签
labels = ["正面", "负面", "负面", "正面", "中性"]
情感分析的层次
情感分析可以分为不同的层次,就像剥洋葱一样🧅:
1. 文档级情感分析
分析整篇文档的总体情感倾向:
review = """
这家餐厅的环境很不错,装修很有品味。
服务员态度也很好,上菜速度很快。
但是菜品的味道实在是让人失望,价格也偏贵。
总体来说,不会再来第二次了。
"""
# 整体情感:负面(虽然有正面描述,但结论是负面的)
2. 句子级情感分析
逐句分析情感:
sentences = ["这家餐厅的环境很不错,装修很有品味。", # 正面"服务员态度也很好,上菜速度很快。", # 正面"但是菜品的味道实在是让人失望,价格也偏贵。", # 负面"总体来说,不会再来第二次了。", # 负面
]
3. 方面级情感分析
针对特定方面的情感:
aspects = {"环境": "正面","服务": "正面", "菜品": "负面","价格": "负面","整体": "负面"
}
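方面级情感分析通常需要专门的模型(例如基于BERT的ABSA模型),不过也可以先用一个非常粗糙的规则做法体会它的思路:按标点把评论切成小句,看每个小句提到了哪个方面,再用后文会介绍的SnowNLP给这个小句打分。下面是一个最小示意,方面关键词表和阈值都是假设:

from snownlp import SnowNLP
import re

# 方面关键词表(示意)
aspect_keywords = {
    "环境": ["环境", "装修"],
    "服务": ["服务", "服务员", "态度"],
    "菜品": ["菜品", "味道", "菜"],
    "价格": ["价格", "贵", "便宜"],
}

def naive_aspect_sentiment(review):
    """极简的方面级情感分析:按小句匹配方面关键词,再用SnowNLP给小句打分"""
    results = {}
    for clause in re.split(r"[,。!?;]", review):
        if not clause.strip():
            continue
        score = SnowNLP(clause).sentiments  # 0~1,越大越正面
        for aspect, words in aspect_keywords.items():
            if any(w in clause for w in words):
                results[aspect] = "正面" if score > 0.6 else "负面" if score < 0.4 else "中性"
    return results

print(naive_aspect_sentiment("这家餐厅的环境很不错,服务员态度也很好,但是菜品让人失望,价格也偏贵。"))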
🔍 情感分析的技术原理
1. 基于词典的方法
就像查字典一样,预先准备一个情感词典:
# 简单的情感词典
positive_words = ["好", "棒", "优秀", "喜欢", "满意", "推荐", "完美", "惊艳", "超赞", "给力", "不错"
]negative_words = ["差", "烂", "糟糕", "讨厌", "失望", "垃圾","坑爹", "破", "坏", "恶心", "后悔"
]def simple_sentiment_analysis(text):"""基于词典的简单情感分析"""positive_count = sum(1 for word in positive_words if word in text)negative_count = sum(1 for word in negative_words if word in text)if positive_count > negative_count:return "正面"elif negative_count > positive_count:return "负面"else:return "中性"# 测试
test_text = "这个产品真的很不错,我很满意,强烈推荐!"
print(f"情感分析结果: {simple_sentiment_analysis(test_text)}")
2. 基于机器学习的方法
训练模型来学习情感模式:
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# 创建机器学习管道
# 注意:中文文本通常需要先用 jieba 等工具分词,这里为了演示直接使用原句
sentiment_pipeline = Pipeline([
    ('tfidf', TfidfVectorizer(max_features=5000)),
    ('classifier', MultinomialNB())
])

# 训练数据示例
train_texts = [
    "这个产品质量很好,我很满意",
    "服务态度差,不推荐",
    "价格合理,性价比不错",
    "完全是垃圾,浪费钱",
    "还可以,没什么特别的"
]
train_labels = ["正面", "负面", "正面", "负面", "中性"]

# 训练模型
sentiment_pipeline.fit(train_texts, train_labels)

# 预测
test_text = "这个东西真的很棒,超出预期!"
prediction = sentiment_pipeline.predict([test_text])[0]
print(f"预测结果: {prediction}")
3. 基于深度学习的方法
使用神经网络捕捉复杂的情感模式:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout

def create_sentiment_model(vocab_size, embedding_dim=100, max_length=100):
    """创建LSTM情感分析模型"""
    model = Sequential([
        Embedding(vocab_size, embedding_dim, input_length=max_length),
        LSTM(128, dropout=0.2, recurrent_dropout=0.2),
        Dense(64, activation='relu'),
        Dropout(0.5),
        Dense(3, activation='softmax')  # 3类:正面、负面、中性
    ])
    model.compile(
        optimizer='adam',
        loss='categorical_crossentropy',
        metrics=['accuracy']
    )
    return model

# 模型架构展示
model = create_sentiment_model(vocab_size=10000)
model.summary()  # summary() 会直接打印,无需再包一层 print
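这个模型接收的是等长的整数序列,而不是原始字符串。真正训练之前,通常还要用分词器把文本转成索引序列并做填充,类似下面这样(这里只是一个假设的小例子,沿用前面机器学习示例里的 train_texts / train_labels):

from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical

# 假设 train_texts / train_labels 来自前面的机器学习示例
tokenizer = Tokenizer(num_words=10000, oov_token="<OOV>")
tokenizer.fit_on_texts(train_texts)

sequences = tokenizer.texts_to_sequences(train_texts)        # 文本 -> 整数序列
X = pad_sequences(sequences, maxlen=100, padding='post')     # 填充到统一长度
label_map = {"负面": 0, "中性": 1, "正面": 2}
y = to_categorical([label_map[l] for l in train_labels], num_classes=3)

# model.fit(X, y, epochs=5, batch_size=2)  # 样本太少,这里仅演示数据形状
print(X.shape, y.shape)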
🧰 Python情感分析工具箱
1. TextBlob - 新手的情感分析利器
TextBlob是Python中最简单易用的情感分析库,就像情感分析界的"傻瓜相机"📷:
from textblob import TextBlob

def analyze_sentiment_textblob(text):
    """使用TextBlob进行情感分析"""
    blob = TextBlob(text)
    # 获取情感极性 (-1到1,-1最负面,1最正面)
    polarity = blob.sentiment.polarity
    # 获取主观性 (0到1,0最客观,1最主观)
    subjectivity = blob.sentiment.subjectivity
    # 判断情感类别
    if polarity > 0.1:
        sentiment = "正面"
    elif polarity < -0.1:
        sentiment = "负面"
    else:
        sentiment = "中性"
    return {
        "文本": text,
        "情感": sentiment,
        "极性分数": round(polarity, 2),
        "主观性": round(subjectivity, 2)
    }

# 批量分析示例
texts = [
    "I love this product! It's amazing!",
    "This is terrible. I hate it.",
    "The weather is okay today.",
    "This movie is absolutely fantastic!",
    "I'm disappointed with the service."
]

print("TextBlob情感分析结果:")
print("-" * 50)
for text in texts:
    result = analyze_sentiment_textblob(text)
    print(f"文本: {result['文本']}")
    print(f"情感: {result['情感']} (极性: {result['极性分数']}, 主观性: {result['主观性']})")
    print()
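需要注意的是,TextBlob 默认的情感词典是针对英文的,直接丢给它中文文本,极性分数通常会是 0(即被判成"中性")。下面这个小例子可以直观感受这一点(输出为预期值,仅作演示):

# TextBlob 对中文几乎"无感":词典里找不到对应的词,极性往往为 0
print(TextBlob("这个产品真的太棒了!").sentiment.polarity)      # 预期接近 0.0
print(TextBlob("This product is awesome!").sentiment.polarity)  # 预期为明显的正值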
2. VADER - 社交媒体情感分析专家
VADER (Valence Aware Dictionary and sEntiment Reasoner) 特别适合分析社交媒体文本:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

def analyze_sentiment_vader(text):
    """使用VADER进行情感分析"""
    analyzer = SentimentIntensityAnalyzer()
    scores = analyzer.polarity_scores(text)
    # 获取综合分数
    compound = scores['compound']
    # 判断情感类别
    if compound >= 0.05:
        sentiment = "正面"
    elif compound <= -0.05:
        sentiment = "负面"
    else:
        sentiment = "中性"
    return {
        "文本": text,
        "情感": sentiment,
        "综合分数": round(compound, 2),
        "正面": round(scores['pos'], 2),
        "中性": round(scores['neu'], 2),
        "负面": round(scores['neg'], 2)
    }

# 社交媒体文本示例
social_texts = [
    "OMG!!! This is AMAZING!!! 😍😍😍",
    "meh... it's okay I guess 😐",
    "WORST PRODUCT EVER!!! 😡😡😡",
    "Pretty good, not bad 👍",
    "I'm SO disappointed 😢"
]

print("VADER情感分析结果:")
print("-" * 50)
for text in social_texts:
    result = analyze_sentiment_vader(text)
    print(f"文本: {result['文本']}")
    print(f"情感: {result['情感']} (综合: {result['综合分数']})")
    print(f"详细分数 - 正面: {result['正面']}, 中性: {result['中性']}, 负面: {result['负面']}")
    print()
3. 中文情感分析工具 - SnowNLP
专门针对中文文本的情感分析:
from snownlp import SnowNLP

def analyze_sentiment_chinese(text):
    """使用SnowNLP进行中文情感分析"""
    s = SnowNLP(text)
    # 获取情感分数 (0到1,越接近1越正面)
    sentiment_score = s.sentiments
    # 判断情感类别
    if sentiment_score > 0.6:
        sentiment = "正面"
    elif sentiment_score < 0.4:
        sentiment = "负面"
    else:
        sentiment = "中性"
    return {
        "文本": text,
        "情感": sentiment,
        "情感分数": round(sentiment_score, 2),
        "关键词": s.keywords(3),  # 提取前3个关键词
        "摘要": s.summary(1)[0] if s.summary(1) else ""  # 生成摘要
    }

# 中文文本示例
chinese_texts = [
    "这个产品真的太好了,我超级喜欢!",
    "质量一般般,价格也不便宜。",
    "完全就是垃圾,浪费我的钱!",
    "还可以吧,没什么特别的。",
    "服务态度很好,但是产品有点问题。"
]

print("SnowNLP中文情感分析结果:")
print("-" * 50)
for text in chinese_texts:
    result = analyze_sentiment_chinese(text)
    print(f"文本: {result['文本']}")
    print(f"情感: {result['情感']} (分数: {result['情感分数']})")
    print(f"关键词: {result['关键词']}")
    print()
4. Transformers - 最先进的预训练模型
使用Hugging Face的预训练模型:
from transformers import pipeline

def analyze_sentiment_transformers(text):
    """使用Transformers预训练模型进行情感分析"""
    # 创建情感分析管道
    sentiment_pipeline = pipeline(
        "sentiment-analysis",
        model="cardiffnlp/twitter-roberta-base-sentiment-latest",
        return_all_scores=True
    )
    # 分析情感
    results = sentiment_pipeline(text)[0]
    # 处理结果
    sentiment_map = {
        "LABEL_0": "负面",
        "LABEL_1": "中性",
        "LABEL_2": "正面"
    }
    processed_results = []
    for result in results:
        processed_results.append({
            "情感": sentiment_map.get(result['label'], result['label']),
            "置信度": round(result['score'], 3)
        })
    # 找出最可能的情感
    best_result = max(processed_results, key=lambda x: x['置信度'])
    return {
        "文本": text,
        "预测情感": best_result['情感'],
        "置信度": best_result['置信度'],
        "所有结果": processed_results
    }

# 使用示例
transformer_texts = [
    "I absolutely love this new phone!",
    "The movie was okay, nothing special.",
    "This is the worst experience ever!",
    "Great product, highly recommended!",
    "Not sure if I like it or not."
]

print("Transformers情感分析结果:")
print("-" * 50)
for text in transformer_texts:
    result = analyze_sentiment_transformers(text)
    print(f"文本: {result['文本']}")
    print(f"预测情感: {result['预测情感']} (置信度: {result['置信度']})")
    print(f"详细结果: {result['所有结果']}")
    print()
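上面的函数每调用一次都会重新加载一次模型,演示没问题,批量处理时会非常慢。实际使用时一般是把 pipeline 建好一次,然后一次性传入整个文本列表(下面是一个小示意,模型名沿用上文):

from transformers import pipeline

# 只加载一次模型,重复使用
sentiment_pipe = pipeline(
    "sentiment-analysis",
    model="cardiffnlp/twitter-roberta-base-sentiment-latest"
)

# 直接传入列表即可批量推理,返回每条文本得分最高的标签
batch_results = sentiment_pipe(transformer_texts)
for text, res in zip(transformer_texts, batch_results):
    print(f"{text} -> {res['label']} ({res['score']:.3f})")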
💻 基础情感分析实战
项目1:电商评论情感分析系统
让我们创建一个完整的电商评论情感分析系统,就像给店铺装上"情感雷达"📡:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from collections import Counter
import re
from datetime import datetimeclass ECommerceReviewAnalyzer:"""电商评论情感分析器"""def __init__(self):self.reviews = []self.analysis_results = []def add_review(self, review_text, rating=None, product_id=None, user_id=None):"""添加评论"""review = {'text': review_text,'rating': rating,'product_id': product_id,'user_id': user_id,'timestamp': datetime.now()}self.reviews.append(review)def analyze_reviews(self):"""批量分析评论情感"""from textblob import TextBlobself.analysis_results = []for i, review in enumerate(self.reviews):# 使用TextBlob分析情感blob = TextBlob(review['text'])polarity = blob.sentiment.polaritysubjectivity = blob.sentiment.subjectivity# 确定情感类别if polarity > 0.1:sentiment = "正面"sentiment_score = "好评"elif polarity < -0.1:sentiment = "负面"sentiment_score = "差评"else:sentiment = "中性"sentiment_score = "中评"# 分析评论长度和关键词word_count = len(review['text'].split())result = {'review_id': i,'text': review['text'],'sentiment': sentiment,'sentiment_score': sentiment_score,'polarity': round(polarity, 3),'subjectivity': round(subjectivity, 3),'word_count': word_count,'rating': review['rating'],'product_id': review['product_id'],'user_id': review['user_id'],'timestamp': review['timestamp']}self.analysis_results.append(result)def get_summary_statistics(self):"""获取情感分析统计摘要"""if not self.analysis_results:return "请先运行analyze_reviews()方法"df = pd.DataFrame(self.analysis_results)# 情感分布sentiment_counts = df['sentiment'].value_counts()# 极性分数统计polarity_stats = df['polarity'].describe()# 主观性统计subjectivity_stats = df['subjectivity'].describe()summary = {"总评论数": len(self.analysis_results),"情感分布": sentiment_counts.to_dict(),"极性分数统计": polarity_stats.to_dict(),"主观性统计": subjectivity_stats.to_dict(),"平均评论长度": df['word_count'].mean(),"最正面评论": df.loc[df['polarity'].idxmax(), 'text'],"最负面评论": df.loc[df['polarity'].idxmin(), 'text']};return summarydef visualize_sentiment_distribution(self):"""可视化情感分布"""if not self.analysis_results:print("请先运行analyze_reviews()方法")returndf = pd.DataFrame(self.analysis_results)# 创建子图fig, axes = plt.subplots(2, 2, figsize=(15, 10))# 情感分布饼图sentiment_counts = df['sentiment'].value_counts()axes[0, 0].pie(sentiment_counts.values, labels=sentiment_counts.index, autopct='%1.1f%%')axes[0, 0].set_title('情感分布')# 极性分数直方图axes[0, 1].hist(df['polarity'], bins=20, alpha=0.7, color='skyblue')axes[0, 1].set_title('极性分数分布')axes[0, 1].set_xlabel('极性分数')axes[0, 1].set_ylabel('频次')# 主观性分数直方图axes[1, 0].hist(df['subjectivity'], bins=20, alpha=0.7, color='lightgreen')axes[1, 0].set_title('主观性分数分布')axes[1, 0].set_xlabel('主观性分数')axes[1, 0].set_ylabel('频次')# 情感与评分关系if 'rating' in df.columns and df['rating'].notna().any():sentiment_rating = df.groupby('sentiment')['rating'].mean()axes[1, 1].bar(sentiment_rating.index, sentiment_rating.values)axes[1, 1].set_title('情感与评分关系')axes[1, 1].set_ylabel('平均评分')else:axes[1, 1].text(0.5, 0.5, '无评分数据', ha='center', va='center')plt.tight_layout()plt.show()def get_top_keywords(self, sentiment_filter=None, top_n=10):"""获取高频关键词"""if not self.analysis_results:return "请先运行analyze_reviews()方法"df = pd.DataFrame(self.analysis_results)if sentiment_filter:df = df[df['sentiment'] == sentiment_filter]# 简单的关键词提取(去除停用词)stop_words = {'的', '了', '是', '我', '你', '他', '她', '它', '这', '那', '有', '在', '和', '与', '或', '但', '很', '非常', '比较', '还', '就', '都', '也', '只', '可以', '能够', '应该', '不过', '不是', '没有', '这个', '那个', '这些', '那些', '什么', '怎么', '为什么', 'the', 'a', 'an', 'and', 'or', 'but', 'in', 'on', 'at', 'to', 'for', 'of', 'with', 'by', 'is', 'are', 'was', 'were', 'be', 'been', 'have', 'has', 'had', 'do', 'does', 'did', 'will', 'would', 
'could', 'should', 'may', 'might', 'can', 'this', 'that', 'these', 'those', 'i', 'you', 'he', 'she', 'it', 'we', 'they', 'me', 'him', 'her', 'us', 'them'}all_words = []for text in df['text']:# 简单的词汇提取words = re.findall(r'\b\w+\b', text.lower())words = [word for word in words if word not in stop_words and len(word) > 1]all_words.extend(words)# 统计词频word_counts = Counter(all_words)top_words = word_counts.most_common(top_n)return {f"{'所有' if not sentiment_filter else sentiment_filter}评论关键词": top_words}# 使用示例
analyzer = ECommerceReviewAnalyzer()

# 添加样本评论
sample_reviews = [
    ("这个产品真的很棒!质量超好,物流也很快,强烈推荐!", 5, "P001", "U001"),
    ("还可以吧,没什么特别的,价格有点贵。", 3, "P001", "U002"),
    ("完全就是垃圾!质量差得不行,浪费钱!", 1, "P001", "U003"),
    ("Very good product, I love it!", 5, "P002", "U004"),
    ("Not bad, but could be better.", 3, "P002", "U005"),
    ("Terrible experience, would not recommend.", 1, "P002", "U006"),
    ("服务态度很好,但是产品有一些小问题。", 3, "P003", "U007"),
    ("Amazing quality! Exactly what I was looking for!", 5, "P003", "U008"),
    ("产品不错,但是包装有点简陋。", 4, "P003", "U009"),
    ("This is the best purchase I've made this year!", 5, "P004", "U010")
]

for review, rating, product_id, user_id in sample_reviews:
    analyzer.add_review(review, rating, product_id, user_id)

# 分析情感
analyzer.analyze_reviews()

# 获取统计摘要
summary = analyzer.get_summary_statistics()
print("情感分析统计摘要:")
print("=" * 50)
for key, value in summary.items():
    print(f"{key}: {value}")
    print()

# 获取关键词
keywords = analyzer.get_top_keywords(top_n=10)
print("\n关键词分析:")
print("=" * 50)
for category, words in keywords.items():
    print(f"{category}:")
    for word, count in words:
        print(f" {word}: {count}")
    print()

# 分类别获取关键词
positive_keywords = analyzer.get_top_keywords(sentiment_filter="正面", top_n=5)
negative_keywords = analyzer.get_top_keywords(sentiment_filter="负面", top_n=5)

print("正面评论关键词:")
print(positive_keywords)
print("\n负面评论关键词:")
print(negative_keywords)
项目2:社交媒体情感监控系统
创建一个实时监控社交媒体情感的系统:
import json
import time
from datetime import datetime, timedelta
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
from collections import defaultdict, dequeclass SocialMediaSentimentMonitor:"""社交媒体情感监控系统"""def __init__(self, max_history=1000):self.sentiment_history = deque(maxlen=max_history)self.keyword_sentiments = defaultdict(list)self.alert_threshold = {'positive': 0.7, 'negative': -0.7}def process_post(self, post_text, platform="twitter", user_id=None, keywords=None):"""处理单条社交媒体帖子"""from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzeranalyzer = SentimentIntensityAnalyzer()scores = analyzer.polarity_scores(post_text)# 创建记录record = {'timestamp': datetime.now(),'platform': platform,'user_id': user_id,'text': post_text,'sentiment_scores': scores,'compound_score': scores['compound'],'keywords': keywords or []}# 添加到历史记录self.sentiment_history.append(record)# 如果有关键词,记录关键词情感if keywords:for keyword in keywords:self.keyword_sentiments[keyword].append({'timestamp': record['timestamp'],'compound_score': scores['compound'],'text': post_text})# 检查是否需要触发警报self._check_alerts(record)return recorddef _check_alerts(self, record):"""检查是否需要触发情感警报"""compound = record['compound_score']if compound >= 0.05:sentiment = "正面"elif compound <= -0.05:sentiment = "负面"else:sentiment = "中性"# 情感过于负面if compound <= self.alert_threshold['negative']:alert = {'timestamp': record['timestamp'],'post_text': record['text'],'sentiment': sentiment,'compound_score': compound,'alert_type': '负面情感警报','recommendation': '请立即查看该帖子,必要时采取措施。'}self.sentiment_alerts.append(alert)print(f"⚠️ 负面情感警报!分数: {compound:.3f}")print(f"内容: {record['text'][:100]}...")print(f"建议: {alert['recommendation']}")print("-" * 50)# 情感过于正面elif compound >= self.alert_threshold['positive']:alert = {'timestamp': record['timestamp'],'post_text': record['text'],'sentiment': sentiment,'compound_score': compound,'alert_type': '正面情感警报','recommendation': '该帖子可能是虚假好评,请核实。'}self.sentiment_alerts.append(alert)print(f"🎉 正面情感警报!分数: {compound:.3f}")print(f"内容: {record['text'][:100]}...")print(f"建议: {alert['recommendation']}")print("-" * 50)def get_sentiment_trend(self, hours=24):"""获取指定时间内的情感趋势"""cutoff_time = datetime.now() - timedelta(hours=hours)recent_records = [record for record in self.sentiment_history if record['timestamp'] >= cutoff_time]if not recent_records:return {"message": "没有足够的数据"}# 按小时分组hourly_sentiment = defaultdict(list)for record in recent_records:hour_key = record['timestamp'].strftime("%Y-%m-%d %H:00")hourly_sentiment[hour_key].append(record['compound_score'])# 计算每小时平均情感trend_data = {}for hour, scores in hourly_sentiment.items():trend_data[hour] = {'average_sentiment': sum(scores) / len(scores),'post_count': len(scores),'max_sentiment': max(scores),'min_sentiment': min(scores)}return trend_datadef visualize_sentiment_trend(self, hours=24):"""可视化情感趋势"""trend_data = self.get_sentiment_trend(hours)if "message" in trend_data:print(trend_data["message"])return# 准备数据timestamps = []avg_sentiments = []post_counts = []for hour_str, data in sorted(trend_data.items()):timestamps.append(datetime.strptime(hour_str, "%Y-%m-%d %H:%M"))avg_sentiments.append(data['average_sentiment'])post_counts.append(data['post_count'])# 创建图表fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(12, 8))# 情感趋势图ax1.plot(timestamps, avg_sentiments, marker='o', linewidth=2, markersize=6)ax1.axhline(y=0, color='gray', linestyle='--', alpha=0.7)ax1.set_title('情感趋势 (过去24小时)', fontsize=14, fontweight='bold')ax1.set_ylabel('平均情感分数')ax1.grid(True, alpha=0.3)# 帖子数量图ax2.bar(timestamps, post_counts, alpha=0.7, color='lightblue')ax2.set_title('帖子数量分布', fontsize=14, 
fontweight='bold')ax2.set_ylabel('帖子数量')ax2.set_xlabel('时间')# 格式化时间轴for ax in [ax1, ax2]:ax.xaxis.set_major_formatter(mdates.DateFormatter('%H:%M'))ax.xaxis.set_major_locator(mdates.HourLocator(interval=2))plt.setp(ax.xaxis.get_majorticklabels(), rotation=45)plt.tight_layout()plt.show()def get_keyword_sentiment_report(self, keyword, days=7):"""获取特定关键词的情感报告"""if keyword not in self.keyword_sentiments:return f"关键词 '{keyword}' 暂无数据"cutoff_time = datetime.now() - timedelta(days=days)keyword_data = [record for record in self.keyword_sentiments[keyword]if record['timestamp'] >= cutoff_time]if not keyword_data:return f"关键词 '{keyword}' 在过去{days}天内暂无数据"# 计算统计信息scores = [record['compound_score'] for record in keyword_data]report = {'keyword': keyword,'total_mentions': len(keyword_data),'average_sentiment': sum(scores) / len(scores),'most_positive': max(keyword_data, key=lambda x: x['compound_score']),'most_negative': min(keyword_data, key=lambda x: x['compound_score']),'sentiment_distribution': {'positive': len([s for s in scores if s > 0.1]),'neutral': len([s for s in scores if -0.1 <= s <= 0.1]),'negative': len([s for s in scores if s < -0.1])}}return reportdef export_data(self, filename=None):"""导出数据到JSON文件"""if not filename:filename = f"sentiment_data_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"# 准备导出数据export_data = {'export_time': datetime.now().isoformat(),'total_records': len(self.sentiment_history),'sentiment_history': []}for record in self.sentiment_history:export_record = record.copy()export_record['timestamp'] = record['timestamp'].isoformat()export_data['sentiment_history'].append(export_record)# 写入文件with open(filename, 'w', encoding='utf-8') as f:json.dump(export_data, f, ensure_ascii=False, indent=2)print(f"数据已导出到: {filename}")return filename# 使用示例
monitor = SocialMediaSentimentMonitor()

# 模拟社交媒体帖子
sample_posts = [
    ("Just bought this amazing product! Love it so much! 😍", "twitter", "user1", ["product", "shopping"]),
    ("Terrible customer service, never buying from them again 😡", "twitter", "user2", ["customer service", "shopping"]),
    ("The weather is nice today ☀️", "twitter", "user3", ["weather"]),
    ("This new restaurant is absolutely fantastic! 🍽️", "instagram", "user4", ["restaurant", "food"]),
    ("Traffic is so bad today... 😤", "twitter", "user5", ["traffic"]),
    ("I'm so excited about the new movie release! 🎬", "facebook", "user6", ["movie", "entertainment"]),
    ("Working from home is great! Much more productive 💻", "linkedin", "user7", ["work", "productivity"]),
    ("Can't believe how expensive everything has become 💸", "twitter", "user8", ["economy", "prices"]),
    ("Had the best vacation ever! Thanks to the amazing staff! 🏖️", "instagram", "user9", ["vacation", "travel"]),
    ("This app keeps crashing, so frustrating! 📱", "twitter", "user10", ["app", "technology"])
]

print("开始处理社交媒体帖子...")
print("=" * 50)

for post_text, platform, user_id, keywords in sample_posts:
    result = monitor.process_post(post_text, platform, user_id, keywords)
    print(f"平台: {platform}")
    print(f"用户: {user_id}")
    print(f"内容: {post_text}")
    print(f"情感分数: {result['compound_score']:.3f}")
    print(f"关键词: {keywords}")
    print("-" * 30)
    # 模拟时间间隔
    time.sleep(0.1)

# 获取情感趋势
print("\n情感趋势分析:")
print("=" * 50)
trend = monitor.get_sentiment_trend(hours=1)
for hour, data in trend.items():
    print(f"{hour}: 平均情感 {data['average_sentiment']:.3f}, 帖子数量 {data['post_count']}")

# 获取关键词报告
print("\n关键词情感报告:")
print("=" * 50)
shopping_report = monitor.get_keyword_sentiment_report("shopping", days=1)
print(json.dumps(shopping_report, ensure_ascii=False, indent=2, default=str))  # default=str 用于序列化 datetime 字段

# 导出数据
export_file = monitor.export_data()
print(f"\n数据已导出到文件: {export_file}")
🚀 进阶:自定义情感分析模型
构建深度学习情感分析模型
让我们从头开始构建一个强大的深度学习情感分析模型:
import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Embedding, LSTM, Dense, Dropout, Bidirectional
from tensorflow.keras.preprocessing.text import Tokenizer
from tensorflow.keras.preprocessing.sequence import pad_sequences
from tensorflow.keras.utils import to_categorical
from sklearn.model_selection import train_test_split
from sklearn.metrics import classification_report, confusion_matrix
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as snsclass CustomSentimentAnalyzer:"""自定义情感分析器"""def __init__(self, vocab_size=10000, max_length=100, embedding_dim=128):self.vocab_size = vocab_sizeself.max_length = max_lengthself.embedding_dim = embedding_dimself.tokenizer = Noneself.model = Noneself.label_mapping = {"负面": 0, "中性": 1, "正面": 2}self.reverse_label_mapping = {0: "负面", 1: "中性", 2: "正面"}def preprocess_data(self, texts, labels):"""预处理文本数据"""# 创建分词器if self.tokenizer is None:self.tokenizer = Tokenizer(num_words=self.vocab_size, oov_token="<OOV>")self.tokenizer.fit_on_texts(texts)# 文本转序列sequences = self.tokenizer.texts_to_sequences(texts)# 填充序列padded_sequences = pad_sequences(sequences, maxlen=self.max_length, padding='post')# 标签编码encoded_labels = [self.label_mapping[label] for label in labels]categorical_labels = to_categorical(encoded_labels, num_classes=3)return padded_sequences, categorical_labelsdef build_model(self, model_type="lstm"):"""构建模型"""if model_type == "lstm":model = Sequential([Embedding(self.vocab_size, self.embedding_dim, input_length=self.max_length),LSTM(128, dropout=0.2, recurrent_dropout=0.2),Dense(64, activation='relu'),Dropout(0.5),Dense(3, activation='softmax')])elif model_type == "bilstm":model = Sequential([Embedding(self.vocab_size, self.embedding_dim, input_length=self.max_length),Bidirectional(LSTM(64, dropout=0.2, recurrent_dropout=0.2)),Dense(64, activation='relu'),Dropout(0.5),Dense(3, activation='softmax')])elif model_type == "cnn":from tensorflow.keras.layers import Conv1D, GlobalMaxPooling1Dmodel = Sequential([Embedding(self.vocab_size, self.embedding_dim, input_length=self.max_length),Conv1D(128, 5, activation='relu'),GlobalMaxPooling1D(),Dense(64, activation='relu'),Dropout(0.5),Dense(3, activation='softmax')])else:raise ValueError("支持的模型类型: 'lstm', 'bilstm', 'cnn'")model.compile(optimizer='adam',loss='categorical_crossentropy',metrics=['accuracy'])self.model = modelreturn modeldef train(self, texts, labels, validation_split=0.2, epochs=10, batch_size=32):"""训练模型"""# 预处理数据X, y = self.preprocess_data(texts, labels)# 划分训练集和验证集X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=validation_split, random_state=42, stratify=y)# 训练模型history = self.model.fit(X_train, y_train,validation_data=(X_val, y_val),epochs=epochs,batch_size=batch_size,verbose=1)return historydef predict(self, texts):"""预测情感"""if self.model is None:raise ValueError("模型未训练,请先调用train()方法")# 预处理输入文本sequences = self.tokenizer.texts_to_sequences(texts)padded_sequences = pad_sequences(sequences, maxlen=self.max_length, padding='post')# 预测predictions = self.model.predict(padded_sequences)# 解码预测结果results = []for i, pred in enumerate(predictions):predicted_class = np.argmax(pred)confidence = np.max(pred)results.append({'text': texts[i],'predicted_sentiment': self.reverse_label_mapping[predicted_class],'confidence': float(confidence),'probabilities': {'负面': float(pred[0]),'中性': float(pred[1]),'正面': float(pred[2])}})return resultsdef evaluate(self, texts, labels):"""评估模型性能"""X, y = self.preprocess_data(texts, labels)# 预测predictions = self.model.predict(X)predicted_classes = np.argmax(predictions, axis=1)true_classes = np.argmax(y, axis=1)# 生成分类报告report = classification_report(true_classes, predicted_classes,target_names=['负面', '中性', '正面'],output_dict=True)# 生成混淆矩阵cm = confusion_matrix(true_classes, predicted_classes)return report, cmdef plot_training_history(self, history):"""绘制训练历史"""fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))# 准确率图ax1.plot(history.history['accuracy'], 
label='训练准确率')ax1.plot(history.history['val_accuracy'], label='验证准确率')ax1.set_title('模型准确率')ax1.set_ylabel('准确率')ax1.set_xlabel('轮次')ax1.legend()# 损失图ax2.plot(history.history['loss'], label='训练损失')ax2.plot(history.history['val_loss'], label='验证损失')ax2.set_title('模型损失')ax2.set_ylabel('损失')ax2.set_xlabel('轮次')ax2.legend()plt.tight_layout()plt.show()def plot_confusion_matrix(self, cm):"""绘制混淆矩阵"""plt.figure(figsize=(8, 6))sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',xticklabels=['负面', '中性', '正面'],yticklabels=['负面', '中性', '正面'])plt.title('混淆矩阵')plt.ylabel('真实标签')plt.xlabel('预测标签')plt.show()# 使用示例
def create_sample_dataset():
    """创建示例数据集"""
    texts = [
        # 正面评论
        "这个产品真的很棒,质量超好!",
        "Amazing product, I love it!",
        "服务态度很好,很满意",
        "Perfect! Exactly what I needed.",
        "物流很快,包装也很好",
        "Great quality, highly recommend!",
        "这次购物体验很愉快",
        "Excellent customer service!",
        # 负面评论
        "这个产品质量很差,不推荐",
        "Terrible product, waste of money",
        "服务态度恶劣,很失望",
        "Poor quality, returned immediately",
        "包装破损,产品有问题",
        "Worst purchase ever made",
        "物流太慢了,很不满意",
        "Disappointed with the service",
        # 中性评论
        "这个产品还可以,没什么特别的",
        "The product is okay, nothing special",
        "价格合理,质量一般",
        "Average quality, decent price",
        "还行吧,不好不坏",
        "It's fine, meets basic needs",
        "普通的产品,没有惊喜",
        "Standard quality, as expected"
    ]
    labels = [
        # 正面标签
        "正面", "正面", "正面", "正面", "正面", "正面", "正面", "正面",
        # 负面标签
        "负面", "负面", "负面", "负面", "负面", "负面", "负面", "负面",
        # 中性标签
        "中性", "中性", "中性", "中性", "中性", "中性", "中性", "中性"
    ]
    return texts, labels

# 创建和训练自定义模型
print("创建自定义情感分析模型...")
analyzer = CustomSentimentAnalyzer(vocab_size=5000, max_length=50)

# 准备数据
texts, labels = create_sample_dataset()

# 构建模型
print("构建LSTM模型...")
model = analyzer.build_model(model_type="lstm")
model.summary()  # summary() 会直接打印

# 训练模型
print("开始训练模型...")
history = analyzer.train(texts, labels, epochs=20, batch_size=16)

# 绘制训练历史
analyzer.plot_training_history(history)

# 测试模型
test_texts = ["这个产品超级棒,强烈推荐!","质量很差,完全不值这个价格","还可以吧,没什么特别的感觉","Absolutely amazing! Best purchase ever!","Not impressed, expected better quality"
]print("\n测试预测结果:")
print("=" * 50)
predictions = analyzer.predict(test_texts)
for pred in predictions:
    print(f"文本: {pred['text']}")
    print(f"预测情感: {pred['predicted_sentiment']}")
    print(f"置信度: {pred['confidence']:.3f}")
    print(f"概率分布: {pred['probabilities']}")
    print("-" * 30)

# 评估模型
print("\n模型评估:")
report, cm = analyzer.evaluate(texts, labels)
print("分类报告:")
for class_name, metrics in report.items():
    if isinstance(metrics, dict):
        print(f"{class_name}: 精确率={metrics['precision']:.3f}, 召回率={metrics['recall']:.3f}, F1={metrics['f1-score']:.3f}")

# 绘制混淆矩阵
analyzer.plot_confusion_matrix(cm)
集成多个模型的投票分类器
创建一个集成多个情感分析模型的投票系统:
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_extraction.text import TfidfVectorizer
import numpy as np

class EnsembleSentimentAnalyzer:
    """集成情感分析器"""

    def __init__(self):
        self.vectorizer = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
        self.ensemble_model = None
        self.individual_models = {}

    def create_ensemble(self):
        """创建集成模型"""
        # 创建基础分类器
        logistic = LogisticRegression(random_state=42)
        svm = SVC(probability=True, random_state=42)
        naive_bayes = MultinomialNB()
        # 创建投票分类器
        self.ensemble_model = VotingClassifier(
            estimators=[
                ('logistic', logistic),
                ('svm', svm),
                ('naive_bayes', naive_bayes)
            ],
            voting='soft'  # 使用概率投票
        )
        # 保存单个模型引用
        self.individual_models = {
            'logistic': logistic,
            'svm': svm,
            'naive_bayes': naive_bayes
        }
        return self.ensemble_model

    def train(self, texts, labels):
        """训练集成模型"""
        # 文本向量化
        X = self.vectorizer.fit_transform(texts)
        # 训练集成模型
        self.ensemble_model.fit(X, labels)
        # 注意:VotingClassifier 在 fit 时会克隆基础分类器,
        # 这里改为引用训练好的副本,否则 individual_models 中仍是未训练的模型
        self.individual_models = dict(self.ensemble_model.named_estimators_)
        return self.ensemble_model

    def predict_detailed(self, texts):
        """详细预测结果"""
        # 向量化输入文本
        X = self.vectorizer.transform(texts)
        # 集成模型预测
        ensemble_predictions = self.ensemble_model.predict(X)
        ensemble_probabilities = self.ensemble_model.predict_proba(X)
        # 各个模型的预测
        individual_predictions = {}
        for name, model in self.individual_models.items():
            individual_predictions[name] = model.predict(X)
        # 整理结果
        results = []
        for i, text in enumerate(texts):
            result = {
                'text': text,
                'ensemble_prediction': ensemble_predictions[i],
                'ensemble_confidence': np.max(ensemble_probabilities[i]),
                'individual_predictions': {name: pred[i] for name, pred in individual_predictions.items()},
                'class_probabilities': {
                    class_name: prob
                    for class_name, prob in zip(self.ensemble_model.classes_, ensemble_probabilities[i])
                }
            }
            results.append(result)
        return results

    def analyze_model_agreement(self, texts):
        """分析模型一致性"""
        X = self.vectorizer.transform(texts)
        # 获取各模型预测
        predictions = {}
        for name, model in self.individual_models.items():
            predictions[name] = model.predict(X)
        # 计算一致性
        agreement_stats = []
        for i in range(len(texts)):
            text_predictions = [pred[i] for pred in predictions.values()]
            # 计算一致性分数
            unique_predictions = set(text_predictions)
            agreement_score = (len(text_predictions) - len(unique_predictions) + 1) / len(text_predictions)
            agreement_stats.append({
                'text': texts[i],
                'predictions': {name: pred[i] for name, pred in predictions.items()},
                'agreement_score': agreement_score,
                'unanimous': len(unique_predictions) == 1
            })
        return agreement_stats

# 使用示例
ensemble_analyzer = EnsembleSentimentAnalyzer()

# 创建集成模型
ensemble_model = ensemble_analyzer.create_ensemble()

# 准备训练数据
train_texts = [
    "这个产品质量很好,我很满意",
    "服务态度差,不推荐",
    "价格合理,性价比不错",
    "完全是垃圾,浪费钱",
    "还可以,没什么特别的"
]
train_labels = ["正面", "负面", "正面", "负面", "中性"]

# 训练模型
print("训练集成模型...")
ensemble_analyzer.train(train_texts, train_labels)

# 测试预测
test_texts = [
    "这个产品超级棒,强烈推荐!",
    "质量很差,不值这个价格",
    "还行吧,没什么特别的",
    "Absolutely fantastic product!",
    "Not impressed at all"
]

print("\n集成模型预测结果:")
print("=" * 60)
detailed_results = ensemble_analyzer.predict_detailed(test_texts)

for result in detailed_results:
    print(f"文本: {result['text']}")
    print(f"集成预测: {result['ensemble_prediction']} (置信度: {result['ensemble_confidence']:.3f})")
    print(f"各模型预测:")
    for model_name, prediction in result['individual_predictions'].items():
        print(f" {model_name}: {prediction}")
    print(f"类别概率: {result['class_probabilities']}")
    print("-" * 50)

# 分析模型一致性
print("\n模型一致性分析:")
print("=" * 60)
agreement_analysis = ensemble_analyzer.analyze_model_agreement(test_texts)

for analysis in agreement_analysis:
    print(f"文本: {analysis['text']}")
    print(f"各模型预测: {analysis['predictions']}")
    print(f"一致性分数: {analysis['agreement_score']:.3f}")
    print(f"是否一致: {'是' if analysis['unanimous'] else '否'}")
    print("-" * 50)
🏭 行业应用实例
1. 金融市场情感分析
import yfinance as yf
from datetime import datetime, timedelta
import pandas as pd

class FinancialSentimentAnalyzer:
    """金融市场情感分析器"""

    def __init__(self):
        self.financial_keywords = {
            'bullish': ['上涨', '看涨', '利好', 'bullish', 'positive', 'growth'],
            'bearish': ['下跌', '看跌', '利空', 'bearish', 'negative', 'decline'],
            'volatile': ['波动', '不稳定', 'volatile', 'unstable', 'fluctuation']
        }

    def analyze_market_sentiment(self, news_texts, stock_symbol=None):
        """分析市场情感"""
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
        results = []
        for text in news_texts:
            # 基础情感分析
            sentiment = analyzer.polarity_scores(text)
            # 金融关键词检测
            market_direction = self.detect_market_direction(text)
            # 如果有股票代码,获取价格数据
            price_data = None
            if stock_symbol:
                try:
                    stock = yf.Ticker(stock_symbol)
                    price_data = stock.history(period="1d")
                except:
                    price_data = None
            result = {
                'text': text,
                'sentiment_score': sentiment['compound'],
                'market_direction': market_direction,
                'stock_symbol': stock_symbol,
                'price_data': price_data,
                'timestamp': datetime.now()
            }
            results.append(result)
        return results

    def detect_market_direction(self, text):
        """检测市场方向"""
        text_lower = text.lower()
        bullish_count = sum(1 for word in self.financial_keywords['bullish'] if word in text_lower)
        bearish_count = sum(1 for word in self.financial_keywords['bearish'] if word in text_lower)
        if bullish_count > bearish_count:
            return 'bullish'
        elif bearish_count > bullish_count:
            return 'bearish'
        else:
            return 'neutral'

    def generate_trading_signal(self, sentiment_results):
        """生成交易信号"""
        if not sentiment_results:
            return 'hold'
        avg_sentiment = sum(r['sentiment_score'] for r in sentiment_results) / len(sentiment_results)
        bullish_signals = sum(1 for r in sentiment_results if r['market_direction'] == 'bullish')
        bearish_signals = sum(1 for r in sentiment_results if r['market_direction'] == 'bearish')
        # 综合判断
        if avg_sentiment > 0.3 and bullish_signals > bearish_signals:
            return 'buy'
        elif avg_sentiment < -0.3 and bearish_signals > bullish_signals:
            return 'sell'
        else:
            return 'hold'

# 使用示例
financial_analyzer = FinancialSentimentAnalyzer()

# 金融新闻示例
financial_news = [
    "苹果公司发布了超预期的季度财报,股价有望上涨。",
    "Apple reports strong quarterly earnings, stock expected to rise.",
    "市场对新政策反应积极,投资者信心增强。",
    "Technology sector faces headwinds amid regulatory concerns.",
    "经济数据显示增长放缓,市场情绪谨慎。",
    "Federal Reserve hints at potential interest rate cuts."
]

print("金融市场情感分析:")
print("=" * 50)

results = financial_analyzer.analyze_market_sentiment(financial_news, "AAPL")

for result in results:
    print(f"新闻: {result['text']}")
    print(f"情感分数: {result['sentiment_score']:.3f}")
    print(f"市场方向: {result['market_direction']}")
    print("-" * 40)

# 生成交易信号
trading_signal = financial_analyzer.generate_trading_signal(results)
print(f"\n交易信号: {trading_signal}")
2. 社交媒体品牌监控
from datetime import datetime, timedelta  # 本段单独运行时需要的导入

class BrandMonitoringSystem:
    """品牌监控系统"""

    def __init__(self, brand_name):
        self.brand_name = brand_name
        self.monitoring_data = []
        self.alert_thresholds = {
            'negative_spike': -0.5,
            'volume_spike': 100,
            'engagement_drop': 0.3
        }

    def process_social_media_post(self, post_text, platform, engagement_metrics=None):
        """处理社交媒体帖子"""
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
        # 检查是否提及品牌
        if self.brand_name.lower() not in post_text.lower():
            return None
        # 情感分析
        sentiment = analyzer.polarity_scores(post_text)
        # 提取关键信息
        post_data = {
            'timestamp': datetime.now(),
            'platform': platform,
            'text': post_text,
            'sentiment_score': sentiment['compound'],
            'engagement_metrics': engagement_metrics or {},
            'brand_mentions': post_text.lower().count(self.brand_name.lower())
        }
        self.monitoring_data.append(post_data)
        # 检查是否需要警报
        self.check_brand_alerts(post_data)
        return post_data

    def check_brand_alerts(self, post_data):
        """检查品牌警报"""
        # 负面情感警报
        if post_data['sentiment_score'] <= self.alert_thresholds['negative_spike']:
            print(f"🚨 品牌负面情感警报!")
            print(f"平台: {post_data['platform']}")
            print(f"内容: {post_data['text'][:100]}...")
            print(f"情感分数: {post_data['sentiment_score']:.3f}")
            print("-" * 50)

    def generate_brand_report(self, days=7):
        """生成品牌报告"""
        cutoff_date = datetime.now() - timedelta(days=days)
        recent_data = [post for post in self.monitoring_data
                       if post['timestamp'] >= cutoff_date]
        if not recent_data:
            return "没有足够的数据生成报告"
        # 计算指标
        total_mentions = len(recent_data)
        avg_sentiment = sum(post['sentiment_score'] for post in recent_data) / total_mentions
        # 平台分布
        platform_distribution = {}
        for post in recent_data:
            platform = post['platform']
            platform_distribution[platform] = platform_distribution.get(platform, 0) + 1
        # 情感分布
        positive_count = len([p for p in recent_data if p['sentiment_score'] > 0.1])
        negative_count = len([p for p in recent_data if p['sentiment_score'] < -0.1])
        neutral_count = total_mentions - positive_count - negative_count
        # 最负面的帖子
        most_negative = min(recent_data, key=lambda x: x['sentiment_score'])
        report = {
            'brand_name': self.brand_name,
            'report_period': f"过去{days}天",
            'total_mentions': total_mentions,
            'average_sentiment': avg_sentiment,
            'platform_distribution': platform_distribution,
            'sentiment_distribution': {
                'positive': positive_count,
                'neutral': neutral_count,
                'negative': negative_count
            },
            'most_negative_post': most_negative['text'],
            'most_negative_score': most_negative['sentiment_score']
        }
        return report

# 使用示例
brand_monitor = BrandMonitoringSystem("iPhone")# 模拟社交媒体帖子
social_posts = [("Just got the new iPhone! Amazing camera quality 📸", "instagram", {"likes": 150, "comments": 20}),("iPhone battery dies so quickly, very disappointed 😞", "twitter", {"retweets": 50, "likes": 30}),("iPhone photography tips for beginners 📱", "youtube", {"views": 1000, "likes": 80}),("Why is iPhone so expensive? Not worth the money!", "facebook", {"shares": 25, "comments": 40}),("iPhone vs Android comparison - iPhone wins!", "twitter", {"retweets": 200, "likes": 500}),("iPhone repair costs are ridiculous 😡", "reddit", {"upvotes": 300, "comments": 100})
]print("品牌监控系统测试:")
print("=" * 50)for post_text, platform, engagement in social_posts:result = brand_monitor.process_social_media_post(post_text, platform, engagement)if result:print(f"平台: {platform}")print(f"内容: {post_text}")print(f"情感分数: {result['sentiment_score']:.3f}")print(f"参与度: {engagement}")print("-" * 40)# 生成品牌报告
print("\n品牌监控报告:")
print("=" * 50)
report = brand_monitor.generate_brand_report(days=1)
print(f"品牌: {report['brand_name']}")
print(f"报告期间: {report['report_period']}")
print(f"总提及数: {report['total_mentions']}")
print(f"平均情感: {report['average_sentiment']:.3f}")
print(f"平台分布: {report['platform_distribution']}")
print(f"情感分布: {report['sentiment_distribution']}")
print(f"最负面帖子: {report['most_negative_post']}")
print(f"最负面分数: {report['most_negative_score']:.3f}")
🔮 未来发展趋势
1. 多模态情感分析
class MultimodalSentimentAnalyzer:
    """多模态情感分析器(文本+图像+音频)"""

    def __init__(self):
        self.text_analyzer = None
        self.image_analyzer = None
        self.audio_analyzer = None

    def analyze_multimodal_content(self, text=None, image_path=None, audio_path=None):
        """分析多模态内容"""
        results = {}
        # 文本情感分析
        if text:
            from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
            analyzer = SentimentIntensityAnalyzer()
            text_sentiment = analyzer.polarity_scores(text)
            results['text_sentiment'] = text_sentiment['compound']
        # 图像情感分析(模拟)
        if image_path:
            # 这里应该使用实际的图像情感分析模型
            # 比如使用CNN分析图像中的情感
            image_sentiment = self.analyze_image_sentiment(image_path)
            results['image_sentiment'] = image_sentiment
        # 音频情感分析(模拟)
        if audio_path:
            # 这里应该使用语音情感识别模型
            audio_sentiment = self.analyze_audio_sentiment(audio_path)
            results['audio_sentiment'] = audio_sentiment
        # 融合多模态结果
        if len(results) > 1:
            # 简单的平均融合
            avg_sentiment = sum(results.values()) / len(results)
            results['fused_sentiment'] = avg_sentiment
        return results

    def analyze_image_sentiment(self, image_path):
        """分析图像情感(模拟实现)"""
        # 这里应该使用实际的图像情感分析模型
        # 比如使用预训练的CNN模型
        import random
        return random.uniform(-1, 1)  # 模拟结果

    def analyze_audio_sentiment(self, audio_path):
        """分析音频情感(模拟实现)"""
        # 这里应该使用实际的语音情感识别模型
        # 比如使用语音特征提取+分类器
        import random
        return random.uniform(-1, 1)  # 模拟结果

# 使用示例
print("多模态情感分析概念演示:")
print("=" * 50)multimodal_analyzer = MultimodalSentimentAnalyzer()# 模拟多模态内容分析
example_contents = [{'text': "I'm so happy today! 😊",'image_path': 'happy_face.jpg','audio_path': 'happy_voice.wav'},{'text': "This is terrible...",'image_path': 'sad_face.jpg','audio_path': 'sad_voice.wav'}
]for i, content in enumerate(example_contents):print(f"内容 {i+1}:")result = multimodal_analyzer.analyze_multimodal_content(text=content['text'],image_path=content['image_path'],audio_path=content['audio_path'])print(f" 文本: {content['text']}")print(f" 文本情感: {result.get('text_sentiment', 'N/A'):.3f}")print(f" 图像情感: {result.get('image_sentiment', 'N/A'):.3f}")print(f" 音频情感: {result.get('audio_sentiment', 'N/A'):.3f}")print(f" 融合情感: {result.get('fused_sentiment', 'N/A'):.3f}")print("-" * 40)
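上面的融合只是对各模态分数做简单平均。实践中不同模态的可靠度往往不同,常见的做法之一是加权融合。下面是一个示意性的小函数,权重纯属假设,实际应在验证集上调整:

def weighted_fusion(modality_scores, weights=None):
    """按模态权重融合情感分数;weights 为假设值,实际需在验证集上调整"""
    default_weights = {'text_sentiment': 0.5, 'image_sentiment': 0.2, 'audio_sentiment': 0.3}
    weights = weights or default_weights
    total_w = sum(weights[m] for m in modality_scores if m in weights)
    if total_w == 0:
        return None
    return sum(score * weights[m] for m, score in modality_scores.items() if m in weights) / total_w

print(weighted_fusion({'text_sentiment': 0.8, 'image_sentiment': -0.1, 'audio_sentiment': 0.5}))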
2. 实时情感分析系统
import asyncio
from collections import deque
from datetime import datetime  # 本段单独运行时需要的导入

class RealTimeSentimentAnalyzer:
    """实时情感分析系统"""

    def __init__(self, window_size=100):
        self.sentiment_stream = deque(maxlen=window_size)
        self.alert_callbacks = []
        self.running = False

    def add_alert_callback(self, callback):
        """添加警报回调函数"""
        self.alert_callbacks.append(callback)

    async def process_stream(self, text_stream):
        """处理文本流"""
        from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
        analyzer = SentimentIntensityAnalyzer()
        self.running = True
        async for text in text_stream:
            if not self.running:
                break
            # 分析情感
            sentiment = analyzer.polarity_scores(text)
            # 创建数据点
            data_point = {
                'timestamp': datetime.now(),
                'text': text,
                'sentiment': sentiment['compound']
            }
            # 添加到流中
            self.sentiment_stream.append(data_point)
            # 检查异常
            await self.check_anomalies(data_point)
            # 更新实时统计
            await self.update_real_time_stats()

    async def check_anomalies(self, data_point):
        """检查情感异常"""
        if len(self.sentiment_stream) < 10:
            return
        # 计算近期平均情感
        recent_sentiments = [dp['sentiment'] for dp in list(self.sentiment_stream)[-10:]]
        avg_sentiment = sum(recent_sentiments) / len(recent_sentiments)
        # 检查是否有异常
        if abs(data_point['sentiment'] - avg_sentiment) > 0.8:
            # 触发警报
            alert_data = {
                'type': 'sentiment_anomaly',
                'current_sentiment': data_point['sentiment'],
                'average_sentiment': avg_sentiment,
                'text': data_point['text'],
                'timestamp': data_point['timestamp']
            }
            # 调用所有警报回调
            for callback in self.alert_callbacks:
                await callback(alert_data)

    async def update_real_time_stats(self):
        """更新实时统计"""
        if not self.sentiment_stream:
            return
        # 计算实时统计
        sentiments = [dp['sentiment'] for dp in self.sentiment_stream]
        stats = {
            'count': len(sentiments),
            'average': sum(sentiments) / len(sentiments),
            'min': min(sentiments),
            'max': max(sentiments),
            'positive_ratio': len([s for s in sentiments if s > 0.1]) / len(sentiments),
            'negative_ratio': len([s for s in sentiments if s < -0.1]) / len(sentiments)
        }
        # 这里可以更新仪表板或发送到监控系统
        print(f"实时统计: 平均情感 {stats['average']:.3f}, 正面比例 {stats['positive_ratio']:.1%}")

    def stop(self):
        """停止处理"""
        self.running = False

# 警报回调函数
async def sentiment_alert_handler(alert_data):
    """处理情感警报"""
    print(f"🚨 情感异常警报!")
    print(f"当前情感: {alert_data['current_sentiment']:.3f}")
    print(f"平均情感: {alert_data['average_sentiment']:.3f}")
    print(f"文本: {alert_data['text']}")
    print(f"时间: {alert_data['timestamp']}")
    print("-" * 50)

# 使用示例(模拟)
print("实时情感分析系统概念演示:")
print("=" * 50)async def simulate_text_stream():"""模拟文本流"""texts = ["Great product, love it!","Not bad, could be better.","Excellent service, highly recommend!","This is terrible, worst experience ever!", # 异常"Pretty good overall.","Amazing quality, will buy again!","Disappointed with the delivery.","Perfect! Exactly what I needed.","Absolute garbage, waste of money!", # 异常"Satisfied with the purchase."]for text in texts:yield textawait asyncio.sleep(1) # 模拟实时间隔# 这里是异步代码的概念演示
print("实时情感分析系统已准备就绪...")
print("(在实际应用中,这将连接到实时数据流)")
🎬 下集预告
恭喜你!🎉 现在你已经掌握了情感分析的核心技术,从基础的词典方法到先进的深度学习模型,从单语言到多语言,从简单的文本分析到复杂的多模态融合。
你学会了:
- 🔍 多种情感分析方法:词典、机器学习、深度学习
- 🛠️ 丰富的工具库:TextBlob、VADER、SnowNLP、Transformers
- 💼 实际应用场景:电商评论、客服系统、社交媒体监控
- 🌍 多语言支持:跨语言情感分析技术
- 🎯 挑战解决方案:讽刺检测、上下文理解、领域适应
下一篇文章《文本分类:让AI给文章贴标签》将带你进入更广阔的文本分类世界!我们将探索:
- 📂 分类算法大全:从朴素贝叶斯到BERT
- 🎯 多分类与多标签:复杂分类任务的解决方案
- 🔄 主动学习:用更少的数据训练更好的模型
- 🏷️ 实体识别:从文本中提取结构化信息
想象一下,如果情感分析是教AI理解情感,那么文本分类就是教AI理解内容的类别和主题。你将学会如何构建智能的文档分类系统,让AI成为一个高效的"文档管理员"!
📝 总结与思考题
🌟 本文关键知识点
- 情感分析基础:极性、强度、目标的识别
- 技术方法:词典法、机器学习、深度学习
- 工具库使用:TextBlob、VADER、SnowNLP等
- 实战项目:电商评论、社交媒体监控、客服系统
- 高级技巧:自定义模型、集成学习、多模态分析
- 挑战与解决:讽刺检测、上下文理解、领域适应
- 行业应用:金融、品牌监控、实时分析
🤔 思考题
- 如何设计一个能够检测网络暴力的情感分析系统?
- 在处理长文本时,如何平衡局部情感和全局情感?
- 如何构建一个支持方言和口语化表达的情感分析系统?