
Naming BERTopic Topics with an LLM Deployed via Ollama

news 2025/9/10 7:03:18

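This post uses a locally deployed Ollama + Qwen3:14b model to generate 3–4 word topic names automatically. The model is given the keywords and sample abstracts that BERTopic produces for each topic. The workflow is fully automated and reusable, and it suits many scenarios: academic papers, news clustering, customer feedback analysis, and more.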

Approach

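We wrote a simple Python function, generate_topic_name(), which takes two parameters:

topic_keywords: the list of keywords BERTopic generated for the current topic;
sample_abstracts: several sample abstracts belonging to that topic (to provide context).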
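The final loop in the code iterates over a list named llm_input, and the post does not show where that list comes from. The sketch below suggests one way to assemble it. It assumes that BERTopic's `topic_model.get_topics()` returns a dict mapping topic id to a list of (word, score) tuples. It also assumes that `topic_model.get_representative_docs()` returns a dict mapping topic id to a list of documents. The helper name `build_llm_input` and the toy data are hypothetical, so the example runs without BERTopic installed.

```python
# Hypothetical helper: converts BERTopic-style outputs into the llm_input list
# used by the naming loop. The input shapes are assumed to mirror
# topic_model.get_topics() and topic_model.get_representative_docs().

def build_llm_input(topics, representative_docs, top_n_words=10, max_docs=5):
    """Assemble one entry per topic with its keywords and sample abstracts.

    topics: dict mapping topic id -> list of (word, score) tuples.
    representative_docs: dict mapping topic id -> list of document strings.
    """
    llm_input = []
    for topic_num in sorted(topics):
        if topic_num == -1:
            # BERTopic uses -1 for outliers, which have no coherent theme to name
            continue
        keywords = [word for word, _score in topics[topic_num][:top_n_words]]
        abstracts = representative_docs.get(topic_num, [])[:max_docs]
        llm_input.append({
            "topic_num": topic_num,
            "topic_keywords": keywords,
            "sample_abstracts": abstracts,
        })
    return llm_input


# Toy data standing in for real BERTopic output
topics = {
    -1: [("the", 0.10), ("and", 0.09)],
    0: [("protein", 0.31), ("folding", 0.27), ("structure", 0.22)],
    1: [("graph", 0.29), ("neural", 0.25), ("network", 0.21)],
}
docs = {
    0: ["We predict protein structures from sequence.",
        "Folding dynamics of small proteins."],
    1: ["Graph neural networks for molecules."],
}

llm_input = build_llm_input(topics, docs, top_n_words=2)
for entry in llm_input:
    print(entry["topic_num"], entry["topic_keywords"], len(entry["sample_abstracts"]))
```

Skipping topic -1 is a design choice: BERTopic's outlier bucket usually mixes unrelated documents, so an LLM-generated name for it would be misleading.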

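The function builds a clear prompt and calls the local Ollama model for inference. It then post-processes the output: it removes any <think>...</think> reasoning tags the model may emit (common with Qwen-series models) and returns a clean topic name.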
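The tag-stripping step can be checked in isolation. The regex is the article's own pattern. The non-greedy `.*?` matches each <think> block separately, and `re.DOTALL` lets `.` cross newlines. The function `normalize_topic_name` is a hypothetical extra cleanup step, not part of the original code. It keeps only the first non-empty line and trims stray quotes or periods. The sample response is an assumed shape: Qwen3 with thinking disabled may still emit an empty think block before the answer.

```python
import re

# Same pattern as the article: non-greedy so multiple <think> blocks are removed
# individually, DOTALL so "." also matches newlines inside a block.
THINK_PATTERN = re.compile(r"<think>.*?</think>", flags=re.DOTALL)


def remove_thinking_tags(text):
    """Remove every <think>...</think> block, including multi-line ones."""
    return THINK_PATTERN.sub("", text).strip()


def normalize_topic_name(text):
    """Hypothetical extra cleanup: keep the first non-empty line and strip
    surrounding spaces, quotes, and periods the model may add."""
    cleaned = remove_thinking_tags(text)
    lines = [line.strip() for line in cleaned.splitlines() if line.strip()]
    if not lines:
        return ""
    return lines[0].strip(" \"'.")


# Assumed response shape: an empty think block followed by a quoted answer
raw = '<think>\n\n</think>\n\n"Protein Structure Prediction"'
print(normalize_topic_name(raw))  # Protein Structure Prediction
```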

Code Walkthrough

import re

import ollama


def remove_thinking_tags(text):
    """Remove every <think>...</think> block and its contents (multi-line supported)."""
    pattern = r'<think>.*?</think>'
    cleaned = re.sub(pattern, '', text, flags=re.DOTALL)
    return cleaned.strip()


def generate_topic_name(topic_keywords, sample_abstracts):
    # Build the bullet list outside the f-string (backslashes are not allowed
    # inside f-string expressions before Python 3.12)
    abstracts_block = "\n".join(f"- {abstract}" for abstract in sample_abstracts)
    # "/no_think" is Qwen3's soft switch for disabling thinking mode
    prompt = f"""/no_think
You are a helpful assistant for naming topics from research paper abstracts.
Given the following keywords generated using BERTopic and sample abstracts, generate a short and meaningful topic name.
The topic name should be very short, maximum of 3 to 4 words, not a sentence or description.

Keywords: {', '.join(topic_keywords)}

Abstracts:
{abstracts_block}

Give a concise 3–4 word topic name:"""

    response = ollama.chat(
        model='qwen3:14b',  # Ollama library tag for Qwen3 14B
        messages=[{'role': 'user', 'content': prompt}],
        options={
            'temperature': 0.7,
            'num_predict': 3000,  # max tokens to generate, similar to max_tokens
        },
    )
    raw_content = response['message']['content'].strip()
    # Strip the thinking block (Qwen3 may still emit an empty <think></think> pair)
    return remove_thinking_tags(raw_content)


# llm_input is assumed to come from an earlier BERTopic step: a list of dicts
# with "topic_num", "topic_keywords", and "sample_abstracts" keys.
renamed_topics = {}
for entry in llm_input:
    name = generate_topic_name(entry["topic_keywords"], entry["sample_abstracts"][:5])
    renamed_topics[entry["topic_num"]] = name
    print(f"Topic {entry['topic_num']}: {name}")
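At temperature 0.7 the model will not always respect the 3–4 word limit. The sketch below wraps the naming call in a validate-and-retry loop, with a keyword fallback if every attempt fails. The names `is_valid_topic_name` and `name_topic_with_fallback` are hypothetical. The accepted range of 2 to 4 words is an assumption, slightly looser than the prompt. A fake generator stands in for generate_topic_name() so the example runs without an Ollama server.

```python
def is_valid_topic_name(name, min_words=2, max_words=4):
    """Check the word count (range is an assumption, a bit looser than the prompt)."""
    return min_words <= len(name.split()) <= max_words


def name_topic_with_fallback(generate_fn, keywords, abstracts, max_attempts=3):
    """Call generate_fn up to max_attempts times; fall back to top keywords.

    generate_fn is any callable with the same signature as generate_topic_name().
    """
    for _ in range(max_attempts):
        name = generate_fn(keywords, abstracts)
        if is_valid_topic_name(name):
            return name
    # Fallback: join the top three BERTopic keywords
    return " ".join(keywords[:3])


# Fake generator standing in for the LLM: rambles once, then complies
responses = iter([
    "This topic is mainly about how proteins fold into structures",
    "Protein Folding Prediction",
])


def fake_generate(keywords, abstracts):
    return next(responses)


result = name_topic_with_fallback(fake_generate, ["protein", "folding"], [])
print(result)  # Protein Folding Prediction
```

To use it with the real model, pass generate_topic_name as generate_fn inside the loop over llm_input.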