
Microsoft TinyTroupe "Persona" Simulation Library: AI Agent Market Research - V3 (Part 5)

Table of Contents

  • 1 Introduction
  • 2. Case Study Deep Dive: Assessing the Market Potential of Bottled Gazpacho (V4)
    • 2.1. Case Overview
    • 2.2. Step-by-Step Code Walkthrough
      • 2.2.1. Environment Initialization and Caching
      • 2.2.2. Defining Research Variables and Agent Generation Instructions
      • 2.2.3. Configuring the Results Extractor and the Analysis Function
      • 2.2.4. Generating the Agent Population and Running the Simulation
      • 2.2.5. Extracting and Analyzing the Simulation Results
      • 2.2.6. Introducing Empirical Data for Cross-Validation
      • 2.2.7. Running the Statistical Tests and Interpreting the Results
    • 2.3. Complete Code Example
  • 3. Application Scenarios and Technical Outlook
  • 4 Summary


1 Introduction

As AI agent population simulation matures, we face a central question: to what extent do the virtual societies we build actually reflect the complex dynamics of the real world? Simulation alone can provide directional insight, but the confidence of its results remains a nagging doubt for decision makers. Quantifying how faithful a simulation is to reality, and iterating on that basis, is the key step that moves this technology from "interesting experiment" to "reliable decision-making tool".

This article continues our deep dive into TinyTroupe, a cutting-edge AI agent population simulation framework, and introduces the headline feature of its V4 case: Empirical Data Validation. By statistically comparing simulation outputs against real benchmark data collected through traditional surveys, TinyTroupe gives us a ruler for quantifying the distance between the "virtual" and the "real".

TinyTroupe's core technical capabilities have now evolved to include:

  • Demographics-based agent generation: programmatically generates a statistically representative, heterogeneous population of AI agents from real-world demographic data.
  • Deep personalization and context awareness: every agent has its own background, memory, and complex decision logic.
  • Programmable interaction environments: flexibly define the interaction topology and information-propagation mechanisms among agents.
  • Automatic structured data extraction: precisely extracts quantifiable, structured data from unstructured text.
  • Sample quality checking (Profiler): the Profiler tool verifies that the generated virtual sample stays statistically faithful to the specified census data.
  • Empirical validation: SimulationExperimentEmpiricalValidator uses statistical tests such as the t-test and KS test to quantify how closely the simulated data distribution matches real-world data, producing a confidence score for the simulation results.

Its typical application scenarios deepen accordingly:

  1. High-confidence new-product concept testing: not only assesses market acceptance, but also attaches a "confidence index" to that assessment, helping decision makers judge risk.
  2. "Digital twin" testing of marketing strategies: test marketing strategies in a validated virtual environment that closely mirrors the real market, greatly improving predictive accuracy.
  3. Public policy simulation and impact assessment: before a policy is enacted, build a "digital twin society" whose statistical profile matches the target population, predict the policy's impact, and report the confidence of that prediction.

Next, through the most complete business case in this series so far, "Assessing the commercial potential of ready-to-drink bottled gazpacho in the US market (V4)", we examine how TinyTroupe closes the loop from simulation to validation and delivers a step change in the confidence of market insights.

This article is adapted from the official example:
Bottled Gazpacho Market Research 4.ipynb

It follows on from:
Microsoft TinyTroupe Lightweight Multi-Agent "Persona" Simulation Library: AI Agent Market Research (Part 3)
Microsoft TinyTroupe "Persona" Simulation Library: AI Agent Market Research, Upgraded (Part 4)


2. Case Study Deep Dive: Assessing the Market Potential of Bottled Gazpacho (V4)

2.1. Case Overview

This case is the fourth stage of the "bottled gazpacho" research series and a major methodological leap. In the earlier versions we explored how to generate diverse, even extreme, agent populations to stress-test the market. In V4, the core goal is to answer the ultimate question: how accurate are our simulation results, really?

  • Research objectives
    1. Run a market-potential simulation that includes extreme user personas.
    2. Bring in a real benchmark dataset (control data) on the same question, collected through a traditional questionnaire.
    3. Use TinyTroupe's validation tools to statistically quantify the similarity between the simulated data (treatment data) and the benchmark data.
    4. Finally, deliver not just a conclusion about market potential, but a quantifiable level of confidence in that conclusion.
  • Technical path: define agent generation rules -> run the simulation -> extract the simulation results -> load real-world survey data -> align the data with SimulationExperimentEmpiricalValidator -> call validate_simulation_experiment_empirically to run the statistical tests -> interpret the validation report.
  • Dependencies: tinytroupe, pandas, matplotlib
  • Data required
    • Agent generation data: ./information/populations/usa.json
    • Empirical validation benchmark data: ../data/empirical/07.19.2025 - Market Research - Bottled Gazpacho - Raw Data.csv

2.2. Step-by-Step Code Walkthrough

2.2.1. Environment Initialization and Caching

What the code does
This block imports all the required Python libraries.
sys.path.insert(0, '..') makes the tinytroupe source code importable directly from the parent directory. control.begin(...) starts TinyTroupe's caching system and creates a cache file named bottled_gazpacho_market_research_4.cache.json.

When the code is re-run, TinyTroupe checks every computation step; if a step's inputs have not changed, it loads the previous result straight from the cache file, skipping the expensive LLM calls or computations and greatly speeding up development and debugging.

Full code

import json
import sys
import pandas as pd
import matplotlib.pyplot as plt

sys.path.insert(0, '..')

import tinytroupe
from tinytroupe import config_manager
from tinytroupe.agent import TinyPerson
from tinytroupe.environment import TinyWorld
from tinytroupe.factory import TinyPersonFactory
from tinytroupe.validation import TinyPersonValidator
from tinytroupe import control
from tinytroupe.extraction import ResultsExtractor
from tinytroupe.profiling import Profiler
from tinytroupe.validation import SimulationExperimentEmpiricalValidator, SimulationExperimentDataset, validate_simulation_experiment_empirically

#config_manager.update("action_generator_enable_quality_checks", True)
#config_manager.update("action_generator_quality_threshold", 6)

# First of all, we'll use a cached simulation, to avoid having to recompute expensive steps unless
# really necessary. We accomplish this via the `control.begin()` function. The file it takes as an
# argument is the cache file that will be created and then updated as needed.
control.begin("bottled_gazpacho_market_research_4.cache.json")
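
The snippet above only shows control.begin(). For orientation, here is a minimal sketch of the full caching lifecycle as it is used across the rest of this case (the control.checkpoint() and control.end() calls appear in the later steps and in the complete code example); it illustrates the pattern rather than adding any new required setup:

from tinytroupe import control

control.begin("bottled_gazpacho_market_research_4.cache.json")  # open (or create) the cache file
# ... expensive steps: agent generation, simulation run, extraction ...
control.checkpoint()  # persist everything computed so far, so re-runs can reuse it
# ... further steps ...
control.end()         # close the simulation session and flush the cache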

2.2.2. Defining Research Variables and Agent Generation Instructions

What the code does
This block defines the core parameters of the simulation.
target_nationality and population_size set the target market and the sample size. additional_demographic_specification is a carefully crafted prompt that steers the LLM towards generating more diverse and more extreme agent personas, so the sample covers everyone from mainstream to fringe consumers.
interviewer_main_question and inner_monologue make up the survey script. Notably, the descriptions of the 1 and 5 options, as well as the inner monologue, explicitly encourage agents to give extreme, emotional answers ("IT IS OK to give this extreme and impulsive answer..."); this counteracts central-tendency bias and yields a more realistic distribution of opinions.

Full code

target_nationality = "American"
population_size = 50additional_demographic_specification = f"""
BESIDES other dimensions infered from the population demographic data (i.e., make sure those are still present), 
ensure these ADDITIONAL sampling dimensions are present, which must be as realistic as possible:- general attitude: from positive to negative, from optimistic to pessimistic, from open-minded to closed-minded- education: from the completely illiterate (can't even talk properly) to the highest scientist- culinary tastes: from traditional to modern, from spicy to mild, from vegetarian to meat-based- shopping habits: from frequent to occasional, from online to in-store, from frugal to extravagant- health consciousness: from health-focused to indulgent, from organic to conventional- attitude towards new products: from open-minded to skeptical, from adventurous to conservative- cultural influences: from local to global, from traditional to contemporary, from diverse to homogeneous- travel habits: from frequent travelers to homebodies, from local to international, from adventurous to cautious- cultural background: from diverse to homogeneous, from traditional to modern, from local to global- lifestyle: from active to sedentary, from urban to rural, from minimalist to extravagantEach of these additional dimensions MUST have AT LEAST 10 values. 
For all of these, YOU MUST provide long, detailed sentences that describe the values in each dimension. NOT short
words. We need EACH value to be VERY detailed.Make sure you also include EXTREME values so that we can properly capture even edge cases. We want a WIDE
range of different people and tastes!
"""interviewer_introduction =\"""We are performing some market research and need to know you more. Can you please present yourself and also list your top-10 interests?"""interviewer_main_question =\""" Gazpacho is a cold, blended vegetable soup originally from Spain, made mainly with tomatoes, cucumbers, peppers, and olive oil. We are considering offering it in supermarkets near you.Would you consider purchasing ready-to-drink bottled gazpacho if it was available at your local supermarket? How much do you like this idea? Please rate your propensity of purchasing it (from 1 to 5), where:- 1: would NEVER buy it. Note that IT IS OK to give this extreme and impulsive answer if it is how you feel, as it is part of the human experience.- 2: very unlikely, but not impossible.- 3: maybe I would buy it, not sure.- 4: it is very likely.- 5: would CERTAINLY buy it. Note that IT IS OK to give this extreme and impulsive answer if it is how you feel, as it is part of the human experience."""inner_monologue =\"""I will be honest as I understand they are not here to judge me, but just to learn from me. Such choices depend on many factors, but I will make my best guess, considering my current situation in life, location, job and interests. I will not refrain from giving extreme answers, such as 1 or 5, if that's how I really feel, as this exercise requires me to be honest, human and realistic.Now I **must** first THINK deeply about the question, consider all the factors that might influence my decision,and only then I will TALK with my response to the question as best, as detailed and as honestly as I can."""

2.2.3. Configuring the Results Extractor and the Analysis Function

What the code does
ResultsExtractor is configured to pull the 1-5 rating, and the justification behind it, out of each agent's free-text reply. The fields_hints parameter is critical for keeping the LLM's output format stable.
is_there_a_good_market is a business-logic gate: it takes the survey results as a DataFrame, computes the percentages of positive, neutral and negative responses, and applies the preset thresholds (positive responses above 10% and negative responses below 50%) to give a clear commercial verdict: a good market either does or does not exist.

Full code

results_extractor = ResultsExtractor(
    extraction_objective="Find whether the person would buy the product. A person rate his/her propensity from 1 (would NEVER buy) to 5 (would CERTAINLY buy it).",
    situation="Agent was asked to rate their interest in a bottled Gazpacho. They can respond with a propensity score from 1 (would NEVER buy) to 5 (would CERTAINLY buy it).",
    fields=["name", "response", "justification"],
    fields_hints={"response": "Must be a string formatted exactly as '1', '2', '3', '4', '5' or 'N/A'(if there is no response or you cannot determine the precise response)."},
    verbose=True)

def is_there_a_good_market(df, positive_threshold=0.1, negative_threshold=0.5):
    # Convert responses to strings for consistent handling
    df_copy = df.copy()
    df_copy["response"] = df_copy["response"].astype(str)

    # Get counts and calculate percentages
    counts = df_copy["response"].value_counts()
    total = counts.sum()
    percentage = counts / total

    # Calculate percentages by rating category (using 1-5 scale)
    percentage_positive = percentage.get("4", 0) + percentage.get("5", 0)
    percentage_neutral = percentage.get("3", 0)
    percentage_negative = percentage.get("1", 0) + percentage.get("2", 0)
    percentage_na = percentage.get("N/A", 0)

    # Print the analysis
    print(f"Percentage of positive responses (4-5): {percentage_positive:.2%}")
    print(f"Percentage of neutral responses (3): {percentage_neutral:.2%}")
    print(f"Percentage of negative responses (1-2): {percentage_negative:.2%}")
    print(f"Percentage of 'N/A' responses: {percentage_na:.2%}")

    # also compute the mean and standard deviation of the responses
    df_copy["response"] = pd.to_numeric(df_copy["response"], errors='coerce')
    mean_response = df_copy["response"].mean()
    std_response = df_copy["response"].std()
    print(f"Mean response: {mean_response:.2f}")
    print(f"Standard deviation of responses: {std_response:.2f}")

    # Decision based on thresholds
    if percentage_positive > positive_threshold and percentage_negative < negative_threshold:
        print("VERDICT: There is a good market for bottled gazpacho.")
        return True
    else:
        print("VERDICT: There is not a good market for bottled gazpacho.")
        return False
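
As a quick usage illustration of the threshold logic, the toy example below feeds a small hand-made DataFrame (hypothetical data, not simulation output) into is_there_a_good_market:

# Hypothetical toy data, only to illustrate the threshold decision
toy_df = pd.DataFrame({"response": ["5", "4", "4", "3", "3", "2", "1", "N/A"]})
is_there_a_good_market(toy_df)
# positive = 3/8 = 37.5% > 10% and negative = 2/8 = 25.0% < 50%, so it prints a "good market" verdict and returns True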

2.2.4. Generating the Agent Population and Running the Simulation

What the code does
First, TinyPersonFactory is combined with the US demographic data (usa.json) and our custom additional_demographic_specification to create an agent factory.
Then factory.generate_people() generates 50 AI agents, each with a rich backstory and individual traits. The Profiler tool is used here to check that the generated agent sample matches the expected demographic distribution, a key quality-assurance step.
Finally, the generated people are placed in a TinyWorld environment, the broadcast family of methods delivers the instructions to every agent, and market.run(1) triggers the simulation.

Full code

factory = TinyPersonFactory.create_factory_from_demography(
    "./information/populations/usa.json",
    population_size=population_size,
    additional_demographic_specification=additional_demographic_specification)

people = factory.generate_people(population_size, verbose=True)

profiler = Profiler()
profiler.profile(people)

control.checkpoint()

market = TinyWorld(f"Target audience ({target_nationality})", people, broadcast_if_no_target=False)
market.broadcast(interviewer_introduction)
market.broadcast(interviewer_main_question)
market.broadcast_thought(inner_monologue)
market.run(1)

control.checkpoint()

2.2.5. Extracting and Analyzing the Simulation Results

What the code does
Once the simulation has finished, results_extractor pulls the answers we care about out of each agent's memory. The results are loaded into a pandas DataFrame for quantitative analysis. value_counts() and a bar chart visualize the distribution of the ratings, and the mean and standard deviation are computed. Finally, is_there_a_good_market applies the preset business logic to give a preliminary market verdict.

Full code

results = results_extractor.extract_results_from_agents(people)
# make sure the results are all dicts
filtered_results = [item for item in results if isinstance(item, dict)]
# load a list of dicts into a pandas dataframe
df = pd.DataFrame(filtered_results)
df["response"].value_counts().reindex(["1", "2", "3", "4", "5", "N/A"]).plot(kind='bar')
plt.xlabel('Response')
plt.ylabel('Count')
plt.title('Distribution of Responses')
average_score = df["response"].replace("N/A", pd.NA).astype(float).mean()
std_score = df["response"].replace("N/A", pd.NA).astype(float).std()
print(f"Average score: {average_score:.2f}")
print(f"Standard deviation of scores: {std_score:.2f}")
is_there_a_good_market(df)

Interpreting the output from a data scientist's perspective
The simulation yields 38.00% positive responses (4-5), 24.00% neutral (3) and 38.00% negative (1-2), with a mean of 3.02 and a standard deviation of 1.39. Under the rules we set in is_there_a_good_market (positive > 10% and negative < 50%), the preliminary conclusion is that bottled gazpacho has an opportunity in the US market. However, this conclusion rests entirely on simulated data. How trustworthy is it? Can we believe that the 38% positive rate also exists in the real world? That is exactly what the next, and most critical, step answers.

2.2.6. Introducing Empirical Data for Cross-Validation

What the code does
This is the centerpiece of the whole case. For the first time, we bring in external, real survey data as ground truth.

  1. Load the control data: SimulationExperimentEmpiricalValidator.read_empirical_data_from_csv loads real questionnaire results from a CSV file containing the respondents' IDs, votes (1-5), explanations and demographic attributes. This real dataset is the gold standard against which we judge the simulation's accuracy: the control group.
  2. Load the treatment data: correspondingly, the df we just produced through simulation (agent names, ratings and justifications) is formatted with read_empirical_data_from_dataframe. This simulated dataset is our treatment group.
    With these two steps, TinyTroupe aligns two datasets of different origin and structure into a single SimulationExperimentDataset format that can be compared statistically.

Full code

control_data = SimulationExperimentEmpiricalValidator.read_empirical_data_from_csv(
    file_path="../data/empirical/07.19.2025 - Market Research - Bottled Gazpacho - Raw Data.csv",
    experimental_data_type="single_value_per_agent",
    agent_id_column="Responder #",
    value_column="Vote",
    agent_comments_column="Explanation",
    agent_attributes_columns=["Age Range", "Gender Identity", "Political Affiliation", "Racial Or Ethnic Identity"],
    dataset_name="Test Gazpacho Survey")

df_for_validation = df.copy()
df_for_validation["Vote"] = pd.to_numeric(df_for_validation["response"], errors='coerce')

treatment_data = SimulationExperimentEmpiricalValidator.read_empirical_data_from_dataframe(
    df=df_for_validation,
    experimental_data_type="single_value_per_agent",
    agent_id_column="name",
    value_column="Vote",
    agent_comments_column="justification",
    dataset_name="Bottled Gazpacho Simulation Results"
)
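
Before running the statistical tests it is worth eyeballing the two aligned datasets. The short check below is optional and assumes the same key_results["Vote"] access pattern that the comparison chart at the end of this walkthrough uses; treat it as a sanity check, not part of the validation API proper:

# Optional sanity check on the aligned data (key_results["Vote"] access mirrors the chart code further below)
control_votes = pd.Series(control_data.key_results["Vote"])
treatment_votes = df_for_validation["Vote"]
print(control_votes.describe())    # summary of the real survey votes
print(treatment_votes.describe())  # summary of the simulated votes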

2.2.7. Running the Statistical Tests and Interpreting the Results

What the code does
validate_simulation_experiment_empirically is TinyTroupe's final validation tool. It takes the control and treatment data and runs a set of predefined validation procedures.

  • T-test (two-sample t-test): tests whether the means of two independent samples differ significantly; here, whether the average purchase intent of the simulated population is statistically different from that of the real respondents.
  • KS test (Kolmogorov-Smirnov test): tests whether two samples come from the same distribution. Unlike the t-test, which only looks at means, the KS test assesses how similar the overall shapes of the two distributions are.

Core function deep dive: validate_simulation_experiment_empirically

  • Purpose: this is the core function in TinyTroupe for quantifying simulation fidelity. Using standard statistical methods, it gives an objective, quantifiable answer to the question "is the simulation realistic?", moving population simulation from qualitative judgement to quantitative validation.
  • Input parameters:
    • control_data: the gold standard, real-world data.
    • treatment_data: the simulated data to be validated.
    • validation_types: the validation types to run, here "statistical".
    • statistical_test_type: the specific statistical test, e.g. "ttest" or "ks_test".
  • Interpreting the result: the function returns an object with detailed statistical results. The most important metric is the p-value, the probability of observing a result at least as extreme as the one observed under the null hypothesis. We typically set a significance level such as 0.05.
    • If p-value > 0.05: we cannot reject the null hypothesis that the two samples come from the same distribution. In plain terms, statistically there is no significant difference between the simulated and real data. That is a very good outcome: strong evidence that the simulation closely reproduces reality.
    • If p-value < 0.05: we can reject the null hypothesis, meaning the simulated and real data differ significantly. This is a cue to go back and adjust the simulation setup (agent-generation prompts, interaction rules, and so on). The standalone sketch after this list illustrates both tests on toy data.

Full code

result_ttest = validate_simulation_experiment_empirically(
    control_data=control_data,
    treatment_data=treatment_data,
    validation_types=["statistical"],
    output_format="values")

result_ks = validate_simulation_experiment_empirically(
    control_data=control_data,
    treatment_data=treatment_data,
    validation_types=["statistical"],
    statistical_test_type="ks_test",
    output_format="values")

Interpreting the output from a data scientist's perspective

  • T-test result: p-value = 0.81. Far above 0.05, strongly indicating that the mean purchase intent of the simulated agents (3.02) is not statistically different from that of the real respondents.
  • KS-test result: p-value = 0.99. Extremely close to 1, an even stronger signal than the t-test: the overall distributions of purchase intent (not just the means) are highly consistent across the two groups.

Finally, a side-by-side bar chart makes it easy to see that the simulated data (blue) and the real data (red) have strikingly similar distributions across the rating options.

Full code

# compare charts. Put both bars in the same plot, for easier comparison
# Need to order the labels in the chart.
fig, ax = plt.subplots()
df["response"].value_counts().reindex(["1", "2", "3", "4", "5", "N/A"]).plot(kind='bar', color='blue', position=0, width=0.4, label="Treatment (simulation)", ax=ax)
pd.DataFrame(control_data.key_results["Vote"]).value_counts().sort_index().plot(kind='bar', color='red', position=1, width=0.4, label="Control", ax=ax)
plt.legend()
plt.show()

Bottom line: the simulation did not just tell us that "there is a market opportunity"; by cross-validating against empirical data we attached a very high level of confidence to that conclusion. We can confidently report to decision makers that, in this test, our AI agent simulation closely reproduced the real-world target market.

2.3. Complete Code Example

import json
import sys
import pandas as pd
import matplotlib.pyplot as plt

sys.path.insert(0, '..')

import tinytroupe
from tinytroupe import config_manager
from tinytroupe.agent import TinyPerson
from tinytroupe.environment import TinyWorld
from tinytroupe.factory import TinyPersonFactory
from tinytroupe.validation import TinyPersonValidator
from tinytroupe import control
from tinytroupe.extraction import ResultsExtractor
from tinytroupe.profiling import Profiler
from tinytroupe.validation import SimulationExperimentEmpiricalValidator, SimulationExperimentDataset, validate_simulation_experiment_empirically

control.begin("bottled_gazpacho_market_research_4.cache.json")

target_nationality = "American"
population_size = 50

additional_demographic_specification = f"""
BESIDES other dimensions infered from the population demographic data (i.e., make sure those are still present), 
ensure these ADDITIONAL sampling dimensions are present, which must be as realistic as possible:
- general attitude: from positive to negative, from optimistic to pessimistic, from open-minded to closed-minded
- education: from the completely illiterate (can't even talk properly) to the highest scientist
- culinary tastes: from traditional to modern, from spicy to mild, from vegetarian to meat-based
- shopping habits: from frequent to occasional, from online to in-store, from frugal to extravagant
- health consciousness: from health-focused to indulgent, from organic to conventional
- attitude towards new products: from open-minded to skeptical, from adventurous to conservative
- cultural influences: from local to global, from traditional to contemporary, from diverse to homogeneous
- travel habits: from frequent travelers to homebodies, from local to international, from adventurous to cautious
- cultural background: from diverse to homogeneous, from traditional to modern, from local to global
- lifestyle: from active to sedentary, from urban to rural, from minimalist to extravagant

Each of these additional dimensions MUST have AT LEAST 10 values. 
For all of these, YOU MUST provide long, detailed sentences that describe the values in each dimension. NOT short
words. We need EACH value to be VERY detailed.

Make sure you also include EXTREME values so that we can properly capture even edge cases. We want a WIDE
range of different people and tastes!
"""

interviewer_introduction = \
"""
We are performing some market research and need to know you more. Can you please present yourself and also list your top-10 interests?
"""

interviewer_main_question = \
"""
Gazpacho is a cold, blended vegetable soup originally from Spain, made mainly with tomatoes, cucumbers, peppers, and olive oil. We are considering offering it in supermarkets near you.

Would you consider purchasing ready-to-drink bottled gazpacho if it was available at your local supermarket? How much do you like this idea? Please rate your propensity of purchasing it (from 1 to 5), where:
- 1: would NEVER buy it. Note that IT IS OK to give this extreme and impulsive answer if it is how you feel, as it is part of the human experience.
- 2: very unlikely, but not impossible.
- 3: maybe I would buy it, not sure.
- 4: it is very likely.
- 5: would CERTAINLY buy it. Note that IT IS OK to give this extreme and impulsive answer if it is how you feel, as it is part of the human experience.
"""

inner_monologue = \
"""
I will be honest as I understand they are not here to judge me, but just to learn from me. Such choices depend on many factors, but I will make my best guess, considering my current situation in life, location, job and interests. I will not refrain from giving extreme answers, such as 1 or 5, if that's how I really feel, as this exercise requires me to be honest, human and realistic.

Now I **must** first THINK deeply about the question, consider all the factors that might influence my decision,
and only then I will TALK with my response to the question as best, as detailed and as honestly as I can.
"""

results_extractor = ResultsExtractor(
    extraction_objective="Find whether the person would buy the product. A person rate his/her propensity from 1 (would NEVER buy) to 5 (would CERTAINLY buy it).",
    situation="Agent was asked to rate their interest in a bottled Gazpacho. They can respond with a propensity score from 1 (would NEVER buy) to 5 (would CERTAINLY buy it).",
    fields=["name", "response", "justification"],
    fields_hints={"response": "Must be a string formatted exactly as '1', '2', '3', '4', '5' or 'N/A'(if there is no response or you cannot determine the precise response)."},
    verbose=True)

def is_there_a_good_market(df, positive_threshold=0.1, negative_threshold=0.5):
    df_copy = df.copy()
    df_copy["response"] = df_copy["response"].astype(str)

    counts = df_copy["response"].value_counts()
    total = counts.sum()
    percentage = counts / total

    percentage_positive = percentage.get("4", 0) + percentage.get("5", 0)
    percentage_neutral = percentage.get("3", 0)
    percentage_negative = percentage.get("1", 0) + percentage.get("2", 0)
    percentage_na = percentage.get("N/A", 0)

    print(f"Percentage of positive responses (4-5): {percentage_positive:.2%}")
    print(f"Percentage of neutral responses (3): {percentage_neutral:.2%}")
    print(f"Percentage of negative responses (1-2): {percentage_negative:.2%}")
    print(f"Percentage of 'N/A' responses: {percentage_na:.2%}")

    df_copy["response"] = pd.to_numeric(df_copy["response"], errors='coerce')
    mean_response = df_copy["response"].mean()
    std_response = df_copy["response"].std()
    print(f"Mean response: {mean_response:.2f}")
    print(f"Standard deviation of responses: {std_response:.2f}")

    if percentage_positive > positive_threshold and percentage_negative < negative_threshold:
        print("VERDICT: There is a good market for bottled gazpacho.")
        return True
    else:
        print("VERDICT: There is not a good market for bottled gazpacho.")
        return False

factory = TinyPersonFactory.create_factory_from_demography(
    "./information/populations/usa.json",
    population_size=population_size,
    additional_demographic_specification=additional_demographic_specification)

people = factory.generate_people(population_size, verbose=True)

profiler = Profiler()
profiler.profile(people)

control.checkpoint()

market = TinyWorld(f"Target audience ({target_nationality})", people, broadcast_if_no_target=False)
market.broadcast(interviewer_introduction)
market.broadcast(interviewer_main_question)
market.broadcast_thought(inner_monologue)
market.run(1)

control.checkpoint()

results = results_extractor.extract_results_from_agents(people)
filtered_results = [item for item in results if isinstance(item, dict)]
df = pd.DataFrame(filtered_results)

is_there_a_good_market(df)

control_data = SimulationExperimentEmpiricalValidator.read_empirical_data_from_csv(
    file_path="../data/empirical/07.19.2025 - Market Research - Bottled Gazpacho - Raw Data.csv",
    experimental_data_type="single_value_per_agent",
    agent_id_column="Responder #",
    value_column="Vote",
    agent_comments_column="Explanation",
    agent_attributes_columns=["Age Range", "Gender Identity", "Political Affiliation", "Racial Or Ethnic Identity"],
    dataset_name="Test Gazpacho Survey")

df_for_validation = df.copy()
df_for_validation["Vote"] = pd.to_numeric(df_for_validation["response"], errors='coerce')

treatment_data = SimulationExperimentEmpiricalValidator.read_empirical_data_from_dataframe(
    df=df_for_validation,
    experimental_data_type="single_value_per_agent",
    agent_id_column="name",
    value_column="Vote",
    agent_comments_column="justification",
    dataset_name="Bottled Gazpacho Simulation Results"
)

result_ttest = validate_simulation_experiment_empirically(
    control_data=control_data,
    treatment_data=treatment_data,
    validation_types=["statistical"],
    output_format="values")
print(result_ttest.statistical_results)

result_ks = validate_simulation_experiment_empirically(
    control_data=control_data,
    treatment_data=treatment_data,
    validation_types=["statistical"],
    statistical_test_type="ks_test",
    output_format="values")
print(result_ks.statistical_results)

fig, ax = plt.subplots()
df["response"].value_counts().reindex(["1", "2", "3", "4", "5", "N/A"]).plot(kind='bar', color='blue', position=0, width=0.4, label="Treatment (simulation)", ax=ax)
pd.DataFrame(control_data.key_results["Vote"]).value_counts().sort_index().plot(kind='bar', color='red', position=1, width=0.4, label="Control", ax=ax)
plt.legend()
plt.show()

control.end()

3. Application Scenarios and Technical Outlook

The simulate-then-validate loop that TinyTroupe demonstrates is highly valuable in China's complex, fast-iterating business environment.

  1. Localization testing for brands entering the market: an international brand entering China (like the Spanish gazpacho in this case) can first run a small, low-cost online survey to collect a benchmark sample of Chinese consumers. It can then use TinyTroupe with public Chinese demographic data to build a validated, high-fidelity "digital twin" of the Chinese market. In that virtual environment the brand can safely and efficiently test different product positionings, flavor adjustments, packaging designs and marketing messages, find what resonates most with Chinese consumers, and substantially reduce the risk of market entry.
  2. Simulating consumer mindsets in lower-tier markets: China's urban-rural and regional differences are enormous, and the mindsets of big-city consumers and lower-tier-market consumers differ sharply. Traditional surveys struggle to cover the vast lower-tier market efficiently. With TinyTroupe, a company can build multiple validated, regionalized virtual markets based on the demographics, income levels and cultural backgrounds of different provinces, cities and counties. Before a product launch, it can simulate how the product spreads, how word of mouth evolves and how purchases convert in each regional market, predicting which regions will take off and which need tailored marketing strategies.

4 Summary

The empirical validation capability introduced with the V4 case marks the shift of AI agent population simulation from a heuristic tool towards a predictive scientific instrument. The road ahead is even more exciting:

  • Self-adapting simulation models: a future TinyTroupe might support "auto-calibration". When the simulation deviates significantly from the benchmark data (p-value < 0.05), the system could automatically analyze the source of the deviation, adjust the agent-generation prompts or behavior models, and re-run the simulation until it fits reality closely, an intelligent leap from manual validation to automatic alignment. A purely hypothetical sketch of such a loop follows this list.
  • Multimodal reality anchors: today's validation relies mainly on structured data such as survey scores. In the future, the anchors could become multimodal, for example comparing simulated social-network discussions against real social-media comments on topic distribution, sentiment polarity and key opinion leader (KOL) identification, grounding the simulation along much richer dimensions.
  • Towards a trustworthy "society simulator": as validation dimensions multiply and calibration improves, we move closer to a society simulator that can reliably predict complex socio-economic phenomena such as consumption trends, public-opinion dynamics and technology adoption curves, giving business decision-making and public governance an unprecedented sandbox for "experiments on the future".
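
As a purely hypothetical illustration of the auto-calibration idea mentioned above (the helper names below are imaginary, not TinyTroupe APIs; only validate_simulation_experiment_empirically exists today), such a loop might look roughly like this:

# Purely hypothetical sketch of an auto-calibration loop; run_simulation, result_passes and
# revise_specification are imaginary helpers used only to convey the idea.
MAX_ROUNDS = 5
spec = additional_demographic_specification  # start from the current agent-generation prompt

for _ in range(MAX_ROUNDS):
    treatment = run_simulation(spec)  # generate agents, run the world, extract results
    report = validate_simulation_experiment_empirically(
        control_data=control_data, treatment_data=treatment,
        validation_types=["statistical"], output_format="values")
    if result_passes(report):         # e.g. all p-values above 0.05
        break
    spec = revise_specification(spec, report)  # adjust the prompt based on where the gap is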
