
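Learn One Statistical Test a Day: The Mann-Whitney U Test (Using a Study of the Heartbeat Evoked Potential in Nightmare Disorder as an Example)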

news 2025/9/21 7:25:21

The method comes from this article: “The heartbeat evoked potential is a questionable biomarker in nightmare disorder: A replication study” (Bogdány et al., 2022, p. 1)

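Imagine you are a scientist studying people who are plagued by nightmares. You have a question in mind: “Do people who have frequent nightmares (call them the NM group) really differ in sleep architecture from controls who sleep soundly (the CTL group)?”

To answer it, you collect a range of measurements, such as total sleep time and the duration of rapid eye movement (REM) sleep, the stage most associated with dreaming. Now you have two sets of data in hand. How do you compare them?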

The “Star Player” and the “Special Forces” of the Statistical Toolbox

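The brightest star in the statistical toolbox is probably the t-test. Its job is to test whether the means of two groups differ significantly: for example, comparing the heights of men and women, or the blood pressure of patients who took a drug and patients who did not.

But the t-test carries some baggage: it works best when the data are approximately normally distributed. Picture the normal distribution as a bell-shaped curve, high in the middle, low at both ends, and symmetric. Many physiological measures, such as height and weight, roughly follow this pattern.

Real-world data, however, are not always so well behaved. What if the two groups we want to compare have lopsided distributions that look nothing like a symmetric bell? Forcing a t-test onto them is like entering a flat-track sprinter in a steeplechase: the result can be quite inaccurate.

This is when we call in the “special forces” of the statistical toolbox: non-parametric tests. The Mann-Whitney U test is one of the best-known members.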

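The Mann-Whitney U test handles the hard cases that the t-test cannot. It is mainly called for in two situations:

The data are not normally distributed: when the distribution looks clearly skewed, or you simply do not know what distribution it follows, the U test is the safer choice because it makes no normality assumption.
The data are ranked/ordinal: sometimes the data are not precise numerical values but ordered categories, such as “very satisfied, satisfied, neutral, dissatisfied”, or race placings such as “first, second, third”. The U test is tailor-made for this kind of data.

In short, when you want to compare two independent groups but cannot meet the t-test's distributional requirements, the Mann-Whitney U test is a natural partner.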
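One common way to make this choice in practice is to check each group for normality before picking a test. The following is a minimal sketch, assuming NumPy and SciPy are installed. The Shapiro-Wilk test, the 0.05 cutoff, the helper name `choose_test`, and the synthetic data are all illustrative assumptions, not something the paper or this article prescribes.

```python
import numpy as np
from scipy import stats


def choose_test(group_a, group_b, alpha=0.05):
    """Suggest a two-sample test from Shapiro-Wilk normality checks.

    Hypothetical helper: returns "t-test" only if neither group shows
    evidence against normality, otherwise "mann-whitney".
    """
    for group in (group_a, group_b):
        _, p_normal = stats.shapiro(group)
        if p_normal < alpha:
            return "mann-whitney"
    return "t-test"


# Synthetic illustration data: evenly spaced quantiles of a normal and of
# an exponential distribution (deterministic, no random numbers involved).
probs = np.linspace(0.01, 0.99, 50)
bell_shaped = stats.norm.ppf(probs, loc=100, scale=15)
right_skewed = stats.expon.ppf(probs, scale=30)

print(choose_test(bell_shaped, bell_shaped + 5))  # both groups bell-shaped
print(choose_test(bell_shaped, right_skewed))     # one group clearly skewed
```

With very small samples a normality test has little power, so a non-significant Shapiro-Wilk result is weak reassurance; many analysts simply default to the U test in that situation.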

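The U Test's Secret Technique: Look at Ranks, Not Values!

So how does the test actually work? Its core idea is clever and intuitive: it does not care how large the individual values are, only where they fall when both groups are sorted together.

Let us work through a simplified nightmare-study example. Suppose we recruited 4 nightmare patients (NM group) and 5 healthy controls (CTL group) and recorded their REM sleep duration (in minutes) last night.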

Group	REM sleep duration (min)
NM	80, 85, 110, 125
CTL	90, 105, 115, 130, 140

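Step 1: Forget the groups and pool everything

Throw both groups into one big pool and pretend they are one family. Then rank all the values from lowest to highest.

Value	80	85	90	105	110	115	125	130	140
Rank	1	2	3	4	5	6	7	8	9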
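The pooling-and-ranking step can be reproduced in a few lines. This is a small sketch, assuming SciPy is available; it uses `scipy.stats.rankdata`, which assigns tied values the average of the ranks they span. The toy data contain no ties, so the ranks come out as whole numbers.

```python
from scipy.stats import rankdata

nm = [80, 85, 110, 125]
ctl = [90, 105, 115, 130, 140]

# Pool both groups, then rank the pooled values from lowest (rank 1) upward.
pooled = nm + ctl
ranks = rankdata(pooled)

# Split the ranks back out in the same order the values were pooled.
nm_ranks = ranks[:len(nm)].tolist()
ctl_ranks = ranks[len(nm):].tolist()

print("NM ranks: ", nm_ranks)   # [1.0, 2.0, 5.0, 7.0]
print("CTL ranks:", ctl_ranks)  # [3.0, 4.0, 6.0, 8.0, 9.0]

# With ties, the tied values share the average rank:
print(rankdata([10, 20, 20, 30]).tolist())  # [1.0, 2.5, 2.5, 4.0]
```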

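Step 2: Send everyone home and compute the rank sums

Now hand the ranks back to their owners: see which ranks each group's members received, and compute each group's sum of ranks.

Ranks in the NM group: 1, 2, 5, 7
- Rank sum R_NM = 1 + 2 + 5 + 7 = 15
Ranks in the CTL group: 3, 4, 6, 8, 9
- Rank sum R_CTL = 3 + 4 + 6 + 8 + 9 = 30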
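As a quick check on the arithmetic, here is a plain-Python sketch of the rank sums. It relies on the toy data having no ties (so a simple value-to-rank dictionary is enough) and on the fact that the ranks 1..N always add up to N(N+1)/2.

```python
nm = [80, 85, 110, 125]
ctl = [90, 105, 115, 130, 140]

# Map each pooled value to its rank (valid here because there are no ties).
pooled_sorted = sorted(nm + ctl)
rank_of = {value: rank for rank, value in enumerate(pooled_sorted, start=1)}

rank_sum_nm = sum(rank_of[v] for v in nm)
rank_sum_ctl = sum(rank_of[v] for v in ctl)

# Sanity check: the two rank sums must add up to 1 + 2 + ... + N.
n_total = len(nm) + len(ctl)
assert rank_sum_nm + rank_sum_ctl == n_total * (n_total + 1) // 2

print("R_NM  =", rank_sum_nm)    # 15
print("R_CTL =", rank_sum_ctl)   # 30
print("mean rank NM  =", rank_sum_nm / len(nm))    # 3.75
print("mean rank CTL =", rank_sum_ctl / len(ctl))  # 6.0
```

Because the groups have different sizes, the mean rank per group is the fairer quantity to compare by eye.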

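Step 3: The logic at the heart of the U test

Pause here and let this sink in. If the two groups truly do not differ, their ranks should be mixed together at random, and the two groups should end up with similar average ranks. (Because the groups differ in size, compare mean ranks rather than raw rank sums: here 15/4 = 3.75 for NM versus 30/5 = 6 for CTL.)

Conversely, if one group's ranks are unusually low (its values tend to be small and sit near the front of the ordering) or unusually high (its values tend to be large and sit near the back), we have to suspect that the two groups really do differ!

The Mann-Whitney U test quantifies this intuition. It computes a U statistic, which counts, over every possible pairing of one member from group A with one member from group B, how many times the group A member has the higher value. We do not need to count by hand: U follows directly from the rank sum we just computed, U = R - n(n+1)/2, where R is a group's rank sum and n is its size. For the NM group, U = 15 - 4×5/2 = 5.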
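To see that the pair-counting definition and the rank-sum formula give the same number, here is a short sketch in plain Python. The function names are made up for illustration; counting a tie as half a win is the usual convention, although the toy data contain no ties.

```python
from itertools import product

nm = [80, 85, 110, 125]
ctl = [90, 105, 115, 130, 140]


def u_by_counting(a, b):
    """Count pairs (x from a, y from b) with x > y; a tie counts as 0.5."""
    return sum(1.0 if x > y else 0.5 if x == y else 0.0
               for x, y in product(a, b))


def u_from_rank_sum(rank_sum, n):
    """U = R - n(n+1)/2 for a group with rank sum R and size n."""
    return rank_sum - n * (n + 1) / 2


u_nm = u_by_counting(nm, ctl)    # pairs where the NM value is larger
u_ctl = u_by_counting(ctl, nm)   # pairs where the CTL value is larger

print(u_nm, u_from_rank_sum(15, len(nm)))    # 5.0 5.0
print(u_ctl, u_from_rank_sum(30, len(ctl)))  # 15.0 15.0
print(u_nm + u_ctl == len(nm) * len(ctl))    # True: every pair is counted once
```

The two U values always add up to n1 × n2, so either one determines the other.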

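Finally, statistical software uses the U value and the sample sizes to compute a p-value. The p-value tells us how likely we would be to observe a rank difference like this one (or a more extreme one) purely by chance if the two groups truly did not differ. By convention, if the p-value is below 0.05, we consider the difference unlikely to be a coincidence and conclude that the two groups differ significantly.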
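For samples this small, the p-value can be worked out by brute force, which makes its meaning concrete. The sketch below, in plain Python, lists every way 4 of the 9 ranks could have landed in the NM group, computes U for each, and asks how often U is at least as far from its expected value as the one we observed. It assumes no ties and a two-sided question; statistical packages may use an equivalent exact method or a normal approximation.

```python
from itertools import combinations

n1, n2 = 4, 5                      # NM and CTL sample sizes
observed_ranks_nm = (1, 2, 5, 7)   # ranks the NM group actually received


def u_from_ranks(ranks, n):
    """U = R - n(n+1)/2, where R is the sum of the group's ranks."""
    return sum(ranks) - n * (n + 1) // 2


u_observed = u_from_ranks(observed_ranks_nm, n1)
u_expected = n1 * n2 / 2           # centre of U's distribution under "no difference"

# Under the null hypothesis every set of 4 ranks out of 9 is equally likely.
all_u = [u_from_ranks(ranks, n1)
         for ranks in combinations(range(1, n1 + n2 + 1), n1)]

# Two-sided: count assignments at least as far from the centre as observed.
extreme = sum(1 for u in all_u
              if abs(u - u_expected) >= abs(u_observed - u_expected))
p_exact = extreme / len(all_u)

print(len(all_u), extreme, round(p_exact, 4))  # 126 36 0.2857
```

So for the toy data roughly 29% of all possible rank assignments look at least this lopsided, which is far from the 5% threshold.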

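Hands-On: Reading the Nightmare Paper

Now let us return to the paper cited above. See Table 2 in the paper.

The researchers compared the NM and CTL groups on a number of sleep measures. Here are a few examples from Study 1:

Total time in bed (min): the researchers used a Mann-Whitney U test to compare the NM and CTL groups on this measure and obtained a p value of 0.08. Because 0.08 is greater than 0.05, we conclude that there is no statistically significant difference between the groups in total time in bed.
Sleep Efficiency (%): the p value for this measure is 0.02. Because 0.02 is less than 0.05, sleep efficiency was significantly lower in the NM group than in the CTL group, a difference unlikely to have arisen by chance.
REM (%): the p value for this measure is 0.01, also below 0.05, indicating that the proportion of REM sleep was significantly higher in the NM group than in the CTL group.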
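The decision rule applied to these three rows is mechanical, as the following small sketch shows. The p values are the ones quoted above from the article's reading of Table 2; the dictionary and variable names are made up for illustration.

```python
alpha = 0.05  # conventional significance level

# p values as quoted in the text above (Study 1, Mann-Whitney U tests).
reported_p = {
    "Total time in bed (min)": 0.08,
    "Sleep Efficiency (%)": 0.02,
    "REM (%)": 0.01,
}

decisions = {
    measure: "significant" if p < alpha else "not significant"
    for measure, p in reported_p.items()
}

for measure, verdict in decisions.items():
    print(f"{measure}: p = {reported_p[measure]} -> {verdict}")
```

Note that the p value only says whether a difference was detected; the direction (lower or higher in the NM group) comes from the group summaries reported alongside it.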

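The paper is a good illustration of the Mann-Whitney U test in practice. When researchers cannot be sure that their data are normally distributed, they can choose a more robust “special forces” method for the comparison.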

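Summary

I hope this tour through nightmare research has given you a deeper and more vivid understanding of the Mann-Whitney U test. Remember its essentials:

What is it? A strong alternative to the t-test, and a non-parametric method.
When to use it? When comparing two independent groups whose data are not normally distributed or are ordinal.
How does it work? The core is ranking: it judges whether the groups differ by how thoroughly their ranks are intermixed.

The next time your data look less than “perfect”, do not forget this low-profile but powerful statistical “special forces” unit!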

Example Code

python

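import scipy.stats as stats

# Simplified example data based on the article's topic.
# Suppose we recorded REM sleep duration (minutes) for the nightmare group (NM)
# and the control group (CTL).
#
# NM group:  sample size n1 = 4
# CTL group: sample size n2 = 5
nm_group_rem_sleep = [80, 85, 110, 125]
ctl_group_rem_sleep = [90, 105, 115, 130, 140]

# Run the test with SciPy's mannwhitneyu function.
# alternative='two-sided' tests for a difference in either direction
# (we do not assume in advance which group is higher or lower).
u_statistic, p_value = stats.mannwhitneyu(
    nm_group_rem_sleep, ctl_group_rem_sleep, alternative='two-sided'
)

print("--- Mann-Whitney U test result ---")
print(f"NM group data: {nm_group_rem_sleep}")
print(f"CTL group data: {ctl_group_rem_sleep}")
print("--------------------------------")
print(f"U statistic: {u_statistic}")
print(f"P value: {p_value:.4f}")  # format the p-value to 4 decimal places

# Interpret the result.
# We usually fix a significance level alpha; 0.05 is the most common choice.
alpha = 0.05
if p_value < alpha:
    print("\nConclusion: the p-value is below 0.05, so we reject the null hypothesis.")
    print("The REM sleep duration distributions of the two groups differ significantly.")
else:
    print("\nConclusion: the p-value is at least 0.05, so we cannot reject the null hypothesis.")
    print("There is not enough evidence that the REM sleep duration distributions differ.")

# --- Compute the rank sum by hand to verify what U means ---
# One way to compute U: U = R - n*(n+1)/2, where R is a group's rank sum
# and n is that group's sample size.
all_data = sorted(nm_group_rem_sleep + ctl_group_rem_sleep)
ranks = {val: i + 1 for i, val in enumerate(all_data)}  # valid because there are no ties

# Rank sum of the NM group
rank_sum_nm = sum(ranks[val] for val in nm_group_rem_sleep)
print("\n--- Manual check ---")
print(f"Ranks of all values: {ranks}")
print(f"Rank sum of the NM group (R_nm): {rank_sum_nm}")

n1 = len(nm_group_rem_sleep)
n2 = len(ctl_group_rem_sleep)

# U1 is based on the NM group, U2 on the CTL group.
u1 = rank_sum_nm - (n1 * (n1 + 1)) / 2
u2 = n1 * n2 - u1

# SciPy 1.7+ returns the U statistic of the first sample (U1); older versions
# returned min(U1, U2). For this data the two coincide because U1 < U2.
print(f"Computed U1: {u1}")
print(f"Computed U2: {u2}")
print(f"U statistic returned by SciPy: {u_statistic}")
print("SciPy's result agrees with the manual calculation.")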


Building a Front-End Learning Page

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Interactive Mann-Whitney U Test Explorer</title>
  <script src="https://cdn.tailwindcss.com"></script>
  <script src="https://cdn.jsdelivr.net/npm/chart.js"></script>
  <link rel="preconnect" href="https://fonts.googleapis.com">
  <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
  <link href="https://fonts.googleapis.com/css2?family=Noto+Sans+SC:wght@400;500;700&display=swap" rel="stylesheet">
  <style>
    body { font-family: 'Noto Sans SC', sans-serif; }
    .chart-container {
      position: relative;
      width: 100%;
      max-width: 800px;
      margin-left: auto;
      margin-right: auto;
      height: 60vh;
      max-height: 500px;
    }
  </style>
</head>
<body class="bg-stone-50 text-stone-800">
  <div class="container mx-auto p-4 sm:p-6 lg:p-8">
    <header class="text-center mb-8">
      <h1 class="text-3xl sm:text-4xl font-bold text-stone-900">Interactive Mann-Whitney U Test Explorer</h1>
      <p class="mt-2 text-lg text-stone-600">A visualization tool for exploring sleep data from the nightmare group (NM) and the control group (CTL)</p>
    </header>

    <div class="bg-white rounded-2xl shadow-lg p-6 mb-8">
      <h2 class="text-2xl font-semibold mb-3 text-stone-800">Background</h2>
      <p class="text-stone-700 leading-relaxed">This page uses a simplified example to show how two independent samples are compared. The example data are REM sleep durations (minutes) for a nightmare patient group (NM) and a healthy control group (CTL). Use the interactive chart and the statistics below to explore how the two groups are distributed and to see how a non-parametric test (here, the Mann-Whitney U test) helps decide whether they differ significantly.</p>
    </div>

    <main class="grid grid-cols-1 lg:grid-cols-3 gap-8">
      <aside class="lg:col-span-1 space-y-8">
        <div class="bg-white rounded-2xl shadow-lg p-6">
          <h3 class="text-xl font-semibold mb-4 border-b pb-3 text-stone-800">Data filter</h3>
          <div id="controls" class="flex flex-col space-y-3">
            <button data-group="both" class="control-btn active bg-sky-600 text-white font-semibold py-2 px-4 rounded-lg shadow-md hover:bg-sky-700 transition duration-300">Show all</button>
            <button data-group="nm" class="control-btn bg-white text-stone-700 font-semibold py-2 px-4 rounded-lg border border-stone-300 hover:bg-stone-100 transition duration-300">Nightmare group (NM) only</button>
            <button data-group="ctl" class="control-btn bg-white text-stone-700 font-semibold py-2 px-4 rounded-lg border border-stone-300 hover:bg-stone-100 transition duration-300">Control group (CTL) only</button>
          </div>
        </div>

        <div class="bg-white rounded-2xl shadow-lg p-6">
          <h3 class="text-xl font-semibold mb-4 border-b pb-3 text-stone-800">Descriptive statistics</h3>
          <table class="w-full text-left">
            <thead>
              <tr class="border-b">
                <th class="py-2">Group</th>
                <th class="py-2 text-center">n</th>
                <th class="py-2 text-center">Mean</th>
                <th class="py-2 text-center">Median</th>
              </tr>
            </thead>
            <tbody id="stats-table"></tbody>
          </table>
        </div>

        <div class="bg-white rounded-2xl shadow-lg p-6">
          <h3 class="text-xl font-semibold mb-4 border-b pb-3 text-stone-800">Mann-Whitney U test</h3>
          <!-- Precomputed for the full example data: U for the NM group is 5, exact two-sided p = 36/126 -->
          <div id="test-results" class="space-y-3">
            <div class="flex justify-between items-center">
              <span class="font-medium text-stone-600">U statistic:</span>
              <span id="u-stat" class="font-bold text-xl text-sky-700">5.0</span>
            </div>
            <div class="flex justify-between items-center">
              <span class="font-medium text-stone-600">P value:</span>
              <span id="p-value" class="font-bold text-xl text-sky-700">0.2857</span>
            </div>
            <div id="interpretation" class="mt-4 pt-4 border-t text-center text-stone-700 bg-stone-100 p-3 rounded-lg">P value &gt; 0.05: not significant. There is not enough evidence that the REM sleep duration distributions of the two groups differ.</div>
          </div>
        </div>
      </aside>

      <section class="lg:col-span-2 bg-white rounded-2xl shadow-lg p-6 flex flex-col items-center">
        <h2 class="text-2xl font-semibold mb-4 text-stone-800">REM sleep duration distribution</h2>
        <div class="chart-container">
          <canvas id="remSleepChart"></canvas>
        </div>
      </section>
    </main>

    <footer class="text-center mt-12 py-4">
      <p class="text-stone-500">This page is for teaching and demonstration purposes only</p>
    </footer>
  </div>

  <script>
    document.addEventListener('DOMContentLoaded', () => {
      // Example data for the two groups, with the colors used in the chart.
      const sourceData = {
        nm: {
          label: 'Nightmare group (NM)',
          values: [80, 85, 110, 125],
          backgroundColor: 'rgba(239, 68, 68, 0.6)',
          borderColor: 'rgba(239, 68, 68, 1)',
        },
        ctl: {
          label: 'Control group (CTL)',
          values: [90, 105, 115, 130, 140],
          backgroundColor: 'rgba(59, 130, 246, 0.6)',
          borderColor: 'rgba(59, 130, 246, 1)',
        }
      };

      // Which groups are currently visible.
      const state = { showNM: true, showCTL: true };

      const ctx = document.getElementById('remSleepChart').getContext('2d');
      let remSleepChart;

      // Sample size, mean and median of an array of numbers.
      const calculateStats = (arr) => {
        if (!arr || arr.length === 0) return { n: 0, mean: 'N/A', median: 'N/A' };
        const n = arr.length;
        const sum = arr.reduce((a, b) => a + b, 0);
        const mean = (sum / n).toFixed(1);
        const sorted = [...arr].sort((a, b) => a - b);
        const mid = Math.floor(n / 2);
        const median = n % 2 !== 0
          ? sorted[mid]
          : ((sorted[mid - 1] + sorted[mid]) / 2).toFixed(1);
        return { n, mean, median };
      };

      // Rebuild the descriptive statistics table for the visible groups.
      const updateStatsTable = () => {
        const tableBody = document.getElementById('stats-table');
        tableBody.innerHTML = '';
        const createRow = (groupKey) => {
          const group = sourceData[groupKey];
          const stats = calculateStats(group.values);
          const row = document.createElement('tr');
          row.innerHTML = `<td class="py-2 font-semibold" style="color:${group.borderColor}">${group.label}</td><td class="py-2 text-center">${stats.n}</td><td class="py-2 text-center">${stats.mean}</td><td class="py-2 text-center">${stats.median}</td>`;
          return row;
        };
        if (state.showNM) tableBody.appendChild(createRow('nm'));
        if (state.showCTL) tableBody.appendChild(createRow('ctl'));
        if (!state.showNM && !state.showCTL) {
          tableBody.innerHTML = `<tr><td colspan="4" class="text-center py-4 text-stone-500">Please select a group to display.</td></tr>`;
        }
      };

      // Labels, values and per-bar colors for the groups currently shown.
      const buildSeries = () => {
        const labels = [];
        const data = [];
        const backgroundColor = [];
        const borderColor = [];
        const addGroup = (group, prefix) => {
          group.values.forEach((val, i) => {
            labels.push(`${prefix} ${i + 1}`);
            data.push(val);
            backgroundColor.push(group.backgroundColor);
            borderColor.push(group.borderColor);
          });
        };
        if (state.showNM) addGroup(sourceData.nm, 'NM');
        if (state.showCTL) addGroup(sourceData.ctl, 'CTL');
        return { labels, data, backgroundColor, borderColor };
      };

      // Create the horizontal bar chart, one bar per participant.
      const initChart = () => {
        const series = buildSeries();
        remSleepChart = new Chart(ctx, {
          type: 'bar',
          data: {
            labels: series.labels,
            datasets: [{
              label: 'REM sleep duration (min)',
              data: series.data,
              backgroundColor: series.backgroundColor,
              borderColor: series.borderColor,
              borderWidth: 2,
              borderRadius: 5,
            }]
          },
          options: {
            responsive: true,
            maintainAspectRatio: false,
            indexAxis: 'y',
            scales: {
              x: {
                beginAtZero: true,
                title: { display: true, text: 'REM sleep duration (min)', font: { size: 14 } }
              },
              y: {
                title: { display: true, text: 'Participant', font: { size: 14 } }
              }
            },
            plugins: {
              legend: { display: false },
              tooltip: {
                callbacks: {
                  label: function(context) {
                    return `Duration: ${context.raw} min`;
                  }
                }
              }
            }
          }
        });
      };

      // Refresh the chart after the filter changes.
      const updateChart = () => {
        const series = buildSeries();
        remSleepChart.data.labels = series.labels;
        remSleepChart.data.datasets[0].data = series.data;
        remSleepChart.data.datasets[0].backgroundColor = series.backgroundColor;
        remSleepChart.data.datasets[0].borderColor = series.borderColor;
        remSleepChart.update();
      };

      // Filter buttons: highlight the clicked one and update state, chart and table.
      const controlButtons = document.querySelectorAll('.control-btn');
      controlButtons.forEach(button => {
        button.addEventListener('click', () => {
          const group = button.dataset.group;
          controlButtons.forEach(btn => {
            btn.classList.remove('active', 'bg-sky-600', 'text-white');
            btn.classList.add('bg-white', 'text-stone-700', 'border', 'border-stone-300');
          });
          button.classList.add('active', 'bg-sky-600', 'text-white');
          button.classList.remove('bg-white', 'text-stone-700', 'border', 'border-stone-300');
          if (group === 'both') {
            state.showNM = true;
            state.showCTL = true;
          } else if (group === 'nm') {
            state.showNM = true;
            state.showCTL = false;
          } else if (group === 'ctl') {
            state.showNM = false;
            state.showCTL = true;
          }
          updateChart();
          updateStatsTable();
        });
      });

      initChart();
      updateStatsTable();
    });
  </script>
</body>
</html>
