
An automated tool for building Multimodal Emotion Recognition and Reasoning (MERR) datasets

news 2025/11/16 18:26:32

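🏗️ Pipeline Structure

MER-Factory is an automated factory for building Multimodal Emotion Recognition and Reasoning (MERR) datasets. It processes several kinds of multimedia data and performs emotion analysis and reasoning over them. Its main features are listed below.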

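✨ Features

Action Unit (AU) pipeline: extracts facial Action Units (AUs) and translates them into descriptive natural language, interpreting the emotional cues that micro-expressions carry.
Audio analysis pipeline: extracts audio, transcribes speech, and performs a detailed tonal analysis (pitch, speaking rate, emotional tone, and similar dimensions).
Video analysis pipeline: generates a comprehensive description of the video content and context, including the scene, people's actions, and the emotional atmosphere.
Image analysis pipeline: provides end-to-end emotion recognition for static images, combining visual description with emotion synthesis.
Full MER pipeline: an end-to-end multimodal pipeline that identifies emotional peak moments, analyzes all modalities (visual, audio, facial), and synthesizes an overall emotional reasoning summary.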

🛠️ Prerequisites

1. FFmpeg

FFmpeg is required for video and audio processing.

Installation:

macOS: brew install ffmpeg
Ubuntu/Debian: sudo apt update && sudo apt install ffmpeg
Windows: download from ffmpeg.org

Verify the installation:

ffmpeg -version
ffprobe -version
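
The same check can be scripted. The following is a minimal sketch, not part of MER-Factory: it looks both tools up on PATH with the standard library and prints the first line of their version output. The function names `find_tools` and `first_version_line` are made up for this example.

```python
import shutil
import subprocess


def find_tools(names=("ffmpeg", "ffprobe")):
    """Map each tool name to its resolved path on PATH, or None if missing."""
    return {name: shutil.which(name) for name in names}


def first_version_line(executable):
    """Return the first line printed by `<executable> -version`."""
    result = subprocess.run(
        [executable, "-version"], capture_output=True, text=True, check=True
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    for name, path in find_tools().items():
        if path is None:
            print(f"{name}: NOT FOUND on PATH")
        else:
            print(f"{name}: {path} ({first_version_line(path)})")
```

A `None` entry means the tool is not reachable from the current shell, which usually points to a PATH problem rather than a broken install.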

2. OpenFace

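OpenFace is required for facial Action Unit extraction.

Installation:

Clone the OpenFace repository:

git clone https://github.com/TadasBaltrusaitis/OpenFace.git
cd OpenFace

Follow the installation instructions for your platform in the OpenFace Wiki.
Build the project and note the path to the FeatureExtraction executable (typically build/bin/FeatureExtraction).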

🚀 Installation

git clone git@github.com:Lum1104/MER-Factory.git
cd MER-Factory
conda create -n mer-factory python=3.12
conda activate mer-factory
pip install -r requirements.txt

Configuration:

Copy the example environment file:
```
cp .env.example .env
```
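Edit the .env file and configure your settings:
- GOOGLE_API_KEY: Google API key for Gemini models (optional if you use other models)
- OPENAI_API_KEY: OpenAI API key for ChatGPT models (optional if you use other models)
- OPENFACE_EXECUTABLE: path to the OpenFace FeatureExtraction executable (required for the AU and MER pipelines)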
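
Before a long run it can help to confirm which of these settings are actually filled in. The sketch below is an illustration only and assumes the .env file uses plain KEY=VALUE lines; the parser is deliberately simple and is not how MER-Factory itself loads its configuration. `parse_env` and `missing_settings` are hypothetical helper names.

```python
REQUIRED_BY_FEATURE = {
    "Gemini models": "GOOGLE_API_KEY",
    "ChatGPT models": "OPENAI_API_KEY",
    "AU and MER pipelines": "OPENFACE_EXECUTABLE",
}


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def missing_settings(values):
    """Return the features whose setting is absent or empty."""
    return [
        feature
        for feature, key in REQUIRED_BY_FEATURE.items()
        if not values.get(key)
    ]


if __name__ == "__main__":
    from pathlib import Path

    env_path = Path(".env")
    text = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    for feature in missing_settings(parse_env(text)):
        key = REQUIRED_BY_FEATURE[feature]
        print(f"Not configured: {key} (needed for {feature})")
```

A missing key is only a problem if you plan to use the corresponding feature, since the API keys are optional when another model backend is selected.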

⚙️ Usage

Basic command structure

python main.py [INPUT_PATH] [OUTPUT_DIR] [OPTIONS]

Examples

# Full MER pipeline with Gemini (default)
python main.py path_to_video/ output/ --type MER --silent --threshold 0.8

# MER pipeline with custom threshold
python main.py path_to_video/ output/ --type MER --silent --threshold 0.45

# Using ChatGPT models
python main.py path_to_video/ output/ --type MER --chatgpt-model gpt-4o --silent

# Using local Ollama models
python main.py path_to_video/ output/ --type MER --ollama-vision-model llava-llama3:latest --ollama-text-model llama3.2 --silent

# Using Hugging Face model
python main.py path_to_video/ output/ --type MER --huggingface-model google/gemma-3n-E4B-it --silent

# Process images instead of videos
python main.py ./images ./output --type MER

Note: If you need Ollama models, pull them first, for example with ollama pull llama3.2. Ollama does not currently support video analysis.

Command-line options

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| `--type` | `-t` | Processing type (AU, audio, video, image, MER) | MER |
| `--label-file` | `-l` | Path to a CSV file with 'name' and 'label' columns. Optional, for ground truth labels. | None |
| `--threshold` | `-th` | Emotion detection threshold (0.0-5.0) | 0.8 |
| `--peak_dis` | `-pd` | Steps between peak frame detection (min 8) | 15 |
| `--silent` | `-s` | Run with minimal output | False |
| `--concurrency` | `-c` | Concurrent files for async processing (min 1) | 4 |
| `--ollama-vision-model` | `-ovm` | Ollama vision model name | None |
| `--ollama-text-model` | `-otm` | Ollama text model name | None |
| `--chatgpt-model` | `-cgm` | ChatGPT model name (e.g., gpt-4o) | None |
| `--huggingface-model` | `-hfm` | Hugging Face model ID | None |
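
To make the ranges and defaults in the table concrete, here is a rough sketch of how such options could be validated with Python's `argparse`. This is an assumption-laden illustration: the document does not say which CLI library `main.py` uses, the short aliases and model flags are omitted for brevity, and `bounded` and `build_parser` are names invented for this example.

```python
import argparse


def bounded(cast, low=None, high=None):
    """Build an argparse type that casts a value and enforces a range."""
    def convert(text):
        value = cast(text)
        if low is not None and value < low:
            raise argparse.ArgumentTypeError(f"{value} is below the minimum {low}")
        if high is not None and value > high:
            raise argparse.ArgumentTypeError(f"{value} is above the maximum {high}")
        return value
    return convert


def build_parser():
    """Mirror the documented options, defaults, and limits (long names only)."""
    parser = argparse.ArgumentParser(description="Illustrative MER-style CLI")
    parser.add_argument("input_path")
    parser.add_argument("output_dir")
    parser.add_argument("--type", default="MER",
                        choices=["AU", "audio", "video", "image", "MER"])
    parser.add_argument("--label-file", default=None)
    parser.add_argument("--threshold", type=bounded(float, 0.0, 5.0), default=0.8)
    parser.add_argument("--peak_dis", type=bounded(int, low=8), default=15)
    parser.add_argument("--silent", action="store_true")
    parser.add_argument("--concurrency", type=bounded(int, low=1), default=4)
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, a value such as `--peak_dis 3` is rejected at parse time instead of failing later inside the pipeline.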

1. Action Unit (AU) Extraction

Extract facial Action Units and generate natural-language descriptions:

python main.py video.mp4 output/ --type AU
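
As a rough illustration of what "AUs to natural language" can mean, the sketch below turns one frame of OpenFace-style intensity columns (such as `AU12_r`, on a 0 to 5 scale) into a sentence. The AU names follow the standard FACS naming, but the wording, the intensity cut-offs, and the function names are assumptions for this example and not MER-Factory's actual templates.

```python
# FACS names for a few of the Action Units that OpenFace reports.
AU_NAMES = {
    "AU01": "inner brow raiser",
    "AU04": "brow lowerer",
    "AU06": "cheek raiser",
    "AU12": "lip corner puller",
    "AU15": "lip corner depressor",
    "AU25": "lips part",
    "AU26": "jaw drop",
}


def intensity_word(value):
    """Bucket a 0-5 intensity into a rough adjective (cut-offs are arbitrary)."""
    if value < 1.5:
        return "slight"
    if value < 3.0:
        return "moderate"
    return "strong"


def describe_frame(row, threshold=0.8):
    """Describe the AUs in one frame whose intensity meets the threshold.

    `row` maps OpenFace-style column names such as "AU12_r" to floats.
    """
    active = []
    for column, value in row.items():
        code, _, kind = column.strip().partition("_")
        if kind == "r" and code in AU_NAMES and value >= threshold:
            active.append((value, code))
    active.sort(reverse=True)  # strongest first
    if not active:
        return "No prominent facial action units."
    parts = [f"{intensity_word(v)} {AU_NAMES[c]} ({c})" for v, c in active]
    return "Active AUs: " + ", ".join(parts) + "."
```

For example, a frame with a strong AU12 and a slight AU06 would be described as a strong lip corner puller with a slight cheek raiser, a combination commonly associated with smiling.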

2. Audio Analysis

Extract audio, transcribe speech, and analyze tone:

python main.py video.mp4 output/ --type audio
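
The audio-extraction step depends on FFmpeg. As a hedged sketch of what that step might look like, the helper below builds an FFmpeg command that strips the video stream and writes mono 16-bit PCM WAV. The FFmpeg flags are standard, but the output naming, the 16 kHz sample rate, and the function name are assumptions rather than MER-Factory's actual settings.

```python
from pathlib import Path


def audio_extract_command(video_path, output_dir, sample_rate=16000):
    """Build an FFmpeg command that writes the video's audio as mono WAV."""
    wav_path = Path(output_dir) / (Path(video_path).stem + ".wav")
    return [
        "ffmpeg",
        "-y",                      # overwrite the output if it exists
        "-i", str(video_path),     # input video
        "-vn",                     # drop the video stream
        "-acodec", "pcm_s16le",    # 16-bit PCM
        "-ar", str(sample_rate),   # sample rate in Hz
        "-ac", "1",                # mono
        str(wav_path),
    ]


if __name__ == "__main__":
    print(" ".join(audio_extract_command("video.mp4", "output")))
```

The returned list can be passed to `subprocess.run(cmd, check=True)` to perform the extraction.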

3. Video Analysis

Generate a comprehensive description of the video content:

python main.py video.mp4 output/ --type video

4. Image Analysis

Run the pipeline with image input:

python main.py ./images ./output --type image
# Note: Image files will automatically use image pipeline regardless of --type setting
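
The note above implies that routing depends on the file itself rather than on `--type`. One plausible way to implement that rule is sketched below; the extension list and the function name `resolve_pipeline` are assumptions for illustration, and the project may detect images differently.

```python
from pathlib import Path

# Assumed set of image extensions; the real list may differ.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}


def resolve_pipeline(file_path, requested_type="MER"):
    """Return the pipeline for a file: images always go to 'image'."""
    if Path(file_path).suffix.lower() in IMAGE_EXTENSIONS:
        return "image"
    return requested_type
```

Under this rule, `resolve_pipeline("face.PNG", "MER")` selects the image pipeline, while a video file keeps whatever type was requested.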

5. Full MER Pipeline (default)

Run the complete multimodal emotion recognition pipeline:

python main.py video.mp4 output/ --type MER
# or simply:
python main.py video.mp4 output/
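
The MER pipeline identifies emotional peak moments, and the `--threshold` and `--peak_dis` options suggest a threshold plus a minimum spacing between peaks. The sketch below shows one simple way to do that over a per-frame score (for instance, a sum of AU intensities). How MER-Factory actually scores frames and selects peaks is not described here, so treat the scoring input and the greedy selection as assumptions.

```python
def find_emotional_peaks(scores, threshold=0.8, peak_dis=15):
    """Return frame indices of local maxima that are >= threshold and
    at least `peak_dis` frames apart, preferring the strongest peaks."""
    candidates = [
        i for i in range(1, len(scores) - 1)
        if scores[i] >= threshold
        and scores[i] > scores[i - 1]
        and scores[i] >= scores[i + 1]
    ]
    kept = []
    # Greedy selection: visit the highest candidates first.
    for i in sorted(candidates, key=lambda i: scores[i], reverse=True):
        if all(abs(i - j) >= peak_dis for j in kept):
            kept.append(i)
    return sorted(kept)
```

Raising the threshold keeps only more intense moments, while raising `peak_dis` spreads the selected frames further apart in time.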

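🤖 Model Support

The tool supports four types of models:

Google Gemini (default): requires GOOGLE_API_KEY in .env
OpenAI ChatGPT: requires OPENAI_API_KEY in .env; select it with --chatgpt-model
Ollama: local models; select them with --ollama-vision-model and --ollama-text-model
Hugging Face: currently supports multimodal models such as google/gemma-3n-E4B-it

Note: When a Hugging Face model is used, concurrency is automatically set to 1 so that processing runs synchronously.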
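
The selection rules above can be summarized in a few lines of code. This is a sketch under stated assumptions: the document does not say which backend wins when several model flags are given, so the precedence order here is invented for illustration, as is the function name `select_backend`. Only the Gemini default and the Hugging Face concurrency rule come from the text.

```python
def select_backend(chatgpt_model=None, ollama_vision_model=None,
                   ollama_text_model=None, huggingface_model=None,
                   concurrency=4):
    """Pick a backend from the model flags and adjust concurrency.

    The precedence order below is an assumption made for this sketch.
    """
    if huggingface_model:
        # Hugging Face models are processed synchronously.
        return "huggingface", 1
    if ollama_vision_model or ollama_text_model:
        return "ollama", concurrency
    if chatgpt_model:
        return "chatgpt", concurrency
    return "gemini", concurrency  # default when no model flag is given
```

Calling `select_backend()` with no arguments falls through to Gemini, matching the documented default.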

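🎯 Model Recommendations

When to use Ollama

Recommended for: image analysis, Action Unit analysis, text processing, and simple audio transcription tasks.

Advantages:

✅ Async support: Ollama supports asynchronous calls, which suits efficient processing of large datasets
✅ Local processing: no API costs or rate limits
✅ Wide model selection: visit ollama.com to explore the available models
✅ Privacy: all processing happens locally

Usage examples: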

# Process images with Ollama
python main.py ./images ./output --type image --ollama-vision-model llava-llama3:latest --ollama-text-model llama3.2 --silent

# AU extraction with Ollama
python main.py video.mp4 output/ --type AU --ollama-text-model llama3.2 --silent
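
The async advantage pairs naturally with the `--concurrency` option. As a generic illustration of bounded concurrent processing (not MER-Factory's own code), the sketch below uses `asyncio.Semaphore` to cap the number of files in flight. `fake_worker` is a hypothetical stand-in for a real model call.

```python
import asyncio


async def process_all(paths, worker, concurrency=4):
    """Run `worker(path)` for every path, at most `concurrency` at a time.

    Results are returned in the same order as `paths`.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def guarded(path):
        async with semaphore:
            return await worker(path)

    return await asyncio.gather(*(guarded(p) for p in paths))


async def fake_worker(path):
    """Stand-in for a real model call (hypothetical)."""
    await asyncio.sleep(0.01)
    return f"processed {path}"


if __name__ == "__main__":
    files = ["a.png", "b.png", "c.png"]
    print(asyncio.run(process_all(files, fake_worker, concurrency=2)))
```

Setting `concurrency=1` makes the same code process files one at a time, which corresponds to the synchronous behavior described for Hugging Face models.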

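When to use ChatGPT/Gemini

Recommended for: advanced video analysis, complex multimodal reasoning, and high-quality content generation.

Advantages:

✅ State-of-the-art performance: the latest GPT-4o and Gemini models offer strong reasoning capabilities
✅ Advanced video understanding: better support for complex video analysis and temporal reasoning
✅ High-quality output: more nuanced and detailed emotion recognition and reasoning
✅ Strong multimodal integration: solid performance across text, image, and video modalities

Usage examples:

python main.py video.mp4 output/ --type MER --chatgpt-model gpt-4o --silent
python main.py video.mp4 output/ --type MER --silent

Trade-off: these models incur API costs and rate limits, but they typically give the highest-quality results on complex emotion reasoning tasks.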

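When to use Hugging Face models

Recommended for: cases where you need a recent state-of-the-art model or a specific capability that Ollama does not offer.

Custom model integration: if you want to use a recent HF model or a feature that Ollama does not support:

Option 1 - Implement it yourself: go to agents/models/hf_models/__init__.py to register your own model, then implement the required functionality following the existing patterns.
Option 2 - Request support: open an issue on the repository to say which model you would like supported, and the maintainers will consider adding it.

Currently supported models: google/gemma-3n-E4B-it and the other models listed in the HF models directory.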

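✅ Testing and Troubleshooting

Installation verification

Use these scripts to confirm that your dependencies are configured correctly.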


Test the FFmpeg integration:

python test_ffmpeg.py your_video.mp4 test_output/

Test the OpenFace integration:

python test_openface.py your_video.mp4 test_output/

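Common issues

FFmpeg not found:
- Symptom: a FileNotFoundError related to ffmpeg or ffprobe.
- Solution: make sure FFmpeg is installed correctly and that its location is included in your system's PATH environment variable. Verify with ffmpeg -version.
OpenFace executable not found:
- Symptom: an error indicating that the FeatureExtraction executable cannot be found.
- Solution: double-check the OPENFACE_EXECUTABLE path in your .env file. It must be the absolute path to the executable. Make sure the file has execute permission (chmod +x FeatureExtraction).
API key errors (Google/OpenAI):
- Symptom: 401 Unauthorized or PermissionDenied errors.
- Solution: verify that the API keys in your .env file are correct, with no extra spaces or characters. Make sure the associated account has billing enabled and sufficient quota.
Ollama model not found:
- Symptom: an error stating that the model is not available.
- Solution: make sure you have pulled the model with ollama pull <model_name>.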
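
The OpenFace checks listed above (absolute path, file present, execute permission) are easy to automate. The following is a small diagnostic sketch, not a script shipped with the project; it assumes OPENFACE_EXECUTABLE has been exported into the process environment, and `check_openface_path` is a name chosen for this example.

```python
import os


def check_openface_path(path):
    """Return a list of problems with an OPENFACE_EXECUTABLE value."""
    if not path:
        return ["OPENFACE_EXECUTABLE is not set"]
    problems = []
    if not os.path.isabs(path):
        problems.append("path is not absolute")
    if not os.path.isfile(path):
        problems.append("file does not exist")
    elif not os.access(path, os.X_OK):
        problems.append("file is not executable (try chmod +x)")
    return problems


if __name__ == "__main__":
    issues = check_openface_path(os.environ.get("OPENFACE_EXECUTABLE", ""))
    print("OK" if not issues else "; ".join(issues))
```

An empty list means the path passes all three checks; it does not confirm that the binary itself runs correctly, which the OpenFace integration test covers.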