
[ByteDance Embraces Open Source] UXO Team Open-Sources USO: Unified Style and Subject-Driven Generation via Disentangled and Reward Learning

news 2025/9/7 3:46:32


🔥 News

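2025.08.28 🔥 The USO demo is released. Try it now! ⚡️
2025.08.28 🔥 fp8 mode is now the primary low-VRAM option (please scroll down). This is a gift for consumer-grade GPU users: peak VRAM usage is now about 16GB.
2025.08.27 🔥 The inference code and model of USO are released.
2025.08.27 🔥 The USO project page is created.
2025.08.27 🔥 The USO technical report is released.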

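📖 Introduction

Existing literature typically treats style-driven and subject-driven generation as two separate tasks: the former prioritizes stylistic similarity, whereas the latter insists on subject consistency, which sets the two in apparent opposition. We argue that both objectives can be unified under a single framework, because each ultimately concerns the disentanglement and recomposition of "content" and "style", a long-standing theme in style-driven research. To this end, we present USO, a unified framework for style-driven and subject-driven generation. First, we construct a large-scale triplet dataset consisting of content images, style images, and their corresponding stylized content images. Second, we introduce a disentangled learning scheme that simultaneously aligns style features and disentangles content from style through two complementary objectives: style-alignment training and content-style disentanglement training. Third, we incorporate a style reward-learning paradigm to further improve the model's performance.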

⚡️ Quick Start

🔧 Requirements and Installation

Install the required dependencies:

## create a virtual environment with python >= 3.10 and <= 3.12, e.g.
python -m venv uso_env
source uso_env/bin/activate
## or
conda create -n uso_env python=3.10 -y
conda activate uso_env

## install torch
## recommended version:
pip install torch==2.4.0 torchvision==0.19.0 --index-url https://download.pytorch.org/whl/cu124

## then install the requirements you need
pip install -r requirements.txt # legacy installation command
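
Before creating the environment, it can help to confirm that the interpreter falls inside the stated range. The following is a minimal sketch, not part of the USO repository; the function name `python_version_supported` is hypothetical, and the bounds are taken only from the comment above (3.10 through 3.12).

```python
import sys

# Supported range as stated in the install notes: 3.10 <= version <= 3.12.
MIN_VERSION = (3, 10)
MAX_VERSION = (3, 12)


def python_version_supported(version_info=None):
    """Return True if the (major, minor) version lies within the stated range."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = (version_info[0], version_info[1])
    return MIN_VERSION <= major_minor <= MAX_VERSION


if __name__ == "__main__":
    supported = python_version_supported()
    print(f"Python {sys.version_info[0]}.{sys.version_info[1]} supported: {supported}")
```

Running this with the interpreter you plan to use for `python -m venv uso_env` gives a quick yes/no answer before any packages are installed.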

Then download the checkpoints:

# 1. set up .env file
cp example.env .env

# 2. set your huggingface token in .env (open the file and change this value to your token)
HF_TOKEN=your_huggingface_token_here

# 3. download the necessary weights (comment any weights you don't need)
pip install huggingface_hub
python ./weights/downloader.py

If you already have the weights, comment out what you don't need in ./weights/downloader.py.
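
A common stumbling block at this step is leaving the placeholder token in `.env`. The sketch below shows one way to check for that before running the downloader. It is an illustration only: `parse_env_text`, `read_env_file`, and `hf_token_is_set` are hypothetical helpers, and the parser handles only simple `KEY=VALUE` lines, which may differ from how the repository itself loads `.env`.

```python
from pathlib import Path


def parse_env_text(text):
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def read_env_file(path=".env"):
    """Read a .env file from disk and parse it with parse_env_text."""
    return parse_env_text(Path(path).read_text(encoding="utf-8"))


def hf_token_is_set(env_values, placeholder="your_huggingface_token_here"):
    """True if HF_TOKEN is present, non-empty, and not the example placeholder."""
    token = env_values.get("HF_TOKEN", "")
    return bool(token) and token != placeholder
```

For example, `hf_token_is_set(read_env_file(".env"))` returns `False` while the file still contains the placeholder value copied from example.env.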

✍️ Inference

Start with the examples below to explore and spark your creativity. ✨

# the first image is a content reference, and the rest are style references.

# for subject-driven generation
python inference.py --prompt "The man in flower shops carefully match bouquets, conveying beautiful emotions and blessings with flowers. " --image_paths "assets/gradio_examples/identity1.jpg" --width 1024 --height 1024
# for style-driven generation
# please keep the first image path empty
python inference.py --prompt "A cat sleeping on a chair." --image_paths "" "assets/gradio_examples/style1.webp" --width 1024 --height 1024
# for style-subject driven generation (or set the prompt to empty for layout-preserved generation)
python inference.py --prompt "The woman gave an impassioned speech on the podium." --image_paths "assets/gradio_examples/identity2.webp" "assets/gradio_examples/style2.webp" --width 1024 --height 1024
# for multi-style generation
# please keep the first image path empty
python inference.py --prompt "A handsome man." --image_paths "" "assets/gradio_examples/style3.webp" "assets/gradio_examples/style4.webp" --width 1024 --height 1024

# for low vram:
python inference.py --prompt "your prompt" --image_paths "your_image.jpg" --width 1024 --height 1024 --offload --model_type flux-dev-fp8

You can also compare your results with those in the assets/gradio_examples folder.
For more examples, visit our project page or try the online demo.
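
The image-path convention in the commands above (first slot is the content reference, remaining slots are style references, and the first slot is left empty when there is no content image) is easy to get wrong when scripting batches. The sketch below assembles the argument list for each mode. The helper name `build_inference_args` is hypothetical and not part of the repository; it uses only the flags that appear in the commands above.

```python
import shlex


def build_inference_args(prompt, content_image=None, style_images=(),
                         width=1024, height=1024, low_vram=False):
    """Assemble an argv list for inference.py following the path convention."""
    style_images = list(style_images)
    if not content_image and not style_images:
        raise ValueError("provide a content image, style images, or both")
    # The first slot is always the content reference; it stays an empty
    # string for style-driven and multi-style generation.
    image_paths = [content_image or ""] + style_images
    args = ["python", "inference.py", "--prompt", prompt,
            "--image_paths", *image_paths,
            "--width", str(width), "--height", str(height)]
    if low_vram:
        # Low-VRAM flags as shown in the last example command.
        args += ["--offload", "--model_type", "flux-dev-fp8"]
    return args


if __name__ == "__main__":
    style_only = build_inference_args(
        "A cat sleeping on a chair.",
        style_images=["assets/gradio_examples/style1.webp"],
    )
    # shlex.join renders the list as a copy-pasteable shell command.
    print(shlex.join(style_only))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell-quoting problems with the empty first path.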

🌟 Gradio Demo

python app.py

To reduce VRAM usage, pass the --offload and --name flux-dev-fp8 arguments. Peak VRAM usage will be between 16GB (single reference) and 18GB (multiple references).

# please use FLUX_DEV_FP8 in place of FLUX_DEV
export FLUX_DEV_FP8="YOUR_FLUX_DEV_PATH"
# e.g. FLUX_DEV_FP8=/path/to/flux/FLUX.1-dev/flux1-dev.safetensors is enough
python app.py --offload --name flux-dev-fp8
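
If you launch the demo from a script rather than a shell, the environment variable and the low-VRAM flags need to be set together. The following is a rough sketch under that assumption; `gradio_launch_spec` is a hypothetical helper, and it uses only the `FLUX_DEV_FP8` variable and the flags shown above.

```python
import os


def gradio_launch_spec(flux_fp8_path=None, base_env=None):
    """Return (args, env) for launching app.py.

    The low-VRAM flags and FLUX_DEV_FP8 are added only when a path is given.
    """
    env = dict(os.environ if base_env is None else base_env)
    args = ["python", "app.py"]
    if flux_fp8_path:
        env["FLUX_DEV_FP8"] = flux_fp8_path
        args += ["--offload", "--name", "flux-dev-fp8"]
    return args, env
```

The pair can then be handed to `subprocess.run(args, env=env)`. Copying the environment instead of mutating `os.environ` keeps the setting local to the launched process.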