
Zhipu AI Open-Sources CogView4: Chinese and English Support, Performance on Par with FLUX

news 2025/10/14 3:44:18

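Background and Positioning

CogView4 is an open-source text-to-image model released by Zhipu AI, a company incubated by the KEG lab at Tsinghua University. It focuses on bilingual Chinese and English support and high-quality image generation, and its main advance is rendering Chinese text inside images.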

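Core positioning

Billed as the world's first open-source text-to-image model that can generate Chinese characters, filling a gap for Chinese-language use cases.
Released under the Apache 2.0 license, which permits commercial use and lowers the barrier for companies to adopt it.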

Release date

On March 4, 2025, the team open-sourced a Diffusers-based version on GitHub (6B, that is 6 billion, parameters).

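Model Architecture and Technical Features

Base architecture

A Transformer-based diffusion model that generates images through iterative denoising.
The parameter scale (6B) and the training data were tuned to improve generation efficiency and quality.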
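
To make "iterative denoising" concrete, here is a toy sketch in plain Python. It is not CogView4's actual sampler or scheduler: the function name `denoise`, the straight-line update rule, and the "oracle" predictor used in the usage lines are all assumptions chosen for illustration. In a real model, a trained Transformer would supply the clean-image estimate at every step.

```python
import random


def denoise(noisy_sample, predict_clean, num_steps):
    """Refine a noisy sample step by step toward the predicted clean sample.

    predict_clean(x, t) stands in for the trained network (hypothetical here):
    it returns an estimate of the clean sample given the current sample x and
    the remaining noise level t in (0, 1].
    """
    x = list(noisy_sample)
    for step in range(num_steps):
        remaining = num_steps - step      # steps left, including this one
        t = remaining / num_steps         # noise level before this step
        clean_guess = predict_clean(x, t)
        # Move 1/remaining of the way toward the current clean estimate,
        # so the final step lands on the estimate itself.
        x = [xi + (ci - xi) / remaining for xi, ci in zip(x, clean_guess)]
    return x


# Toy usage: the "model" is an oracle that always knows the clean target.
target = [0.2, -0.5, 0.9]
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in target]
restored = denoise(noise, lambda x, t: target, num_steps=50)
```

With a perfect predictor the loop recovers the target exactly; with a learned predictor, more steps generally give the estimate more chances to be corrected, which is why step count trades speed against quality.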

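Core capabilities

Bilingual Chinese and English support:
Accepts Chinese prompts natively, avoiding the semantic drift that translation can introduce.
Text rendering:
Blends Chinese characters naturally into images (for example advertising slogans or book-cover titles), with the text matching the style of the background.
Flexible resolution:
Width and height can each range from 512px to 2048px, which suits a variety of applications.
Training data:
Trained on a high-quality dataset of synthetic image captions covering diverse image content and styles.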

Performance and Advantages


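Generation quality

Performs well on benchmarks such as DPG-Bench and is particularly strong in Chinese-language scenes, such as images containing calligraphy or slogans.
According to user feedback, the clarity and naturalness of the generated text approach those of real images.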

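Open-source ecosystem

A Diffusers version is available and works with the Hugging Face toolchain, lowering the barrier for developers.
The community is active, and developers can deploy the model quickly and contribute to further iterations.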

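Use Cases

Creative design: posters, illustrations, social media graphics, and similar assets.
Advertising and marketing: quickly producing visuals that include brand slogans.
Education: textbook illustrations or popular-science images with explanatory text.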

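Deployment and Usage

Inference requirements and model overview

Resolution: width and height must each be between 512px and 2048px and divisible by 32, and the total pixel count must not exceed 2^21.
Precision: BF16 / FP32 (FP16 is not supported; it overflows and produces completely black images).
GPU memory usage was measured at BF16 precision with batchsize=4, as shown in the following table:
[Table: GPU memory usage at BF16 precision, batchsize=4]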
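
The resolution rules above are easy to get wrong when choosing non-square sizes, so a small pre-flight check can help. The sketch below encodes only the three constraints stated in this section; the function name `check_resolution` and the constant names are assumptions for illustration and are not part of Diffusers or CogView4.

```python
MIN_SIDE = 512          # smallest allowed width or height, in pixels
MAX_SIDE = 2048         # largest allowed width or height, in pixels
SIDE_MULTIPLE = 32      # each side must be divisible by this
MAX_PIXELS = 2 ** 21    # upper bound on width * height


def check_resolution(width: int, height: int) -> list[str]:
    """Return a list of rule violations; an empty list means the size is valid."""
    problems = []
    for name, side in (("width", width), ("height", height)):
        if not MIN_SIDE <= side <= MAX_SIDE:
            problems.append(f"{name}={side} is outside {MIN_SIDE}..{MAX_SIDE}")
        if side % SIDE_MULTIPLE != 0:
            problems.append(f"{name}={side} is not divisible by {SIDE_MULTIPLE}")
    if width * height > MAX_PIXELS:
        problems.append(f"{width}x{height} exceeds {MAX_PIXELS} total pixels")
    return problems
```

Note that the pixel cap is stricter than the per-side limit: 2048x1024 passes, while 2048x2048 fails even though each side is individually in range.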

Usage example


from diffusers import CogView4Pipeline
from modelscope import snapshot_download
import torch

model_dir = snapshot_download("ZhipuAI/CogView4-6B")
pipe = CogView4Pipeline.from_pretrained(model_dir, torch_dtype=torch.bfloat16)

# Enable these options to reduce GPU memory usage
pipe.enable_model_cpu_offload()
pipe.vae.enable_slicing()
pipe.vae.enable_tiling()

prompt = "A vibrant cherry red sports car sits proudly under the gleaming sun, its polished exterior smooth and flawless, casting a mirror-like reflection. The car features a low, aerodynamic body, angular headlights that gaze forward like predatory eyes, and a set of black, high-gloss racing rims that contrast starkly with the red. A subtle hint of chrome embellishes the grille and exhaust, while the tinted windows suggest a luxurious and private interior. The scene conveys a sense of speed and elegance, the car appearing as if it's about to burst into a sprint along a coastal road, with the ocean's azure waves crashing in the background."
image = pipe(
    prompt=prompt,
    guidance_scale=3.5,
    num_images_per_prompt=1,
    num_inference_steps=50,
    width=1024,
    height=1024,
).images[0]

image.save("cogview4.png")
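
The example loads the model with torch_dtype=torch.bfloat16, in line with the earlier note that FP16 overflows. The standard-library sketch below illustrates the range difference behind that note. The helper names `fits_in_fp16` and `to_bf16` are hypothetical, and bfloat16 is emulated by truncating a float32, which ignores rounding. It does not show which intermediate values inside CogView4 actually overflow.

```python
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def fits_in_fp16(value: float) -> bool:
    """Return True if value can be stored as a finite half-precision float."""
    try:
        struct.pack("<e", value)  # "e" is the half-precision format code
        return True
    except (OverflowError, struct.error):
        return False


def to_bf16(value: float) -> float:
    """Emulate bfloat16 by keeping only the top 16 bits of the float32 encoding.

    This is an assumption-level emulation: real hardware usually rounds,
    whereas this sketch truncates the low mantissa bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", value))[0]
    return struct.unpack("<f", struct.pack("<I", bits & 0xFFFF0000))[0]


large_activation = 70000.0  # made-up value just above the FP16 range
fp16_ok = fits_in_fp16(large_activation)
bf16_value = to_bf16(large_activation)
```

BF16 keeps the same 8-bit exponent as FP32 and gives up mantissa bits instead, so a value that is out of range for FP16 stays finite in BF16 at the cost of coarser precision.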

Chinese prompts can be passed in directly to generate images that contain text.
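
As a sketch of Chinese-prompt usage, the helper below wraps the same pipeline call as the usage example. The helper name `generate_image` and the sample prompt are illustrative assumptions; `pipe` is expected to be the `CogView4Pipeline` object built in that example, and the sampling settings simply mirror it.

```python
def generate_image(pipe, prompt, width=1024, height=1024):
    """Run the pipeline once and return the first generated image.

    `pipe` is assumed to be the CogView4Pipeline from the usage example.
    """
    result = pipe(
        prompt=prompt,
        guidance_scale=3.5,
        num_images_per_prompt=1,
        num_inference_steps=50,
        width=width,
        height=height,
    )
    return result.images[0]


# Illustrative prompt (made up for this sketch): a vintage tea-house poster
# whose title should appear as brush-written Chinese text inside the image.
chinese_prompt = "一张复古风格的茶馆海报，海报中央用毛笔字写着“清风茶馆”四个大字"

# Usage, assuming `pipe` was created as in the usage example:
# image = generate_image(pipe, chinese_prompt)
# image.save("cogview4_zh.png")
```

Putting the text to be rendered in quotation marks inside the prompt, as done here, is a common prompting habit rather than a documented requirement.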