
Kimi-Audio

We introduce Kimi-Audio, an open-source audio foundation model that excels at audio understanding, generation, and conversation. This repository hosts the model checkpoints for Kimi-Audio-7B-Instruct.


Kimi-Audio is designed as a universal audio foundation model, capable of handling a wide variety of audio processing tasks within a single unified framework. Key features include:

  • Universal Capabilities: Handles diverse tasks such as speech recognition (ASR), audio question answering (AQA), audio captioning (AAC), speech emotion recognition (SER), sound event/scene classification (SEC/ASC), text-to-speech (TTS), voice conversion (VC), and end-to-end speech conversation.
  • State-of-the-Art Performance: Achieves SOTA results on numerous audio benchmarks (see our Technical Report).
  • Large-Scale Pre-training: Pre-trained on more than 13 million hours of diverse audio data (speech, music, sounds) and text data.
  • Novel Architecture: Combines hybrid audio input (continuous acoustic features + discrete semantic tokens) with an LLM core whose parallel heads generate text and audio tokens.
  • Efficient Inference: Features a chunk-wise streaming detokenizer based on flow matching for low-latency audio generation.
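The chunk-wise streaming idea behind the last bullet can be sketched in a few lines. This is an illustrative toy, not Kimi-Audio's actual decoder: the chunk and look-ahead sizes are made up, and real decoding operates on audio tokens rather than plain integers.

```python
def stream_chunks(tokens, chunk=4, lookahead=1):
    """Yield (current_chunk, lookahead_context) pairs.

    Each chunk is decoded as soon as it plus a small look-ahead window
    is available, instead of waiting for the full token sequence --
    this is what keeps end-to-end latency low.
    """
    for start in range(0, len(tokens), chunk):
        current = tokens[start:start + chunk]
        future = tokens[start + chunk:start + chunk + lookahead]
        yield current, future

# Decode 10 tokens in chunks of 4 with a 2-token look-ahead:
for current, future in stream_chunks(list(range(10)), chunk=4, lookahead=2):
    print(current, future)
```

The look-ahead window gives the vocoder a little future context at each chunk boundary, trading a small fixed delay for smoother transitions between chunks.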

https://github.com/MoonshotAI/Kimi-Audio

Architecture Overview

Kimi-Audio consists of three main components:

  1. Audio Tokenizer: Converts input audio into:
    discrete semantic tokens (12.5 Hz) obtained via vector quantization;
    continuous acoustic features (downsampled to 12.5 Hz) derived from a Whisper encoder.
  2. Audio LLM: A transformer-based model (initialized from a pre-trained text LLM such as Qwen 2.5 7B) with shared layers that process multimodal input, followed by parallel heads that autoregressively generate text tokens and discrete audio semantic tokens.
  3. Audio Detokenizer: Converts the predicted discrete semantic audio tokens into high-fidelity waveforms using a flow-matching model and a vocoder (BigVGAN), with chunk-wise streaming and a look-ahead mechanism for low latency.
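To make the shared 12.5 Hz rate concrete, here is a minimal sketch of how the tokenizer's output lengths scale with clip duration. The helper function is hypothetical, for illustration only, and is not part of the Kimi-Audio API:

```python
def tokenizer_output_lengths(duration_s: float, token_rate_hz: float = 12.5):
    """Both tokenizer streams run at 12.5 Hz, so a duration_s-second clip
    yields the same number of discrete semantic tokens and continuous
    Whisper-derived feature frames."""
    n = int(duration_s * token_rate_hz)
    return n, n  # (discrete semantic tokens, continuous acoustic frames)

tokens, frames = tokenizer_output_lengths(4.0)
print(tokens, frames)  # 50 tokens and 50 frames for 4 s of audio
```

Aligning the two streams at the same frame rate is what lets the LLM consume them as a single time-synchronized multimodal sequence.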

Evaluation

Kimi-Audio achieves state-of-the-art (SOTA) performance across a wide range of audio benchmarks.

Overall performance:


Automatic Speech Recognition (ASR)

Performance (WER↓, lower is better):

LibriSpeech (test-clean | test-other)
  Qwen2-Audio-base    1.74 | 4.04
  Baichuan-base       3.02 | 6.04
  Step-Audio-chat     3.19 | 10.67
  Qwen2.5-Omni        2.37 | 4.21
  Kimi-Audio          1.28 | 2.42

Fleurs (zh | en)
  Qwen2-Audio-base    3.63 | 5.20
  Baichuan-base       4.15 | 8.07
  Step-Audio-chat     4.26 | 8.56
  Qwen2.5-Omni        2.92 | 4.17
  Kimi-Audio          2.69 | 4.44

AISHELL-1
  Qwen2-Audio-base    1.52
  Baichuan-base       1.93
  Step-Audio-chat     2.14
  Qwen2.5-Omni        1.13
  Kimi-Audio          0.60

AISHELL-2 ios
  Qwen2-Audio-base    3.08
  Baichuan-base       3.87
  Step-Audio-chat     3.89
  Qwen2.5-Omni        2.56
  Kimi-Audio          2.56

WenetSpeech (test-meeting | test-net)
  Qwen2-Audio-base    8.40 | 7.64
  Baichuan-base       13.28 | 10.13
  Step-Audio-chat     10.83 | 9.47
  Qwen2.5-Omni        7.71 | 6.04
  Kimi-Audio          6.28 | 5.37

Kimi-ASR Internal Testset (subset1 | subset2)
  Qwen2-Audio-base    2.31 | 3.24
  Baichuan-base       3.41 | 5.60
  Step-Audio-chat     2.82 | 4.74
  Qwen2.5-Omni        1.53 | 2.68
  Kimi-Audio          1.42 | 2.44
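The WER↓ metric in the table above is the word-level edit distance between hypothesis and reference, divided by the reference length. A standalone sketch of the computation (not the evaluation harness behind the reported numbers):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: Levenshtein distance over words / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on a mat"))
# 1 substitution over 6 reference words
```

Note that Chinese benchmarks such as AISHELL are conventionally scored at the character level (CER), but the arithmetic is identical with characters in place of words.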

Audio Understanding

Performance↑ (higher is better):

MMAU (music | sound | speech)
  Qwen2-Audio-base    58.98 | 69.07 | 52.55
  Baichuan-chat       49.10 | 59.46 | 42.47
  GLM-4-Voice         38.92 | 43.54 | 32.43
  Step-Audio-chat     49.40 | 53.75 | 47.75
  Qwen2.5-Omni        62.16 | 67.57 | 53.92
  Kimi-Audio          61.68 | 73.27 | 60.66

ClothoAQA (test | dev)
  Qwen2-Audio-base    71.73 | 72.63
  Baichuan-chat       48.02 | 48.16
  Step-Audio-chat     45.84 | 44.98
  Qwen2.5-Omni        72.86 | 73.12
  Kimi-Audio          71.24 | 73.18

VocalSound
  Qwen2-Audio-base    93.82
  Baichuan-base       58.17
  Step-Audio-chat     28.58
  Qwen2.5-Omni        93.73
  Kimi-Audio          94.85

Nonspeech7k
  Qwen2-Audio-base    87.17
  Baichuan-chat       59.03
  Step-Audio-chat     21.38
  Qwen2.5-Omni        69.89
  Kimi-Audio          93.93

MELD
  Qwen2-Audio-base    51.23
  Baichuan-chat       23.59
  Step-Audio-chat     33.54
  Qwen2.5-Omni        49.83
  Kimi-Audio          59.13

TUT2017
  Qwen2-Audio-base    33.83
  Baichuan-base       27.9
  Step-Audio-chat     7.41
  Qwen2.5-Omni        43.27
  Kimi-Audio          65.25

CochlScene (test | dev)
  Qwen2-Audio-base    52.69 | 50.96
  Baichuan-base       34.93 | 34.56
  Step-Audio-chat     10.06 | 10.42
  Qwen2.5-Omni        63.82 | 63.82
  Kimi-Audio          79.84 | 80.99

Audio-to-Text Chat

Performance↑ (higher is better):

OpenAudioBench (AlpacaEval | Llama Questions | Reasoning QA | TriviaQA | Web Questions)
  Qwen2-Audio-chat    57.19 | 69.67 | 42.77 | 40.30 | 45.20
  Baichuan-chat       59.65 | 74.33 | 46.73 | 55.40 | 58.70
  GLM-4-Voice         57.89 | 76.00 | 47.43 | 51.80 | 55.40
  StepAudio-chat      56.53 | 72.33 | 60.00 | 56.80 | 73.00
  Qwen2.5-Omni        72.76 | 75.33 | 63.76 | 57.06 | 62.80
  Kimi-Audio          75.73 | 79.33 | 58.02 | 62.10 | 70.20

VoiceBench (AlpacaEval | CommonEval | SD-QA | MMSU)
  Qwen2-Audio-chat    3.69 | 3.40 | 35.35 | 35.43
  Baichuan-chat       4.00 | 3.39 | 49.64 | 48.80
  GLM-4-Voice         4.06 | 3.48 | 43.31 | 40.11
  StepAudio-chat      3.99 | 2.99 | 46.84 | 28.72
  Qwen2.5-Omni        4.33 | 3.84 | 57.41 | 56.38
  Kimi-Audio          4.46 | 3.97 | 63.12 | 62.17

VoiceBench (OpenBookQA | IFEval | AdvBench | Avg)
  Qwen2-Audio-chat    49.01 | 22.57 | 98.85 | 54.72
  Baichuan-chat       63.30 | 41.32 | 86.73 | 62.51
  GLM-4-Voice         52.97 | 24.91 | 88.08 | 57.17
  StepAudio-chat      31.87 | 29.19 | 65.77 | 48.86
  Qwen2.5-Omni        79.12 | 53.88 | 99.62 | 72.83
  Kimi-Audio          83.52 | 61.10 | 100.00 | 76.93

Speech Conversation

Performance of Kimi-Audio and baseline models on speech conversation.

Model               Speed Control | Accent Control | Emotion Control | Empathy | Style Control | Avg
  GPT-4o            4.21 | 3.65 | 4.05 | 3.87 | 4.54 | 4.06
  Step-Audio-chat   3.25 | 2.87 | 3.33 | 3.05 | 4.14 | 3.33
  GLM-4-Voice       3.83 | 3.51 | 3.77 | 3.07 | 4.04 | 3.65
  GPT-4o-mini       3.15 | 2.71 | 4.24 | 3.16 | 4.01 | 3.45
  Kimi-Audio        4.30 | 3.45 | 4.27 | 3.39 | 4.09 | 3.90

Quick Start

Environment Setup

git clone https://github.com/MoonshotAI/Kimi-Audio
cd Kimi-Audio
git submodule update --init --recursive
pip install -r requirements.txt

Inference Example

import soundfile as sf
# Assuming the KimiAudio class is available after installation
from kimia_infer.api.kimia import KimiAudio
import torch  # needed for device placement

# --- 1. Load Model ---
# Load the model from the Hugging Face Hub.
# Make sure you are logged in (`huggingface-cli login`) if the repo is private.
model_id = "moonshotai/Kimi-Audio-7B-Instruct"  # or "Kimi/Kimi-Audio-7B"
device = "cuda" if torch.cuda.is_available() else "cpu"

# Note: The KimiAudio class might handle model loading differently.
# You might need to pass the model_id directly, or download checkpoints manually
# and provide the local path as shown in the original readme_kimia.md.
# Please refer to the main Kimi-Audio repository for precise loading instructions.
# Example assuming KimiAudio takes the HF ID or a local path:
try:
    model = KimiAudio(model_path=model_id, load_detokenizer=True)
    model.to(device)
except Exception as e:
    print("Automatic loading from the HF Hub might require specific setup.")
    print(f"Refer to the Kimi-Audio docs. Trying local path example (update path!). Error: {e}")
    # Fallback example:
    # model_path = "/path/to/your/downloaded/kimia-hf-ckpt"  # IMPORTANT: update this path if loading locally
    # model = KimiAudio(model_path=model_path, load_detokenizer=True)
    # model.to(device)

# --- 2. Define Sampling Parameters ---
sampling_params = {
    "audio_temperature": 0.8,
    "audio_top_k": 10,
    "text_temperature": 0.0,
    "text_top_k": 5,
    "audio_repetition_penalty": 1.0,
    "audio_repetition_window_size": 64,
    "text_repetition_penalty": 1.0,
    "text_repetition_window_size": 16,
}

# --- 3. Example 1: Audio-to-Text (ASR) ---
# TODO: Provide actual example audio files or URLs accessible to users, e.g.
# wget https://path/to/your/asr_example.wav -O asr_example.wav
# wget https://path/to/your/qa_example.wav -O qa_example.wav
asr_audio_path = "./test_audios/asr_example.wav"  # IMPORTANT: make sure this file exists
qa_audio_path = "./test_audios/qa_example.wav"    # IMPORTANT: make sure this file exists

messages_asr = [
    {"role": "user", "message_type": "text", "content": "Please transcribe the following audio:"},
    {"role": "user", "message_type": "audio", "content": asr_audio_path},
]

# Generate only text output
_, text_output = model.generate(messages_asr, **sampling_params, output_type="text")
print(">>> ASR Output Text: ", text_output)
# Expected output: "这并不是告别,这是一个篇章的结束,也是新篇章的开始。" (example)

# --- 4. Example 2: Audio-to-Audio/Text Conversation ---
messages_conversation = [
    {"role": "user", "message_type": "audio", "content": qa_audio_path},
]

# Generate both audio and text output
wav_output, text_output = model.generate(messages_conversation, **sampling_params, output_type="both")

# Save the generated audio; move to CPU and flatten before writing
output_audio_path = "output_audio.wav"
sf.write(output_audio_path, wav_output.detach().cpu().view(-1).numpy(), 24000)  # 24 kHz output
print(f">>> Conversational Output Audio saved to: {output_audio_path}")
print(">>> Conversational Output Text: ", text_output)
# Expected output: "A." (example)

print("Kimi-Audio inference examples complete.")
