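Using MegaTTS3

1. Background

        MegaTTS3 is a text-to-speech (TTS) model. It needs suitable hardware (here, an NVIDIA GPU) for accelerated inference.

2. Requirements

        Deploy MegaTTS3 on Ubuntu, run the official voice-cloning inference, and generate speech from custom text.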

3. Environment

        Ubuntu 20.04 + Conda + MegaTTS3 + RTX 5060 Ti

4. Steps

4.1 Create the environment

# Create the environment (Python 3.10 or later is recommended)
conda create -n py3.12 python=3.12
# Enter the environment
conda activate py3.12
# Leave the environment
conda deactivate

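4.2 Get the source code

4.2.1 Modify the network configuration

        Optional. Without this DNS change, git clone may fail to connect to the remote port.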

sudo vi /etc/resolv.conf
# Edit the file so that it reads:
#nameserver 127.0.0.53
options edns0 trust-ad
nameserver 8.8.8.8
nameserver 8.8.4.4
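
        To confirm which DNS servers are active after the edit, the uncommented nameserver entries can be listed and a lookup attempted. The sketch below is illustrative only: list_nameservers and can_resolve are hypothetical helper names, not part of MegaTTS3, and getent is assumed to be available (it is on standard Ubuntu installs).

```shell
# List the nameserver entries that are active (not commented out)
# in a resolv.conf-style file. Defaults to /etc/resolv.conf.
list_nameservers() {
    awk '$1 == "nameserver" { print $2 }' "${1:-/etc/resolv.conf}"
}

# Succeed if the given host name resolves through the system resolver.
can_resolve() {
    getent hosts "$1" > /dev/null
}

# Example (not run automatically):
#   list_nameservers
#   can_resolve github.com && echo "github.com resolves"
```

        If can_resolve github.com fails, the clone in the next step is likely to fail as well, so it is worth checking before going further.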

4.2.2 Download the source code

# Clone the open-source repository
git clone https://github.com/bytedance/MegaTTS3.git

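4.3 Install dependencies

# Enter the environment first (this is important)
conda activate py3.12
# Enter the source directory
cd MegaTTS3
# Install the Python dependencies
pip install -r requirements.txt
# Install the required system package
sudo apt install ffmpeg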
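
4.3.1 Reinstall PyTorch

        On a newer GPU, the installed PyTorch build may not match the card and driver, which produces a warning like the one below. Here the build has no support for the RTX 5060 Ti's sm_120 compute capability (the log was captured from a YOLO run on the same machine):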

(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/ultralytics$ yolo predict model=./yolo11n.pt source=./ultralytics/assets/bus.jpg
/home/yangjinghui/application/anaconda3/envs/yolo/lib/python3.12/site-packages/torch/cuda/__init__.py:287: UserWarning:
NVIDIA GeForce RTX 5060 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5060 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(
Ultralytics 8.3.81 🚀 Python-3.12.11 torch-2.7.1+cu126 CUDA:0 (NVIDIA GeForce RTX 5060 Ti, 16311MiB)
Traceback (most recent call last):
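
        The warning lists the architectures the installed build was compiled for (sm_50 through sm_90), while the RTX 5060 Ti reports sm_120. A rough way to reason about the mismatch is to compare major versions. The helper below is only a sketch: arch_supported is a hypothetical name, and matching on the major version alone is a simplifying assumption, not necessarily PyTorch's exact compatibility rule. In a live environment the two inputs can be read from torch.cuda.get_device_capability() and torch.cuda.get_arch_list().

```shell
# arch_supported <gpu_arch> <build_arch>...
# Succeeds when the build lists an sm_ target with the same major version
# as the GPU (assumption: same major version means usable).
arch_supported() {
    gpu="${1#sm_}"              # sm_120 -> 120
    gpu="${gpu%%[!0-9]*}"       # drop any suffix such as the "a" in 90a
    gpu_major=$((gpu / 10))     # 120 -> 12
    shift
    for arch in "$@"; do
        case "$arch" in
            sm_*)
                sm="${arch#sm_}"
                sm="${sm%%[!0-9]*}"
                if [ $((sm / 10)) -eq "$gpu_major" ]; then
                    return 0
                fi
                ;;
        esac
    done
    return 1
}

# Example with the values from the warning above (not run automatically):
#   arch_supported sm_120 sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90 \
#       || echo "this PyTorch build does not cover the GPU"
```

        With the architecture list from the warning, the call returns a non-zero status, which is consistent with the message PyTorch printed.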

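        Fix: keep the GPU and reinstall a PyTorch build that supports it. First check the CUDA version reported by the driver, which is 12.9 here.

# Check the driver version; here it reports CUDA Version: 12.9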
(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/ultralytics$ nvidia-smi
Sun Jul 20 11:45:06 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.01              Driver Version: 576.80         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060 Ti     On  |   00000000:01:00.0  On |                  N/A |
| 37%   47C    P3             21W /  180W |    2119MiB /  16311MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              25      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+

# Open the PyTorch website
https://pytorch.org/

        Install with the command generated there:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129
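
        The cu129 suffix in the index URL corresponds to the CUDA Version: 12.9 field in the nvidia-smi header. The sketch below extracts that field and forms the tag; cuda_version_from_smi and cuda_wheel_tag are hypothetical helper names, and whether a wheel index actually exists for a given tag is an assumption that should be checked against the command pytorch.org generates.

```shell
# Print the CUDA version (for example 12.9) found in nvidia-smi
# header text read from standard input.
cuda_version_from_smi() {
    sed -n 's/.*CUDA Version: *\([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Turn a version such as 12.9 into the wheel tag form cu129.
cuda_wheel_tag() {
    printf 'cu%s\n' "$(printf '%s' "$1" | tr -d '.')"
}

# Example (not run automatically):
#   tag=$(cuda_wheel_tag "$(nvidia-smi | cuda_version_from_smi)")
#   echo "https://download.pytorch.org/whl/nightly/$tag"
```

        After reinstalling, python -c "import torch; print(torch.__version__, torch.cuda.is_available())" gives a quick indication of whether the new build can see the GPU.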

4.4 Run inference

4.4.1 Download the model files

# Install the download tool
pip install modelscope
# Download the weights (pay attention to the download path)
mkdir -p checkpoints/
modelscope download --model ByteDance/MegaTTS3 --local_dir ./checkpoints/
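
        Before running inference it can help to confirm that the download landed in the expected place. The sketch below checks for the four checkpoint files that the inference log in this post reports loading; check_checkpoints is a hypothetical helper name, and the file list is taken from that log only, so the repository may expect additional files.

```shell
# check_checkpoints [dir]
# Report any checkpoint file named in the inference log that is missing.
# Returns 0 when all are present, 1 otherwise.
check_checkpoints() {
    dir="${1:-./checkpoints}"
    missing=0
    for f in \
        duration_lm/model_only_last.ckpt \
        diffusion_transformer/model_only_last.ckpt \
        aligner_lm/model_only_last.ckpt \
        wavvae/decoder.ckpt
    do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Example (not run automatically):
#   check_checkpoints ./checkpoints && echo "checkpoints look complete"
```

        A missing file usually means the download was run from a different directory than the MegaTTS3 source root, which is why the download path matters.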

4.4.2 Run inference

(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/py312/MegaTTS3$ cat detect.sh
#!/bin/bash
export PYTHONPATH="/home/yangjinghui/code/py312/MegaTTS3:$PYTHONPATH"
export CUDA_VISIBLE_DEVICES=0
python tts/infer_cli.py --input_wav 'assets/Chinese_prompt.wav'  --input_text "大家好,我是蔡徐坤,我会打篮球,大家可以叫我鸡哥,小黑子们,请不要再黑我了,你大爷的'" --output_dir ./gen
(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/py312/MegaTTS3$ ./detect.sh
| loaded 'dur_model' from './checkpoints/duration_lm/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 0
| loaded 'dit' from './checkpoints/diffusion_transformer/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 9
| loaded 'model' from './checkpoints/aligner_lm/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 0
/home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
| loaded 'model_gen' from './checkpoints/wavvae/decoder.ckpt'.
| Missing keys: 74, Unexpected keys: 0
2025-07-24 19:15:55,028 WETEXT INFO found existing fst: /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/zh_tn_tagger.fst
2025-07-24 19:15:55,028 WETEXT INFO                     /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/zh_tn_verbalizer.fst
2025-07-24 19:15:55,028 WETEXT INFO skip building fst for zh_normalizer ...
2025-07-24 19:15:55,213 WETEXT INFO found existing fst: /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/en_tn_tagger.fst
2025-07-24 19:15:55,213 WETEXT INFO                     /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/en_tn_verbalizer.fst
2025-07-24 19:15:55,213 WETEXT INFO skip building fst for en_normalizer ...
| Start processing assets/Chinese_prompt.wav+大家好,我是蔡徐坤,我会打篮球,大家可以叫我鸡哥,小黑子们,请不要再黑我了,你大爷的'
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:49: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:88: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:30: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:152468 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:104: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/infer_cli.py:232: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
| Saving results to ./gen/[P]大家好,我是蔡徐坤,我会打篮球,大家可以.wav