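Using MegaTTS3
1. Background
MegaTTS3 is a text-to-speech (TTS) model. It converts text into speech and needs suitable hardware to accelerate inference.
2. Requirements
Deploy MegaTTS3 on Ubuntu, run the official voice-cloning example, and generate speech from custom text.
3. Environment
Ubuntu 20.04 + Conda + MegaTTS3 + RTX 5060 Ti
4. Steps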
4.1 Create the environment
# Create the environment; Python 3.10 or later is recommended
conda create -n py3.12 python=3.12
# Activate the environment
conda activate py3.12
# Deactivate the environment
conda deactivate
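4.2 Get the source code
4.2.1 Change the network configuration
This step is optional. Without it, git clone may fail to connect to the remote host.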
sudo vi /etc/resolv.conf
#nameserver 127.0.0.53
options edns0 trust-ad
nameserver 8.8.8.8
nameserver 8.8.4.4
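To confirm that name resolution works after editing /etc/resolv.conf, a small check along the following lines may help. This is a minimal sketch; the function name can_resolve is chosen here for illustration and is not part of MegaTTS3.

```python
import socket


def can_resolve(host: str, port: int = 443) -> bool:
    """Return True if the system resolver can look up `host`."""
    try:
        socket.getaddrinfo(host, port)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False
```

Calling can_resolve("github.com") before running git clone gives a quick yes/no answer. A False result points at DNS rather than at git itself.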
4.2.2 Download the source code
# Clone the open-source repository
git clone https://github.com/bytedance/MegaTTS3.git
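4.3 Install dependencies
# Activate the environment first; this is important
conda activate py3.12
# Enter the source directory
cd MegaTTS3
# Install the Python dependencies
pip install -r requirements.txt
# Install the required system packages
sudo apt install ffmpeg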
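Before going further, it can be worth checking that the active interpreter and ffmpeg are the ones just installed. The sketch below uses only the standard library; check_prerequisites is a hypothetical helper name, and the 3.10 minimum simply mirrors the recommendation in section 4.1.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Report whether the interpreter and ffmpeg look usable."""
    return {
        "python_ok": sys.version_info[:2] >= min_python,
        "ffmpeg_path": shutil.which("ffmpeg"),  # None if ffmpeg is not on PATH
    }


if __name__ == "__main__":
    for name, value in check_prerequisites().items():
        print(f"{name}: {value}")
```

If ffmpeg_path is None, the apt install step did not put ffmpeg on PATH for the current shell.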
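4.3.1 Reinstall PyTorch
Depending on the GPU, the installed PyTorch build may not match the card and its driver, which causes an error like the one below. The log was captured from a YOLO run; it shows that this PyTorch build does not include sm_120, the compute capability of the RTX 5060 Ti.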
(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/ultralytics$ yolo predict model=./yolo11n.pt source=./ultralytics/assets/bus.jpg
/home/yangjinghui/application/anaconda3/envs/yolo/lib/python3.12/site-packages/torch/cuda/__init__.py:287: UserWarning:
NVIDIA GeForce RTX 5060 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5060 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(
Ultralytics 8.3.81 🚀 Python-3.12.11 torch-2.7.1+cu126 CUDA:0 (NVIDIA GeForce RTX 5060 Ti, 16311MiB)
Traceback (most recent call last):
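Fix: keep the GPU and reinstall a PyTorch build that supports it. First check the driver version; the CUDA version reported here is 12.9.
# Check the driver version; here it reports CUDA Version: 12.9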
(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/ultralytics$ nvidia-smi
Sun Jul 20 11:45:06 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.01              Driver Version: 576.80         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060 Ti     On  |   00000000:01:00.0  On |                  N/A |
| 37%   47C    P3             21W /  180W |    2119MiB /  16311MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              25      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+
# Open the PyTorch website
https://pytorch.org/
Pick the options that match your system, then run the install command that the page generates:
pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129
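After reinstalling, it may be useful to confirm that the new build actually covers the GPU instead of waiting for the warning to reappear. The sketch below compares the build's compiled targets with the device's compute capability. build_supports is a hypothetical helper, and its rule (some compiled sm_XY target shares the device's major version) is a simplification, not PyTorch's exact compatibility logic. torch.cuda.get_arch_list() and torch.cuda.get_device_capability() are existing PyTorch calls.

```python
import re


def build_supports(arch_list, capability):
    """Simplified check: does any compiled sm_XY target share the device's
    major compute-capability version?

    arch_list  -- e.g. ["sm_80", "sm_86", "sm_90"]
    capability -- (major, minor) tuple, e.g. (12, 0) for sm_120
    """
    major, _minor = capability
    for arch in arch_list:
        match = re.fullmatch(r"sm_(\d+)[a-z]?", arch)
        if match and int(match.group(1)) // 10 == major:
            return True
    return False


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        archs = torch.cuda.get_arch_list()
        cap = torch.cuda.get_device_capability(0)
        print("compiled targets:", archs)
        print("device capability:", cap)
        print("supported:", build_supports(archs, cap))
    else:
        print("PyTorch with a usable CUDA device was not found")
```

With the target list from the warning above (sm_50 through sm_90) and capability (12, 0), the helper returns False, which matches the reported incompatibility.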
4.4 Run inference
4.4.1 Download the model files
# Install the download tool
pip install modelscope
# Download the weights; pay attention to the download path
mkdir -p checkpoints/
modelscope download --model ByteDance/MegaTTS3 --local_dir ./checkpoints/
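Once the download finishes, a quick check that the checkpoint files ended up where the inference script looks for them can save a failed run. The paths below are taken from the "loaded ... from" lines of the inference log in section 4.4.2; the list may not be complete, and missing_checkpoints is a hypothetical helper name.

```python
from pathlib import Path

# Paths taken from the "loaded ... from" lines of the inference log in this
# document; the download may contain more files than these.
EXPECTED_CHECKPOINTS = [
    "duration_lm/model_only_last.ckpt",
    "diffusion_transformer/model_only_last.ckpt",
    "aligner_lm/model_only_last.ckpt",
    "wavvae/decoder.ckpt",
]


def missing_checkpoints(root="./checkpoints"):
    """Return the expected checkpoint paths that are not present under root."""
    base = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (base / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints()
    if missing:
        print("missing:", ", ".join(missing))
    else:
        print("all expected checkpoint files found")
```

Run it from the MegaTTS3 directory so that the relative ./checkpoints path resolves to the download location.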
4.4.2 Run inference
(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/py312/MegaTTS3$ cat detect.sh
#!/bin/bash
export PYTHONPATH="/home/yangjinghui/code/py312/MegaTTS3:$PYTHONPATH"
export CUDA_VISIBLE_DEVICES=0
python tts/infer_cli.py --input_wav 'assets/Chinese_prompt.wav' --input_text "大家好,我是蔡徐坤,我会打篮球,大家可以叫我鸡哥,小黑子们,请不要再黑我了,你大爷的'" --output_dir ./gen
(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/py312/MegaTTS3$ ./detect.sh
| loaded 'dur_model' from './checkpoints/duration_lm/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 0
| loaded 'dit' from './checkpoints/diffusion_transformer/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 9
| loaded 'model' from './checkpoints/aligner_lm/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 0
/home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
| loaded 'model_gen' from './checkpoints/wavvae/decoder.ckpt'.
| Missing keys: 74, Unexpected keys: 0
2025-07-24 19:15:55,028 WETEXT INFO found existing fst: /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/zh_tn_tagger.fst
2025-07-24 19:15:55,028 WETEXT INFO /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/zh_tn_verbalizer.fst
2025-07-24 19:15:55,028 WETEXT INFO skip building fst for zh_normalizer ...
2025-07-24 19:15:55,213 WETEXT INFO found existing fst: /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/en_tn_tagger.fst
2025-07-24 19:15:55,213 WETEXT INFO /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/en_tn_verbalizer.fst
2025-07-24 19:15:55,213 WETEXT INFO skip building fst for en_normalizer ...
| Start processing assets/Chinese_prompt.wav+大家好,我是蔡徐坤,我会打篮球,大家可以叫我鸡哥,小黑子们,请不要再黑我了,你大爷的'
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:49: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:88: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:30: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:152468 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:104: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/infer_cli.py:232: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
| Saving results to ./gen/[P]大家好,我是蔡徐坤,我会打篮球,大家可以.wav
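To synthesize several texts without editing detect.sh each time, the same call could be wrapped in a short Python script. This is a sketch under a few assumptions: it uses only the three flags that appear in detect.sh (--input_wav, --input_text, --output_dir), sets the same two environment variables, and starts one process per text, so the model is reloaded on every call. The names build_command and synthesize are hypothetical.

```python
import os
import subprocess
import sys


def build_command(input_wav, input_text, output_dir="./gen"):
    """Build the argument list for tts/infer_cli.py with the flags used in detect.sh."""
    return [
        sys.executable, "tts/infer_cli.py",
        "--input_wav", input_wav,
        "--input_text", input_text,
        "--output_dir", output_dir,
    ]


def synthesize(texts, input_wav="assets/Chinese_prompt.wav", repo_dir=".", gpu="0"):
    """Run the CLI once per text from inside the MegaTTS3 checkout."""
    env = dict(os.environ)
    repo = os.path.abspath(repo_dir)
    # Same environment variables that detect.sh exports.
    env["PYTHONPATH"] = repo + os.pathsep + env.get("PYTHONPATH", "")
    env["CUDA_VISIBLE_DEVICES"] = gpu
    for text in texts:
        # check=True stops the loop if one inference run fails.
        subprocess.run(build_command(input_wav, text), cwd=repo, env=env, check=True)
```

For example, synthesize(["第一句话。", "第二句话。"]) run from the MegaTTS3 directory with the py3.12 environment active would write one file per text into ./gen.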