
Using MegaTTS3

news 2025/7/25 9:16:18

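1. Background

MegaTTS3 is a text-to-speech (TTS) model. It converts text into speech and needs adequate hardware for accelerated inference.

2. Requirements

Deploy MegaTTS3 on Ubuntu, run the official voice-cloning pipeline, and generate speech from custom text.

3. Environment

Ubuntu 20.04 + Conda + MegaTTS3 + RTX 5060 Ti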

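4. Implementation steps

4.1 Create the environment

# Create the environment; Python 3.10 or later is recommended
conda create -n py3.12 python=3.12
# Activate the environment
conda activate py3.12
# Deactivate the environment
conda deactivate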

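4.2 Get the source code

4.2.1 Change the network configuration (optional)

Without this DNS change, git clone may fail to connect to the remote port.

sudo vi /etc/resolv.conf

#nameserver 127.0.0.53
options edns0 trust-ad
nameserver 8.8.8.8
nameserver 8.8.4.4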
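To confirm which resolvers are active after the edit, the file can be parsed directly. The following is a minimal sketch, not part of the original setup; the helper name `parse_nameservers` is made up for illustration, and it only handles the simple `nameserver <address>` lines shown above.

```python
from pathlib import Path


def parse_nameservers(resolv_text: str) -> list[str]:
    """Return the nameserver addresses from resolv.conf-style text, in order."""
    servers = []
    for raw_line in resolv_text.splitlines():
        line = raw_line.strip()
        # Skip blanks and commented-out entries such as "#nameserver 127.0.0.53".
        if not line or line.startswith(("#", ";")):
            continue
        fields = line.split()
        if fields[0] == "nameserver" and len(fields) >= 2:
            servers.append(fields[1])
    return servers


if __name__ == "__main__":
    path = Path("/etc/resolv.conf")
    if path.exists():
        print(parse_nameservers(path.read_text()))
```

With the configuration above, the printed list should contain only 8.8.8.8 and 8.8.4.4, since the 127.0.0.53 entry is commented out.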

4.2.2 Download the source

# Clone the open-source repository
git clone https://github.com/bytedance/MegaTTS3.git

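4.3 Install dependencies

# Activate the environment first; this step matters
conda activate py3.12
# Enter the source directory
cd MegaTTS3
# Install the Python dependencies
pip install -r requirements.txt
# Install the required system package
sudo apt install ffmpeg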
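Before moving on, it can help to confirm that the system tools installed so far are actually on PATH. This is a small sketch using only the standard library; the function name `find_missing_tools` is hypothetical, and checking `git` alongside `ffmpeg` is an assumption based on the earlier clone step.

```python
import shutil


def find_missing_tools(tools):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools(["ffmpeg", "git"])
    print("missing:", missing if missing else "none")
```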

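4.3.1 Reinstall PyTorch

With a newer GPU, the installed PyTorch build may not support the card's compute capability and fails as shown below. (This log was captured while running a YOLO command in another project, but the cause is the same.)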

(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/ultralytics$ yolo predict model=./yolo11n.pt source=./ultralytics/assets/bus.jpg
/home/yangjinghui/application/anaconda3/envs/yolo/lib/python3.12/site-packages/torch/cuda/__init__.py:287: UserWarning:
NVIDIA GeForce RTX 5060 Ti with CUDA capability sm_120 is not compatible with the current PyTorch installation.
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA GeForce RTX 5060 Ti GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally/
  warnings.warn(
Ultralytics 8.3.81 🚀 Python-3.12.11 torch-2.7.1+cu126 CUDA:0 (NVIDIA GeForce RTX 5060 Ti, 16311MiB)
Traceback (most recent call last):

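The fix, without swapping the GPU, is to reinstall a PyTorch build that supports the new card. First check the CUDA version reported by the driver, which is 12.9 here.

# Check the driver; it reports CUDA Version: 12.9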
(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/ultralytics$ nvidia-smi
Sun Jul 20 11:45:06 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.64.01              Driver Version: 576.80         CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060 Ti     On  |   00000000:01:00.0  On |                  N/A |
| 37%   47C    P3             21W /  180W |    2119MiB /  16311MiB |      7%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|    0   N/A  N/A              25      G   /Xwayland                             N/A      |
+-----------------------------------------------------------------------------------------+

# Open the PyTorch website
https://pytorch.org/

Install with the command generated by the selector on that page:

pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129
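After reinstalling, the mismatch from the warning above can be checked programmatically. The sketch below compares the GPU's compute capability with the architectures the installed build was compiled for, using `torch.cuda.get_device_capability` and `torch.cuda.get_arch_list`. The helper names are made up for illustration, and the exact-match test is a deliberate simplification: PyTorch's own compatibility logic is more permissive, so a `False` result here is a hint rather than proof of a problem.

```python
def capability_to_arch(capability):
    """Convert a (major, minor) compute capability to an arch tag, e.g. (12, 0) -> 'sm_120'."""
    major, minor = capability
    return f"sm_{major}{minor}"


def is_arch_supported(capability, arch_list):
    """True if the arch tag for `capability` appears verbatim in `arch_list`."""
    return capability_to_arch(capability) in arch_list


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
    else:
        if torch.cuda.is_available():
            cap = torch.cuda.get_device_capability(0)
            archs = torch.cuda.get_arch_list()
            print(torch.__version__, capability_to_arch(cap), archs)
            print("supported:", is_arch_supported(cap, archs))
        else:
            print("CUDA is not available")
```

For the RTX 5060 Ti in this article the capability is (12, 0), so the arch list of a working build is expected to include sm_120, which the list in the warning did not.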

4.4 Run inference

4.4.1 Download the model files

# Install the download tool
pip install modelscope
# Download the weights; mind the target path
mkdir checkpoints/
modelscope download --model ByteDance/MegaTTS3 --local_dir ./checkpoints/
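Because inference loads the weights from fixed paths under ./checkpoints/, a quick check after the download can catch a wrong target directory early. This is a sketch only: the four relative paths are copied from the "loaded ... from" lines in the inference log in section 4.4.2, the download may contain other files as well, and `missing_checkpoints` is a hypothetical helper name.

```python
from pathlib import Path

# Relative paths copied from the "loaded ... from" lines of the inference log.
EXPECTED_CHECKPOINTS = [
    "duration_lm/model_only_last.ckpt",
    "diffusion_transformer/model_only_last.ckpt",
    "aligner_lm/model_only_last.ckpt",
    "wavvae/decoder.ckpt",
]


def missing_checkpoints(root):
    """Return the expected checkpoint files that do not exist under `root`."""
    root = Path(root)
    return [rel for rel in EXPECTED_CHECKPOINTS if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints("./checkpoints")
    print("missing:", missing if missing else "none")
```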

4.4.2 Run inference

(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/py312/MegaTTS3$ cat detect.sh
#!/bin/bash
export PYTHONPATH="/home/yangjinghui/code/py312/MegaTTS3:$PYTHONPATH"
export CUDA_VISIBLE_DEVICES=0
python tts/infer_cli.py --input_wav 'assets/Chinese_prompt.wav'  --input_text "大家好，我是蔡徐坤，我会打篮球，大家可以叫我鸡哥，小黑子们，请不要再黑我了，你大爷的'" --output_dir ./gen

(py3.12) yangjinghui@MICROSO-9VFB07B:~/code/py312/MegaTTS3$ ./detect.sh
| loaded 'dur_model' from './checkpoints/duration_lm/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 0
| loaded 'dit' from './checkpoints/diffusion_transformer/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 9
| loaded 'model' from './checkpoints/aligner_lm/model_only_last.ckpt'.
| Missing keys: 0, Unexpected keys: 0
/home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/torch/nn/utils/weight_norm.py:144: FutureWarning: `torch.nn.utils.weight_norm` is deprecated in favor of `torch.nn.utils.parametrizations.weight_norm`.
  WeightNorm.apply(module, name, dim)
| loaded 'model_gen' from './checkpoints/wavvae/decoder.ckpt'.
| Missing keys: 74, Unexpected keys: 0
2025-07-24 19:15:55,028 WETEXT INFO found existing fst: /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/zh_tn_tagger.fst
2025-07-24 19:15:55,028 WETEXT INFO                     /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/zh_tn_verbalizer.fst
2025-07-24 19:15:55,028 WETEXT INFO skip building fst for zh_normalizer ...
2025-07-24 19:15:55,213 WETEXT INFO found existing fst: /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/en_tn_tagger.fst
2025-07-24 19:15:55,213 WETEXT INFO                     /home/yangjinghui/application/anaconda3/envs/py3.12/lib/python3.12/site-packages/tn/en_tn_verbalizer.fst
2025-07-24 19:15:55,213 WETEXT INFO skip building fst for en_normalizer ...
| Start processing assets/Chinese_prompt.wav+大家好，我是蔡徐坤，我会打篮球，大家可以叫我鸡哥，小黑子们，请不要再黑我了，你大爷的'
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:49: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:88: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:30: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
The attention mask and the pad token id were not set. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
Setting `pad_token_id` to `eos_token_id`:152468 for open-end generation.
The attention mask is not set and cannot be inferred from input because pad token is same as eos token. As a consequence, you may observe unexpected behavior. Please pass your input's `attention_mask` to obtain reliable results.
/home/yangjinghui/code/py312/MegaTTS3/tts/frontend_function.py:104: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
/home/yangjinghui/code/py312/MegaTTS3/tts/infer_cli.py:232: FutureWarning: `torch.cuda.amp.autocast(args...)` is deprecated. Please use `torch.amp.autocast('cuda', args...)` instead.
  with torch.cuda.amp.autocast(dtype=self.precision, enabled=True):
| Saving results to ./gen/[P]大家好，我是蔡徐坤，我会打篮球，大家可以.wav
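The same invocation can be driven from Python when the text to synthesize comes from another program. The sketch below mirrors what detect.sh does (set PYTHONPATH and CUDA_VISIBLE_DEVICES, then call tts/infer_cli.py with the --input_wav, --input_text and --output_dir flags used above). The function names are made up for illustration, and it assumes the script is launched from an environment where the dependencies are installed.

```python
import os
import subprocess
import sys


def build_infer_command(input_wav, input_text, output_dir, python=sys.executable):
    """Build the argument list for tts/infer_cli.py, using the flags from detect.sh."""
    return [
        python, "tts/infer_cli.py",
        "--input_wav", input_wav,
        "--input_text", input_text,
        "--output_dir", output_dir,
    ]


def build_env(repo_dir, gpu="0", base_env=None):
    """Copy the environment and prepend the repo to PYTHONPATH, as detect.sh does."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTHONPATH", "")
    env["PYTHONPATH"] = repo_dir + (os.pathsep + existing if existing else "")
    env["CUDA_VISIBLE_DEVICES"] = gpu
    return env


def run_inference(repo_dir, input_wav, input_text, output_dir="./gen"):
    """Run the CLI from the repo root; raises CalledProcessError on failure."""
    cmd = build_infer_command(input_wav, input_text, output_dir)
    return subprocess.run(cmd, cwd=repo_dir, env=build_env(repo_dir), check=True)
```

A call such as `run_inference("/home/yangjinghui/code/py312/MegaTTS3", "assets/Chinese_prompt.wav", "some text")` would then correspond to running ./detect.sh with a different input text. Passing the arguments as a list avoids shell quoting problems with punctuation in the text.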
