【AI Practice】Deploying the ASR model OpenAI Whisper locally
The previous post covered deploying ComfyUI: 【AI Practice】Deploying ComfyUI locally - CSDN Blog
With CUDA, torch, and the rest of the environment already set up there, deploying other tools is much easier.
Clone the ComfyUI conda environment directly so some components don't need to be installed again:
conda env list
conda create -n whisper --clone rtx50_comfyui
conda activate whisper
pip install PySocks win-inet-pton
cd .\workspace\
set HTTP_PROXY=http://127.0.0.1:1080
set HTTPS_PROXY=http://127.0.0.1:1080
git config --global http.proxy http://127.0.0.1:1080
git config --global https.proxy https://127.0.0.1:1080
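If the proxy is only needed for this download, the git settings can be removed again afterwards:
git config --global --unset http.proxy
git config --global --unset https.proxy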
pip install git+https://github.com/openai/whisper.git
conda install -c conda-forge ffmpeg
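A quick way to confirm that ffmpeg is actually available inside the activated environment:
ffmpeg -version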
Run test.py, which contains the following:
import whisper

# Load the base model onto the GPU to confirm CUDA is working
model = whisper.load_model("base").cuda()
print(whisper.__version__)
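Run it the same way as the scripts below; the printed version string simply depends on which whisper release pip pulled in:
python .\test.py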
Then run the transcription script whisper_transcribe.py:
import whisper

# Load the model (on the GPU)
model = whisper.load_model("base").cuda()

# Transcribe a sample audio file
result = model.transcribe("example.wav", language="zh")
print(result["text"])
python .\whisper_transcribe.py
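The pip install of openai-whisper also provides a whisper command-line tool, so the same file can be transcribed without writing a script; the flags below mirror the script above, adjust model and language as needed:
whisper example.wav --model base --language zh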
Recording and transcribing
import sounddevice as sd
import numpy as np
import whisper

def record_audio(duration=5, samplerate=16000):
    print(f"Recording for {duration} seconds...")
    audio = sd.rec(int(duration * samplerate), samplerate=samplerate, channels=1, dtype='float32')
    sd.wait()
    print("Recording finished.")
    return np.squeeze(audio)

def save_wav(filename, audio, samplerate=16000):
    import scipy.io.wavfile
    scipy.io.wavfile.write(filename, samplerate, (audio * 32767).astype(np.int16))

def transcribe(audio, samplerate=16000):
    model = whisper.load_model("base")
    # Whisper expects a file as input, so save the recording to a wav first
    temp_wav = "temp.wav"
    save_wav(temp_wav, audio, samplerate)
    result = model.transcribe(temp_wav, language="zh")
    return result["text"]

if __name__ == "__main__":
    duration = 20  # recording duration (seconds)
    samplerate = 16000
    audio = record_audio(duration, samplerate)
    text = transcribe(audio, samplerate)
    print("Transcription result:", text)
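This script additionally needs sounddevice and scipy on top of whisper. Assuming it is saved as record_transcribe.py (the filename here is just an example), installing the extras and running it looks like:
pip install sounddevice scipy
python .\record_transcribe.py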