Privately deploying DeepSeek-R1 on Linux with vLLM, and calling the model through a RESTful API
## Generic helper code — safe to skip; unrelated to DeepSeek

```python
import os
import logging as logger

import openai
import pandas as pd
from flask_cors import CORS
from openai.embeddings_utils import get_embedding, cosine_similarity

openai.api_key = os.getenv('OPENAI_API_KEY')


class Chatbot():

    def parse_paper(self, pdf):
        """Split a PDF into text blobs grouped by font size, page by page."""
        logger.info("Parsing paper")
        number_of_pages = len(pdf.pages)
        logger.info(f"Total number of pages: {number_of_pages}")
        paper_text = []
        for i in range(number_of_pages):
            page = pdf.pages[i]
            page_text = []

            def visitor_body(text, cm, tm, fontDict, fontSize):
                x = tm[4]
                y = tm[5]
                # skip headers/footers by y position and drop 1-char fragments
                if (y > 50 and y < 720) and (len(text.strip()) > 1):
                    page_text.append({
                        'fontsize': fontSize,
                        'text': text.strip().replace('\x03', ''),
                        'x': x,
                        'y': y
                    })

            _ = page.extract_text(visitor_text=visitor_body)

            # merge consecutive fragments that share a font size into blobs
            blob_font_size = None
            blob_text = ''
            processed_text = []
            for t in page_text:
                if t['fontsize'] == blob_font_size:
                    blob_text += f" {t['text']}"
                    if len(blob_text) >= 2000:
                        # cap blobs at ~2000 chars so downstream embedding stays cheap
                        processed_text.append({
                            'fontsize': blob_font_size,
                            'text': blob_text,
                            'page': i
                        })
                        blob_font_size = None
                        blob_text = ''
                else:
                    if blob_font_size is not None and len(blob_text) >= 1:
                        processed_text.append({
                            'fontsize': blob_font_size,
                            'text': blob_text,
                            'page': i
                        })
                    blob_font_size = t['fontsize']
                    blob_text = t['text']
            # flush the final blob so trailing page text is not dropped
            if blob_font_size is not None and len(blob_text) >= 1:
                processed_text.append({
                    'fontsize': blob_font_size,
                    'text': blob_text,
                    'page': i
                })
            paper_text += processed_text
        logger.info("Done parsing paper")
        return paper_text
```
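For context, a hypothetical usage sketch of the class above, assuming pypdf is installed (its `visitor_text` callback is what `visitor_body` hooks into); the filename is a placeholder:

```python
# Hypothetical usage of the parser above; assumes pypdf is installed.
from pypdf import PdfReader

reader = PdfReader("paper.pdf")          # placeholder filename
chunks = Chatbot().parse_paper(reader)
print(f"{len(chunks)} text blobs extracted")
```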
1. Links
vLLM documentation
1. English: Quickstart — vLLM
2. Chinese: 引擎参数 | vLLM 中文站 (Engine Arguments, vLLM Chinese site)
Open-source model download pages
https://huggingface.co/deepseek-ai
https://huggingface.co/models
2. Hardware (my Linux machine)
CentOS, 440 GB RAM, 180 CPU cores, 4× NVIDIA L20 GPUs
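Before downloading anything, it is worth checking that four L20s can hold the model. A back-of-the-envelope Python sketch; the 48 GB-per-L20 figure and the FP16 assumption are mine, not from the original setup:

```python
# Back-of-the-envelope VRAM check (assumptions: FP16 weights, 48 GB per L20).
params_billion = 32        # DeepSeek-R1-Distill-Qwen-32B
bytes_per_param = 2        # float16
num_gpus = 4
gpu_vram_gb = 48           # NVIDIA L20

weights_gb = params_billion * bytes_per_param   # ~64 GB of weights in total
per_gpu_gb = weights_gb / num_gpus              # ~16 GB per GPU when sharded 4 ways
print(f"weights: ~{weights_gb} GB total, ~{per_gpu_gb:.0f} GB per GPU")
print(f"headroom per GPU for KV cache/activations: ~{gpu_vram_gb - per_gpu_gb:.0f} GB")
```

With roughly 16 GB of weights per card, most of each GPU is left for the KV cache, which is what the `--gpu-memory-utilization 0.95` flag in the serve command later hands over to vLLM.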
3. Deployment steps
1. Python environment setup
1. Install Python

```bash
sudo dnf install -y gcc openssl-devel bzip2-devel libffi-devel xz-devel sqlite-devel
wget https://www.python.org/ftp/python/3.11.3/Python-3.11.3.tgz
tar xvf Python-3.11.3.tgz
cd Python-3.11.3
./configure --enable-optimizations
make
sudo make altinstall
```

2. Check the Python version

```bash
/usr/local/bin/python3.11 --version
```

3. Create a virtual environment (recommended)

```bash
/usr/local/bin/python3.11 -m venv vllm_env
```

4. Activate the virtual environment

```bash
source vllm_env/bin/activate
```

5. Deactivate the virtual environment when done

```bash
deactivate
```
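To confirm the activated environment really is the freshly built 3.11 before installing anything into it, a tiny check script (a hypothetical helper, not part of the original steps):

```python
# Tiny check that the active interpreter is the 3.11 venv (hypothetical helper).
import sys

print(sys.executable)        # should point into vllm_env/bin/
print(sys.version)           # should report 3.11.x
assert sys.version_info[:2] == (3, 11), "activate vllm_env first"
```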
2. Install vLLM
1. Activate the virtual environment

```bash
source vllm_env/bin/activate
```

2. Install vLLM

```bash
pip3 install vllm
```
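After the install finishes, a quick sanity check that vLLM imports and that PyTorch (pulled in as a vLLM dependency) can see all four GPUs; a minimal sketch:

```python
# Post-install sanity check: vLLM pulls in PyTorch, so both should import
# and all four L20s should be visible.
import torch
import vllm

print("vllm:", vllm.__version__)
print("CUDA available:", torch.cuda.is_available())
for i in range(torch.cuda.device_count()):
    print(f"GPU {i}: {torch.cuda.get_device_name(i)}")
```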
3.下载开源模型
1.安装huggingface_hub
pip3 install huggingface_hub
pip install "huggingface_hub[hf_transfer]"
2.下载模型
HF_HUB_ENABLE_HF_TRANSFER=1 \
huggingface-cli download deepseek-ai/DeepSeek-R1-Distill-Qwen-32B \
--local-dir /path/to/save/model \
--local-dir-use-symlinks False
参数说明:
--local-dir /path/to/save/model:指定模型下载的目标目录。
--local-dir-use-symlinks False:禁用符号链接,直接下载文件到目标目录。
HF_HUB_ENABLE_HF_TRANSFER=1:启用高效下载传输(可选)。
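The same download can be scripted with huggingface_hub's `snapshot_download`; a sketch keeping the placeholder path from the CLI example. Recent huggingface_hub versions copy files into `local_dir` without symlinks by default, so the symlink flag has no direct equivalent here:

```python
# Scripted equivalent of the CLI download; /path/to/save/model stays a placeholder.
import os

os.environ["HF_HUB_ENABLE_HF_TRANSFER"] = "1"   # optional fast backend

from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="deepseek-ai/DeepSeek-R1-Distill-Qwen-32B",
    local_dir="/path/to/save/model",
)
```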
4. Run the model, plus commands for checking related metrics

```bash
# List graphics cards
lspci | grep -i nvidia
lspci | grep -E "NVIDIA|VGA"
# Check the driver
lshw -numeric -C display
# NVIDIA GPU status
nvidia-smi
# Live monitoring
watch -n 1 nvidia-smi
```

Once the model is running, these let you watch GPU, VRAM, and CPU usage.
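If you prefer to poll the same numbers from Python, a minimal sketch using the nvidia-ml-py bindings (`pip install nvidia-ml-py`; the package is my assumption, not used in the original post):

```python
# Minimal GPU polling sketch using nvidia-ml-py, the library nvidia-smi wraps.
# Assumption: pip install nvidia-ml-py
import pynvml

pynvml.nvmlInit()
try:
    for i in range(pynvml.nvmlDeviceGetCount()):
        h = pynvml.nvmlDeviceGetHandleByIndex(i)
        mem = pynvml.nvmlDeviceGetMemoryInfo(h)
        util = pynvml.nvmlDeviceGetUtilizationRates(h)
        print(f"GPU {i}: {pynvml.nvmlDeviceGetName(h)} "
              f"mem {mem.used / 2**30:.1f}/{mem.total / 2**30:.1f} GiB "
              f"util {util.gpu}%")
finally:
    pynvml.nvmlShutdown()
```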
=======================================
1. Run the local model

```bash
CUDA_VISIBLE_DEVICES=0,1,2,3 vllm serve /local/model/DeepSeek-R1-Distill-Qwen-32B --dtype float16 --tensor-parallel-size 4 --gpu-memory-utilization 0.95 --max-model-len 4096 --trust-remote-code --enforce-eager --port 8102 --served-model-name deepseek-r1-32b --enable-reasoning --reasoning-parser deepseek_r1
```

The flags are described in the vLLM docs; run `vllm serve --help` for the full list.
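Besides the OpenAI-compatible server, vLLM can load the same local weights for offline batch inference through its Python API. A minimal sketch reusing the path and parallelism settings from the serve command; note `generate()` takes raw prompts rather than applying the chat template, so treat this as a smoke test rather than a chat client:

```python
# Offline batch inference against the same local weights; no HTTP server involved.
from vllm import LLM, SamplingParams

llm = LLM(
    model="/local/model/DeepSeek-R1-Distill-Qwen-32B",  # path from the serve command
    tensor_parallel_size=4,
    dtype="float16",
    max_model_len=4096,
    trust_remote_code=True,
)
sampling = SamplingParams(temperature=0.6, max_tokens=256)
for out in llm.generate(["Briefly introduce yourself."], sampling):
    print(out.outputs[0].text)
```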
5. Verify the service API

```bash
curl http://localhost:8102/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "deepseek-r1-32b",
    "messages": [{"role": "user", "content": "Hello"}]
  }'
```
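The same endpoint can be called with the official openai Python client by pointing `base_url` at the local server. With the `deepseek_r1` reasoning parser enabled, vLLM's reasoning-outputs feature returns the chain of thought in a separate `reasoning_content` field, which the sketch reads defensively in case the parser is off:

```python
# Same request as the curl above, through the official openai client.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8102/v1", api_key="EMPTY")  # vLLM ignores the key

resp = client.chat.completions.create(
    model="deepseek-r1-32b",   # must match --served-model-name
    messages=[{"role": "user", "content": "Hello"}],
)
msg = resp.choices[0].message
# With --reasoning-parser deepseek_r1 the chain of thought arrives in a
# separate reasoning_content field (absent when the parser is disabled).
print(getattr(msg, "reasoning_content", None))
print(msg.content)
```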