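[Source Power Awakening Creator Program] Baidu's Open-Source ERNIE-4.5 (Wenxin) Model: A Step-by-Step Guide to Private Deployment, Plus a Multi-Function Chat Interface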

news 2025/7/14 6:04:57

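Follow this route and, with a fast network connection, even a complete beginner can have the model running in about five minutes.
Let's have some fun with the ERNIE (Wenxin) models 👉 Free ERNIE model downloads: https://ai.gitcode.com/theme/1939325484087291906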

Hardware configuration

Component	Configuration
GPU	NVIDIA A800 SXM4 80GB × 1
CPU	15-core processor
Memory	249 GB RAM
Disk	100 GB system disk + 50 GB data disk
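As a rough sanity check on the GPU choice: in a mixture-of-experts model such as ERNIE-4.5-21B-A3B, all experts stay resident in memory, so the total parameter count (about 21B), not the roughly 3B active per token, determines the weight footprint. The sketch below assumes 2 bytes per parameter (bf16/fp16 weights) and deliberately ignores the KV cache and activations, which grow with `--max-model-len` and `--max-num-seqs`:

```python
# Rough weight-memory estimate.
# Assumption: 2 bytes per parameter (bf16/fp16). KV cache and activations are NOT included.

def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Return the memory needed just to hold the weights, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


total_params = 21e9  # ERNIE-4.5-21B-A3B: ~21B total parameters (all experts loaded)
gpu_memory_gb = 80   # the single 80 GB card from the table above

needed = weight_memory_gb(total_params)
print(f"weights ≈ {needed:.0f} GB, headroom ≈ {gpu_memory_gb - needed:.0f} GB")
```

With these assumptions the weights take about 42 GB, leaving roughly 38 GB for the KV cache and runtime overhead.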

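Deployment was done on a fresh cloud machine with nothing but the operating system installed. The runtime efficiency and resource consumption observed along the way are meant to give developers and researchers with different needs a reference point.
ERNIE model overview

Environment setup and deployment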

1. Switch the APT sources to the Alibaba Cloud (Aliyun) mirror:

sudo cp /etc/apt/sources.list /etc/apt/sources.list.bak
sudo sed -i 's|http://archive.ubuntu.com/ubuntu|http://mirrors.aliyun.com/ubuntu|g' /etc/apt/sources.list
sudo sed -i 's|http://security.ubuntu.com/ubuntu|http://mirrors.aliyun.com/ubuntu|g' /etc/apt/sources.list
sudo apt update
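The two `sed` commands only rewrite the host part of each official Ubuntu repository URL; everything else on the line is left alone. The same substitution in Python, applied to a hypothetical two-line sample (a real `sources.list` will differ), shows what changes. This is purely an illustration; the `sed` commands above are what actually edit the file:

```python
# Illustrates the effect of the two sed commands on /etc/apt/sources.list.
# The sample text is hypothetical; a real sources.list will contain other lines.

MIRROR = "http://mirrors.aliyun.com/ubuntu"
OFFICIAL_HOSTS = (
    "http://archive.ubuntu.com/ubuntu",
    "http://security.ubuntu.com/ubuntu",
)


def switch_mirror(sources: str) -> str:
    """Replace every official Ubuntu repository URL with the Aliyun mirror (like sed's /g flag)."""
    for host in OFFICIAL_HOSTS:
        sources = sources.replace(host, MIRROR)
    return sources


sample = (
    "deb http://archive.ubuntu.com/ubuntu jammy main restricted\n"
    "deb http://security.ubuntu.com/ubuntu jammy-security main restricted\n"
)
print(switch_mirror(sample))
```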

2. Change the working directory to the filesystem root:

cd /
pwd

3. Install the virtual environment tool:

sudo apt update
sudo apt install -y python3-venv

4. Create and activate the virtual environment:

python3 -m venv --without-pip /fastdeploy-env
source /fastdeploy-env/bin/activate

A virtual environment keeps the Python dependencies clean and isolated from the system Python.
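For reference, `python3 -m venv` is a front end to the standard-library `venv` module, so the environment can also be created from Python. The sketch below does the equivalent of `--without-pip` in a throwaway temporary directory (not `/fastdeploy-env`) and adds a check you could run to confirm that the active interpreter really belongs to a virtual environment:

```python
import sys
import tempfile
import venv
from pathlib import Path


def create_env(path: Path) -> Path:
    """Create a venv without pip, like `python3 -m venv --without-pip`; return its pyvenv.cfg path."""
    # symlinks=True mirrors the command-line default on Linux
    venv.create(path, with_pip=False, symlinks=True)
    return path / "pyvenv.cfg"  # every venv gets this marker file


def in_virtualenv() -> bool:
    """True when the running interpreter belongs to a venv (sys.prefix differs from the base install)."""
    return sys.prefix != sys.base_prefix


with tempfile.TemporaryDirectory() as tmp:
    cfg = create_env(Path(tmp) / "demo-env")
    print("pyvenv.cfg created:", cfg.exists())

print("running inside a venv:", in_virtualenv())
```

After `source /fastdeploy-env/bin/activate`, running `in_virtualenv()` with that environment's `python` should report `True`.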

5. Install `pip` into the environment:

curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py
python get-pip.py

6. Install the GPU build of PaddlePaddle (the `cu126` index is the CUDA 12.6 build):

python -m pip install paddlepaddle-gpu==3.1.0 -i https://www.paddlepaddle.org.cn/packages/stable/cu126/
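To confirm that the GPU build was installed and can see the card, PaddlePaddle provides `paddle.is_compiled_with_cuda()`, `paddle.device.cuda.device_count()` and a built-in self-test, `paddle.utils.run_check()`. A small wrapper around them might look like the sketch below; the function name `paddle_gpu_status` is hypothetical, and it degrades gracefully when Paddle is not installed:

```python
import importlib.util


def paddle_gpu_status() -> dict:
    """Report whether PaddlePaddle is importable, built with CUDA, and how many GPUs it sees."""
    if importlib.util.find_spec("paddle") is None:
        return {"installed": False}
    import paddle  # imported lazily so this function also works without Paddle

    return {
        "installed": True,
        "version": paddle.__version__,
        "cuda_build": paddle.is_compiled_with_cuda(),
        "gpu_count": paddle.device.cuda.device_count(),
    }


if __name__ == "__main__":
    status = paddle_gpu_status()
    print(status)
    if status.get("cuda_build"):
        import paddle

        paddle.utils.run_check()  # Paddle's own installation self-test
```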

7. Install the stable FastDeploy GPU release:

python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/stable/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

8. Alternatively, install the latest nightly FastDeploy GPU build (use either step 7 or step 8, not both):

python -m pip install fastdeploy-gpu -i https://www.paddlepaddle.org.cn/packages/nightly/fastdeploy-gpu-80_90/ --extra-index-url https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple

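The key point in steps 6 to 8 is choosing the builds that match your GPU and CUDA version; see the official PaddlePaddle and FastDeploy installation guides for details.
If everything up to this point worked, you are essentially done: all that remains is running the model with FastDeploy, which rarely causes problems.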
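The `80_90` in the FastDeploy index URL refers to GPU compute capability (SM) 8.0/9.0, which covers the A800-class card used here. According to the FastDeploy installation docs at the time of writing, SM 8.6/8.9 cards (for example RTX 4090, L20, L40) use a separate `86_89` index; treat that mapping as an assumption and check it against the current docs. The hypothetical helper below picks the index from a compute capability string such as the one `nvidia-smi --query-gpu=compute_cap --format=csv,noheader` prints on recent drivers:

```python
# Hypothetical helper: map a GPU compute capability to a FastDeploy wheel index.
# Assumption: the 86_89 entries follow the FastDeploy docs; verify before relying on them.

BASE = "https://www.paddlepaddle.org.cn/packages/{channel}/fastdeploy-gpu-{arch}/"
ARCH_BY_CAPABILITY = {
    "8.0": "80_90",
    "9.0": "80_90",
    "8.6": "86_89",
    "8.9": "86_89",
}


def fastdeploy_index(compute_cap: str, nightly: bool = False) -> str:
    """Return the pip index URL for a compute capability such as "8.0"."""
    arch = ARCH_BY_CAPABILITY.get(compute_cap.strip())
    if arch is None:
        raise ValueError(f"no known FastDeploy GPU build for compute capability {compute_cap!r}")
    return BASE.format(channel="nightly" if nightly else "stable", arch=arch)


print(fastdeploy_index("8.0"))
```

For `"8.0"` this yields the same stable index used in step 7, and `nightly=True` yields the one in step 8.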

9. Launch ERNIE-4.5-21B-A3B-Base-Paddle with FastDeploy's OpenAI-compatible API server:

python -m fastdeploy.entrypoints.openai.api_server \
  --model baidu/ERNIE-4.5-21B-A3B-Base-Paddle \
  --port 8180 \
  --metrics-port 8181 \
  --engine-worker-queue-port 8182 \
  --max-model-len 32768 \
  --max-num-seqs 32 &
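Because the server is started in the background (`&`), the shell returns immediately while the model may still be loading. One way to wait until it is ready is to poll the `/health` endpoint, which the chat script in this article also uses to check the server. A minimal standard-library sketch, assuming the API port 8180 from the command above and assuming `/health` answers HTTP 200 once the server is ready:

```python
import time
import urllib.request


def wait_for_server(base_url: str = "http://127.0.0.1:8180",
                    timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll {base_url}/health until it returns HTTP 200 or `timeout` seconds have passed."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except OSError:
            # Not ready yet: connection refused, HTTP error status, or timeout
            # (URLError, HTTPError and TimeoutError are all OSError subclasses).
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling `wait_for_server()` right after the launch command blocks until the server responds (or ten minutes pass) and returns `True` or `False` accordingly.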

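The model can now be queried. The API server listens locally on the port set by `--port`, i.e. 8180 (8181 is only the metrics port), so send requests to 127.0.0.1:8180:

curl http://127.0.0.1:8180/v1/chat/completions -H "Content-Type: application/json" -d '{"model": "baidu/ERNIE-4.5-21B-A3B-Base-Paddle", "messages": [{"role": "user", "content": "Hello, ERNIE Bot"}]}'

Copy and paste the command above to start chatting with the model. Pretty simple, right?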
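The same request can be sent from Python using only the standard library, which is handy for scripting. This sketch targets the OpenAI-compatible `/v1/chat/completions` endpoint on port 8180 and reads the reply from `choices[0].message.content`, the same field the non-streaming path of the chat script relies on. The helper names `build_request` and `ask` are illustrative:

```python
import json
import urllib.request

API_URL = "http://127.0.0.1:8180/v1/chat/completions"
MODEL = "baidu/ERNIE-4.5-21B-A3B-Base-Paddle"


def build_request(prompt: str, temperature: float = 0.7, max_tokens: int = 512) -> urllib.request.Request:
    """Build a POST request carrying an OpenAI-style chat payload."""
    payload = {
        "model": MODEL,
        "messages": [{"role": "user", "content": prompt}],
        "temperature": temperature,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload, ensure_ascii=False).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(prompt: str) -> str:
    """Send the prompt to the running server and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(prompt), timeout=120) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

With the server running, `print(ask("Hello, ERNIE Bot"))` prints the model's reply.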

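Multi-function chat interface

Result

This interface bundles a few basic controls, such as adjusting the temperature and the maximum number of tokens, and it can be pointed at different ERNIE models through the `--model` option, which makes it quite practical.

Code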

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
"""
ERNIE-4.5 command-line chat interface.
Chats with an ERNIE-4.5 model deployed locally behind FastDeploy's
OpenAI-compatible API server.
"""
import argparse
import json
from datetime import datetime

import requests


class ERNIEChatCLI:
    def __init__(self, base_url="http://localhost:8180",
                 model_name="baidu/ERNIE-4.5-21B-A3B-Base-Paddle"):
        self.base_url = base_url
        self.model_name = model_name
        self.session = requests.Session()
        self.conversation_history = []
        self.system_prompt = "You are a helpful AI assistant."

    def check_server_status(self):
        """Return True if the server's /health endpoint answers HTTP 200."""
        try:
            response = self.session.get(f"{self.base_url}/health", timeout=5)
            return response.status_code == 200
        except requests.RequestException:
            return False

    def get_models(self):
        """Return the /v1/models response as a dict, or None on failure."""
        try:
            response = self.session.get(f"{self.base_url}/v1/models", timeout=10)
            if response.status_code == 200:
                return response.json()
            return None
        except (requests.RequestException, ValueError):
            return None

    def chat_completion(self, messages, temperature=0.7, max_tokens=2048, stream=False):
        """Send a chat request.

        Streaming: returns the open Response object, or an {"error": ...} dict.
        Non-streaming: returns the parsed JSON body, or an {"error": ...} dict.
        """
        payload = {
            "model": self.model_name,
            "messages": messages,
            "temperature": temperature,
            "max_tokens": max_tokens,
            "stream": stream,
        }
        try:
            response = self.session.post(
                f"{self.base_url}/v1/chat/completions",
                json=payload,
                timeout=60,
                stream=stream,
            )
            # Report HTTP errors in both modes instead of streaming an error page
            if response.status_code != 200:
                return {"error": f"HTTP {response.status_code}: {response.text}"}
            return response if stream else response.json()
        except Exception as e:
            return {"error": str(e)}

    def stream_response(self, response):
        """Print a streamed (server-sent events) reply as it arrives and return the full text."""
        content = ""
        try:
            for line in response.iter_lines():
                if not line:
                    continue
                line = line.decode("utf-8")
                if not line.startswith("data: "):
                    continue
                data = line[6:]
                if data.strip() == "[DONE]":
                    break
                try:
                    json_data = json.loads(data)
                except json.JSONDecodeError:
                    continue
                if json_data.get("choices"):
                    chunk = json_data["choices"][0].get("delta", {}).get("content")
                    if chunk:
                        content += chunk
                        print(chunk, end="", flush=True)
        except KeyboardInterrupt:
            print("\n[interrupted]")
        return content

    def format_message(self, role, content):
        """Prefix a message with a timestamp and an ANSI colour per role."""
        timestamp = datetime.now().strftime("%H:%M:%S")
        if role == "user":
            return f"\033[36m[{timestamp}] You: \033[0m{content}"
        return f"\033[32m[{timestamp}] AI: \033[0m{content}"

    def show_help(self):
        """Print the list of commands."""
        help_text = """
\033[1mAvailable commands:\033[0m
  /help           - Show this help
  /clear          - Clear the conversation history
  /history        - Show the conversation history
  /system <text>  - Set the system prompt
  /models         - List available models
  /status         - Check server status
  /temp <n>       - Set the temperature (0.0-2.0)
  /tokens <n>     - Set the maximum number of tokens
  /stream         - Toggle streaming output
  /save [file]    - Save the conversation to a file
  /load <file>    - Load a conversation from a file
  /exit           - Quit

\033[1mShortcuts:\033[0m
  Ctrl+C          - Interrupt the current response
  Ctrl+D          - Quit
"""
        print(help_text)

    def save_conversation(self, filename=None):
        """Save the system prompt and conversation history to a JSON file."""
        if not filename:
            filename = f"conversation_{datetime.now().strftime('%Y%m%d_%H%M%S')}.json"
        try:
            with open(filename, "w", encoding="utf-8") as f:
                json.dump(
                    {"system_prompt": self.system_prompt,
                     "conversation": self.conversation_history},
                    f, ensure_ascii=False, indent=2,
                )
            print(f"Conversation saved to: {filename}")
        except Exception as e:
            print(f"Save failed: {e}")

    def load_conversation(self, filename):
        """Load a conversation written by save_conversation()."""
        try:
            with open(filename, "r", encoding="utf-8") as f:
                data = json.load(f)
            self.system_prompt = data.get("system_prompt", self.system_prompt)
            self.conversation_history = data.get("conversation", [])
            print(f"Conversation loaded from {filename}")
        except Exception as e:
            print(f"Load failed: {e}")

    def run(self):
        """Run the interactive chat loop."""
        print("\033[1m=== ERNIE-4.5 command-line chat ===\033[0m")
        print(f"Model: {self.model_name}")
        print(f"Server: {self.base_url}")

        # Check that the server is reachable before entering the loop
        if not self.check_server_status():
            print(f"\033[31mError: cannot connect to server {self.base_url}\033[0m")
            print("Make sure the server is running and the port is correct")
            return
        print("\033[32mServer connection OK\033[0m")
        print("Type /help for help, /exit to quit")
        print("-" * 50)

        # Generation settings, adjustable with /temp, /tokens and /stream
        temperature = 0.7
        max_tokens = 2048
        stream_mode = True

        while True:
            try:
                user_input = input("\033[36m> \033[0m").strip()
                if not user_input:
                    continue

                # Handle slash commands
                if user_input.startswith("/"):
                    cmd_parts = user_input.split()
                    cmd = cmd_parts[0].lower()
                    if cmd == "/help":
                        self.show_help()
                    elif cmd == "/exit":
                        print("Goodbye!")
                        break
                    elif cmd == "/clear":
                        self.conversation_history.clear()
                        print("Conversation history cleared")
                    elif cmd == "/history":
                        if not self.conversation_history:
                            print("No conversation history yet")
                        else:
                            for msg in self.conversation_history:
                                print(self.format_message(msg["role"], msg["content"]))
                    elif cmd == "/system":
                        if len(cmd_parts) > 1:
                            self.system_prompt = " ".join(cmd_parts[1:])
                            print(f"System prompt set to: {self.system_prompt}")
                        else:
                            print(f"Current system prompt: {self.system_prompt}")
                    elif cmd == "/models":
                        models = self.get_models()
                        if models:
                            print("Available models:")
                            for model in models.get("data", []):
                                print(f"  - {model.get('id', 'N/A')}")
                        else:
                            print("Could not fetch the model list")
                    elif cmd == "/status":
                        if self.check_server_status():
                            print("\033[32mServer status: OK\033[0m")
                        else:
                            print("\033[31mServer status: unavailable\033[0m")
                    elif cmd == "/temp":
                        if len(cmd_parts) > 1:
                            try:
                                temperature = max(0.0, min(2.0, float(cmd_parts[1])))
                                print(f"Temperature set to: {temperature}")
                            except ValueError:
                                print("Invalid temperature value")
                        else:
                            print(f"Current temperature: {temperature}")
                    elif cmd == "/tokens":
                        if len(cmd_parts) > 1:
                            try:
                                max_tokens = max(1, min(32768, int(cmd_parts[1])))
                                print(f"Max tokens set to: {max_tokens}")
                            except ValueError:
                                print("Invalid token count")
                        else:
                            print(f"Current max tokens: {max_tokens}")
                    elif cmd == "/stream":
                        stream_mode = not stream_mode
                        print(f"Streaming output: {'on' if stream_mode else 'off'}")
                    elif cmd == "/save":
                        filename = cmd_parts[1] if len(cmd_parts) > 1 else None
                        self.save_conversation(filename)
                    elif cmd == "/load":
                        if len(cmd_parts) > 1:
                            self.load_conversation(cmd_parts[1])
                        else:
                            print("Please specify a file name")
                    else:
                        print(f"Unknown command: {cmd}")
                    continue

                # Build the messages: system prompt + history + new user turn
                messages = [{"role": "system", "content": self.system_prompt}]
                messages.extend(self.conversation_history)
                messages.append({"role": "user", "content": user_input})

                # Echo the user message
                print(self.format_message("user", user_input))

                # Send the request
                print(f"\033[32m[{datetime.now().strftime('%H:%M:%S')}] AI: \033[0m", end="", flush=True)
                if stream_mode:
                    response = self.chat_completion(messages, temperature, max_tokens, stream=True)
                    if isinstance(response, dict):
                        ai_response = f"Error: {response['error']}"
                        print(ai_response)
                    else:
                        ai_response = self.stream_response(response)
                        print()  # newline after the streamed text
                else:
                    result = self.chat_completion(messages, temperature, max_tokens, stream=False)
                    if "error" in result:
                        ai_response = f"Error: {result['error']}"
                    else:
                        ai_response = result["choices"][0]["message"]["content"]
                    print(ai_response)

                # Save the turn to the history
                self.conversation_history.append({"role": "user", "content": user_input})
                self.conversation_history.append({"role": "assistant", "content": ai_response})
                # Keep only the last 20 messages (10 turns)
                if len(self.conversation_history) > 20:
                    self.conversation_history = self.conversation_history[-20:]
            except KeyboardInterrupt:
                print("\nUse /exit to quit")
                continue
            except EOFError:
                print("\nGoodbye!")
                break
            except Exception as e:
                print(f"\nError: {e}")
                continue


def main():
    parser = argparse.ArgumentParser(description="ERNIE-4.5 command-line chat interface")
    parser.add_argument("--url", default="http://localhost:8180", help="Server URL")
    parser.add_argument("--model", default="baidu/ERNIE-4.5-21B-A3B-Base-Paddle", help="Model name")
    args = parser.parse_args()

    # Ctrl+C is handled inside run(): it interrupts the current reply or prompt
    # without exiting; use /exit or Ctrl+D to quit.
    cli = ERNIEChatCLI(args.url, args.model)
    cli.run()


if __name__ == "__main__":
    main()
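Streaming mode relies on server-sent events: each non-empty line of the response body looks like `data: {json}`, new text arrives in `choices[0].delta.content`, and the stream ends with `data: [DONE]`. The parsing part of `stream_response` can be pulled out into a pure function and tested without a running server. In this sketch the function name is illustrative and the sample lines are hand-written, not captured from FastDeploy:

```python
import json
from typing import Iterable


def collect_stream_text(lines: Iterable[bytes]) -> str:
    """Concatenate delta.content from OpenAI-style SSE lines, stopping at [DONE]."""
    parts = []
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data: "):
            continue  # skip blank keep-alive lines and other SSE fields
        data = line[len("data: "):]
        if data == "[DONE]":
            break
        try:
            chunk = json.loads(data)
        except json.JSONDecodeError:
            continue  # ignore malformed chunks, as the chat script does
        choices = chunk.get("choices") or []
        if choices:
            piece = choices[0].get("delta", {}).get("content")
            if piece:
                parts.append(piece)
    return "".join(parts)


sample = [
    b'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    b"",
    b'data: {"choices": [{"delta": {"content": "Hello"}}]}',
    b'data: {"choices": [{"delta": {"content": ", world"}}]}',
    b"data: [DONE]",
]
print(collect_stream_text(sample))  # Hello, world
```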

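Deployment steps

Save the code above as `ernie_chat.py`. The script's only third-party dependency is `requests`; if it is not already present in the virtual environment, install it with `python -m pip install requests`.

Run it (with the FastDeploy server from step 9 still running):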

python3 ernie_chat.py --url http://localhost:8180 --model baidu/ERNIE-4.5-21B-A3B-Base-Paddle

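Closing remarks

Hi, I'm Qiuner. I write this blog to help others avoid the detours I took. My GitHub: https://github.com/Qiuner ⭐ Gitee: https://gitee.com/Qiuner 🌹

If this article helped you, a like would make me happy 😄 (^ ~ ^). Follow me for more; I'll do my best to keep bringing interesting content 😎.

The code is on GitHub and Gitee; feel free to download it from there. Don't forget to leave a star 😍

If you run into a problem you can't solve on your own, ask in the comments on my Juejin page. I can't keep up with private messages, and I may miss comments on CSDN. Juejin account: https://juejin.cn/user/1942157160101860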