
[LLM] Building an "OpenAI-compatible" DeepSeek-OCR service with FastAPI + a minimal WebUI

Contents

  • 1. Feature overview
  • 2. Environment setup
  • 3. Suggested directory layout
  • 4. Starting the backend
  • 5. Request/response essentials (/v1/chat/completions)
  • 6. Frontend interaction logic
  • 7. Source code: the three files
    • (1) Backend service source: `app.py`
    • (2) Python client example (OpenAI SDK)
    • (3) Frontend single page: `static/ui.html`

Goal: deploy DeepSeek-OCR locally, expose /v1/chat/completions (OpenAI-protocol compatible), and serve a static web page where you can upload an image, type a prompt, and get the result directly.


1. Feature overview

  • Backend: FastAPI, exposing `GET /v1/models`, `POST /v1/chat/completions`, `GET /health`, and `POST /parserToText`.
  • Model: DeepSeek-OCR (transformers, trust_remote_code=True).
  • Image input: `data:` Base64 (recommended), local path / `file://`, or http(s) URL.
  • Frontend: a single file `static/ui.html`, with optional presets (Markdown / plain text / JSON structure) and a Markdown preview.
  • Static directory `/static`, plus `/ui` which redirects to `static/ui.html`.

2. Environment setup

  • Python 3.12+
  • Conda/venv recommended
  • Dependencies (example): `fastapi`, `uvicorn[standard]`, `transformers`, `torch`, `requests`
  • Optional: `flash-attn` (faster inference, lower VRAM)

```bash
conda create -n deepseekocr python=3.12.9
conda activate deepseekocr
pip install torch==2.6.0 transformers==4.46.3 tokenizers==0.20.3 einops addict easydict python-multipart uvicorn fastapi Pillow torchvision
```
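
After installing, a minimal sanity check (a sketch, nothing model-specific): it only confirms the stack imports and shows which precision the service will be able to use.

```python
# sanity_check.py - verify the core dependencies before loading DeepSeek-OCR
import torch
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    # This decides the BF16 -> FP16 -> FP32 fallback used in app.py below.
    print("BF16 supported:", torch.cuda.is_bf16_supported())
```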

3. Suggested directory layout

```text
project/
├─ app.py                 # backend service (OpenAI-compatible + three image input modes)
├─ static/
│  └─ ui.html             # single-file frontend (upload image → data: URI → /v1/chat/completions)
└─ README.md
```

4. Starting the backend

```bash
python app.py
```

  • Listens on: http://0.0.0.0:8001
  • Health check: `/health`
  • Model list: `/v1/models` (always returns `deepseek-ocr`)
  • Inference: `/v1/chat/completions`
  • Form upload: `/parserToText`
  • Frontend: `/ui` (redirects to `/static/ui.html`)
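
Once the server is up, a quick smoke test against these endpoints (a sketch assuming same-host deployment on port 8001; `test.png` is a placeholder for any local image):

```python
# smoke_test.py - poke the running service
import requests

BASE = "http://127.0.0.1:8001"

print(requests.get(f"{BASE}/health", timeout=5).json())      # {"status": "healthy"}
print(requests.get(f"{BASE}/v1/models", timeout=5).json())   # lists "deepseek-ocr"

# The form endpoint takes a file upload plus a "content" prompt field.
with open("test.png", "rb") as f:
    r = requests.post(
        f"{BASE}/parserToText",
        files={"file": ("test.png", f, "image/png")},
        data={"content": "Extract all text from this image."},
        timeout=120,
    )
print(r.status_code, r.text[:500])
```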

5. Request/response essentials (/v1/chat/completions)

  • Request (core fields)

    • model: "deepseek-ocr"
    • messages: [{ role: "user", content: [ {type:"text", text:"your prompt"}, {type:"image_url", image_url:{url:"<image address>"}} ] }]
    • <image address>: a `data:` Base64 URI (recommended) / a local path (`file:///...` or an absolute path) / an http(s) URL
  • Response

    • choices[0].message.content: the model's text output (may be Markdown)
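
For reference, a complete round trip with plain `requests`, inlining a local image as a `data:` Base64 URI, the recommended input form (`test.png` is a placeholder):

```python
# chat_completions_demo.py - one request/response round trip
import base64
import requests

# Inline a local image as a data: URI so the server needs no filesystem access.
with open("test.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

payload = {
    "model": "deepseek-ocr",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Return the OCR result as Markdown."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }],
}
resp = requests.post("http://127.0.0.1:8001/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])   # the OCR text (possibly Markdown)
print(body["usage"])                              # approximate token counts
```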

6. Frontend interaction logic

  • Pick an image → read it as a `data:` Base64 URI (`FileReader.readAsDataURL`)
  • Assemble the messages (preset + custom prompt + `image_url`)
  • POST /v1/chat/completions → render the raw text & a Markdown preview

7. Source code: the three files

(1) Backend service source: `app.py`

```python
# python 3.12+
# pip install fastapi uvicorn transformers torch requests
import os
import time
import uuid
import base64
import tempfile
import mimetypes
import logging
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse

import requests
import torch
from fastapi import FastAPI, File, UploadFile, Form, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, HTMLResponse
from fastapi.staticfiles import StaticFiles
from transformers import AutoModel, AutoTokenizer

# ---------------- logging ----------------
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ocr-api")

# ---------------- app & CORS -------------
app = FastAPI(title="Transformers model service (OpenAI-Compatible)")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"], allow_credentials=True,
    allow_methods=["*"], allow_headers=["*"],
)

# Static directory (where your ui.html lives)
STATIC_DIR = os.getenv("STATIC_DIR", "static")
os.makedirs(STATIC_DIR, exist_ok=True)
app.mount("/static", StaticFiles(directory=STATIC_DIR), name="static")

# Convenience entry: /ui -> /static/ui.html (optional)
@app.get("/ui")
async def ui_redirect():
    html = '<meta http-equiv="refresh" content="0; url=/static/ui.html" />'
    return HTMLResponse(content=html, status_code=200)

# ---------------- model load -------------
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")
MODEL_NAME = os.getenv("DEEPSEEK_OCR_PATH", "/home/qwt/models/DeepSeek-OCR")  # or "deepseek-ai/DeepSeek-OCR"
OPENAI_MODEL_ID = "deepseek-ocr"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    use_safetensors=True,
    # _attn_implementation="flash_attention_2",  # enable once flash-attn is installed
)

# Pick precision and device gracefully: prefer bfloat16, fall back to float16, then float32 (CPU)
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    model = model.eval().to(device)
    try:
        model = model.to(torch.bfloat16)
    except Exception:
        try:
            model = model.to(torch.float16)
            log.info("BF16 unavailable, fell back to FP16")
        except Exception:
            model = model.to(torch.float32)
            log.info("FP16 unavailable, fell back to FP32")
else:
    device = torch.device("cpu")
    model = model.eval().to(device)
    log.warning("No CUDA detected; inference will run on CPU.")

# ---------------- helpers ----------------
def _now_ts() -> int:
    return int(time.time())


def _gen_id(prefix: str) -> str:
    return f"{prefix}_{uuid.uuid4().hex[:24]}"


def _save_bytes_to_temp(data: bytes, suffix: str = "") -> str:
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    tmp.write(data)
    tmp.flush()
    tmp.close()
    return tmp.name


def _is_data_uri(url: str) -> bool:
    return isinstance(url, str) and url.startswith("data:")


def _is_local_like(s: str) -> bool:
    """Treat anything that is not http/https/data as local:
    absolute/relative paths, ~ expansion, file:// prefixes, Windows drive letters, etc.
    """
    if not isinstance(s, str):
        return False
    if s.startswith("file://"):
        return True
    parsed = urlparse(s)
    if parsed.scheme in ("http", "https", "data"):
        return False
    return True


def _to_local_path(s: str) -> str:
    if s.startswith("file://"):
        return s[7:]
    return os.path.expanduser(s)


def _download_to_temp(url: str) -> str:
    """Accepts three kinds of input:
    1) data:image/png;base64,xxx
    2) a local file path (relative/absolute) or a file:// prefix
       (no longer relies on exists() to decide whether something is local)
    3) an http(s) URL
    Returns: the path of the downloaded/copied temp file.
    """
    if not isinstance(url, str) or not url.strip():
        raise HTTPException(status_code=400, detail="Empty image url")

    # 1) data: URI
    if _is_data_uri(url):
        try:
            header, b64 = url.split(",", 1)
            ext = ".bin"
            if "image/png" in header:
                ext = ".png"
            elif "image/jpeg" in header or "image/jpg" in header:
                ext = ".jpg"
            elif "image/webp" in header:
                ext = ".webp"
            raw = base64.b64decode(b64)
            path = _save_bytes_to_temp(raw, suffix=ext)
            log.info(f"[image] data-uri -> {path}")
            return path
        except Exception as e:
            raise HTTPException(status_code=400, detail=f"Invalid data URI: {e}")

    # 2) local file
    if _is_local_like(url):
        p = _to_local_path(url)
        if not os.path.isabs(p):
            p = os.path.abspath(p)
        if not os.path.isfile(p):
            raise HTTPException(status_code=400, detail=f"Local file not found or not a file: {p}")
        ext = os.path.splitext(p)[1] or ".img"
        try:
            with open(p, "rb") as f:
                data = f.read()
        except Exception as e:
            raise HTTPException(status_code=400, detail=f"Read local file failed: {p} ({e})")
        path = _save_bytes_to_temp(data, suffix=ext)
        log.info(f"[image] local -> {p} -> {path}")
        return path

    # 3) http(s)
    try:
        log.info(f"[image] http(s) -> {url}")
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        ctype = resp.headers.get("Content-Type", "")
        ext = mimetypes.guess_extension(ctype) or ".img"
        path = _save_bytes_to_temp(resp.content, suffix=ext)
        log.info(f"[image] http(s) saved -> {path}")
        return path
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Download image failed: {e}")


def _extract_text_and_first_image_from_messages(messages: List[Dict[str, Any]]) -> Tuple[str, Optional[str]]:
    """Parse OpenAI chat.completions-style messages:
    - content: str (text)
    - content: list[{type:"text"}, {type:"image_url", image_url:{url:...}|str}]
    Only the first image is used; all text fragments are concatenated.
    """
    all_text: List[str] = []
    image_path: Optional[str] = None
    for msg in messages:
        content = msg.get("content")
        if content is None:
            continue
        if isinstance(content, str):
            all_text.append(content)
            continue
        if isinstance(content, list):
            for part in content:
                ptype = part.get("type")
                if ptype in ("text", "input_text"):
                    txt = part.get("text", "")
                    if isinstance(txt, str) and txt.strip():
                        all_text.append(txt)
                elif ptype in ("image_url", "input_image"):
                    if image_path is None:
                        image_field = part.get("image_url") or part.get("image")
                        url = image_field.get("url") if isinstance(image_field, dict) else image_field
                        if not url or not isinstance(url, str):
                            raise HTTPException(status_code=400, detail="image_url is missing or invalid")
                        image_path = _download_to_temp(url)
    prompt = "\n".join([t for t in all_text if t.strip()]) if all_text else ""
    return prompt, image_path


def _run_ocr_infer(prompt: str, image_path: str) -> str:
    full_prompt = f"<image>\n{prompt}".strip()
    try:
        res = model.infer(
            tokenizer,
            prompt=full_prompt,
            image_file=image_path,
            output_path="./save",
            base_size=1024,
            image_size=640,
            crop_mode=True,
            save_results=False,
            test_compress=True,
            eval_mode=True,
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Infer failed: {e}")
    if isinstance(res, dict):
        for key in ("text", "result", "output", "ocr_text"):
            if key in res and isinstance(res[key], str):
                return res[key]
        return str(res)
    if isinstance(res, (list, tuple)):
        return "\n".join(map(str, res))
    return str(res)


def _token_count_approx(text: str) -> int:
    try:
        return len(tokenizer.encode(text))
    except Exception:
        return max(1, len(text) // 4)


# ---------------- routes ----------------
@app.get("/health")
async def health_check():
    return {"status": "healthy"}


@app.post("/parserToText")
async def parser_to_text(file: UploadFile = File(...), content: str = Form(...)):
    file_bytes = await file.read()
    suffix = os.path.splitext(file.filename or "")[1] or ".img"
    tmp_path = _save_bytes_to_temp(file_bytes, suffix=suffix)
    prompt = "<image>\n" + (content or "")
    try:
        res = model.infer(
            tokenizer,
            prompt=prompt,
            image_file=tmp_path,
            output_path="./save",
            base_size=1024,
            image_size=640,
            crop_mode=True,
            save_results=False,
            test_compress=True,
            eval_mode=True,
        )
        return res
    except Exception as e:
        return {"status": "error", "message": str(e)}
    finally:
        if os.path.exists(tmp_path):
            try:
                os.unlink(tmp_path)
            except Exception:
                pass


@app.get("/v1/models")
async def list_models():
    return {
        "object": "list",
        "data": [{"id": OPENAI_MODEL_ID, "object": "model", "created": _now_ts(), "owned_by": "owner"}],
    }


@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    payload = await request.json()
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        raise HTTPException(status_code=400, detail="`messages` must be a non-empty list")
    prompt_text, image_path = _extract_text_and_first_image_from_messages(messages)
    if not image_path:
        raise HTTPException(status_code=400, detail="No image found in messages. Provide content with type='image_url'.")
    try:
        answer = _run_ocr_infer(prompt_text, image_path)
    finally:
        if image_path and os.path.exists(image_path):
            try:
                os.unlink(image_path)
            except Exception:
                pass
    prompt_tokens = _token_count_approx(prompt_text)
    completion_tokens = _token_count_approx(answer)
    return JSONResponse({
        "id": _gen_id("chatcmpl"),
        "object": "chat.completion",
        "created": _now_ts(),
        "model": OPENAI_MODEL_ID,
        "choices": [{"index": 0, "message": {"role": "assistant", "content": answer}, "finish_reason": "stop"}],
        "usage": {"prompt_tokens": prompt_tokens, "completion_tokens": completion_tokens,
                  "total_tokens": prompt_tokens + completion_tokens},
    })


# root route
@app.get("/")
async def root():
    return {"service": "OpenAI-Compatible OCR Service", "model": OPENAI_MODEL_ID, "ui": "/static/ui.html"}


# ---------------- main ----------------
if __name__ == "__main__":
    import uvicorn
    # listen on 0.0.0.0:8001
    uvicorn.run(app, host="0.0.0.0", port=8001)
```

(2) Python client example (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8001/v1", api_key="sk-x")

resp = client.chat.completions.create(
    model="deepseek-ocr",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the contents of this image:"},
                {"type": "image_url", "image_url": {"url": "/home/qwt/projects/model_apis/test.png"}},
                # also works: "file:///home/qwt/images/test.png" or a relative path "test.png"
            ],
        }
    ],
)
print(resp.choices[0].message.content)
```
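
The local-path form above only works when client and server share a filesystem. For a remote client, the same call can inline the image as a `data:` Base64 URI instead (a sketch; the path is a placeholder):

```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8001/v1", api_key="sk-x")

# Encode the image client-side so the server never touches our filesystem.
with open("test.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="deepseek-ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Return the OCR result as plain text."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }],
)
print(resp.choices[0].message.content)
```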

(3) Frontend single page: `static/ui.html`

````html

<!doctype html>
<html lang="zh">
<head>
<meta charset="utf-8">
<title>DeepSeek-OCR • Web UI</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
  :root { --bg:#0b1220; --fg:#e6edf3; --muted:#9aa4b2; --acc:#49b5ff; --card:#111a2e; --ok:#2ecc71; --err:#ff6b6b; }
  * { box-sizing: border-box; }
  body { margin:0; font-family: ui-sans-serif, system-ui, -apple-system, Segoe UI, Roboto, "Helvetica Neue", Arial; background:var(--bg); color:var(--fg); }
  .wrap { max-width: 1000px; margin: 32px auto; padding: 0 16px; }
  h1 { font-weight:700; margin: 0 0 6px; }
  p.desc { color:var(--muted); margin: 0 0 16px; }
  .card { background:var(--card); border-radius:16px; padding:16px; box-shadow: 0 10px 30px rgba(0,0,0,.25); margin-bottom:16px; }
  .row { display:flex; gap:16px; flex-wrap:wrap; }
  .col { flex:1 1 360px; min-width:320px; }
  label { font-size:14px; color:var(--muted); display:block; margin:6px 0; }
  input[type="text"], textarea, select {
    width:100%; background:#0e1627; color:var(--fg); border:1px solid #1e2b44; border-radius:12px;
    padding:10px 12px; outline:none; font-size:14px;
  }
  textarea { min-height:120px; resize:vertical; }
  .btn { background:var(--acc); color:#001224; border:none; border-radius:12px; padding:10px 16px; font-weight:700; cursor:pointer; }
  .btn:disabled { opacity:.6; cursor:not-allowed; }
  .pill { display:inline-block; background:#0e1627; border:1px dashed #1e2b44; color:var(--muted); border-radius:999px; padding:6px 10px; font-size:12px; }
  #preview { max-width:100%; max-height:260px; border-radius:12px; border:1px solid #1e2b44; display:none; margin-top:8px; }
  .out { white-space:pre-wrap; background:#0e1627; border:1px solid #1e2b44; border-radius:12px; padding:12px; min-height:140px; }
  .tabs { display:flex; gap:8px; margin-top:8px; }
  .tabs button { background:#0e1627; color:var(--muted); border:1px solid #1e2b44; border-radius:10px; padding:6px 10px; cursor:pointer; }
  .tabs button.active { color:var(--fg); border-color:var(--acc); }
  a { color: var(--acc); text-decoration: none; }
  .row-compact { display:flex; gap:8px; align-items:center; flex-wrap:wrap; }
  .muted { color:var(--muted); font-size:12px; }
</style>
</head>
<body>
<div class="wrap"><h1>DeepSeek-OCR Web UI</h1><p class="desc">上传图片 + 输入提示,直接调用后端 <code>/v1/chat/completions</code>。默认预设:<span class="pill">返回 Markdown 识别结果</span></p><div class="card"><div class="row"><div class="col"><label>图片文件</label><input id="file" type="file" accept="image/*"><img id="preview" alt="preview"><div class="muted" style="margin-top:6px;">前端会把图片转为 <code>data:</code> Base64 发送到后端。</div></div><div class="col"><label>预设指令</label><select id="preset"><option value="md" selected>返回 Markdown 识别结果(保留标题/列表/表格/代码块)</option><option value="plain">返回纯文本(仅文字内容,去版式)</option><option value="json">返回 JSON 结构:{title, paragraphs, tables[], figures[]}</option></select><label style="margin-top:10px;">自定义提示(可选,会拼接到预设后面)</label><textarea id="prompt" placeholder="例如:表格务必用标准 Markdown 表格语法;公式用 $...$;图片题注前缀用 Figure:"></textarea><div class="row-compact" style="margin-top:10px;"><button id="run" class="btn">识别并生成</button><span id="status" class="pill">就绪</span></div><div class="muted" style="margin-top:6px;">接口地址:<code id="ep">/v1/chat/completions</code>(同源部署可直接使用)</div></div></div></div><div class="card"><div class="tabs"><button id="tab-raw" class="active">原始文本</button><button id="tab-md">Markdown 预览</button></div><div id="raw" class="out" style="margin-top:8px;"></div><div id="md" class="out" style="margin-top:8px; display:none;"></div></div><div class="muted">API: <a href="/v1/models" target="_blank">/v1/models</a> · <a href="/health" target="_blank">/health</a></div>
</div><!-- Markdown 渲染(CDN,可离线移除,届时不显示预览) -->
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<script>
// ======= DOM =======
const fileEl   = document.getElementById('file');
const preview  = document.getElementById('preview');
const presetEl = document.getElementById('preset');
const promptEl = document.getElementById('prompt');
const runBtn   = document.getElementById('run');
const statusEl = document.getElementById('status');
const rawEl    = document.getElementById('raw');
const mdEl     = document.getElementById('md');
const tabRaw   = document.getElementById('tab-raw');
const tabMd    = document.getElementById('tab-md');

// ======= Helpers =======
function endpoint() {
  // Same-origin deployment: a relative path is enough
  return '/v1/chat/completions';
}

function presetText(key) {
  if (key === 'plain') {
    return "Return the OCR result as plain text only; keep the text content and drop all layout and decorative symbols.";
  } else if (key === 'json') {
    return "Return the OCR result as JSON with fields {title, paragraphs, tables: [markdown_table], figures: [caption]}; no explanations.";
  }
  // default: md
  return "Return the OCR result as Markdown, preserving the layout as much as possible: use # headings, ordered/unordered lists, ``` code blocks, and standard Markdown table syntax; mark unrecognizable fragments with [UNCERTAIN].";
}

function fileToDataURI(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onerror = () => reject(new Error('Failed to read file'));
    reader.onload = () => resolve(reader.result);
    reader.readAsDataURL(file);
  });
}

function setTab(which) {
  if (which === 'raw') {
    tabRaw.classList.add('active'); tabMd.classList.remove('active');
    rawEl.style.display = 'block'; mdEl.style.display = 'none';
  } else {
    tabMd.classList.add('active'); tabRaw.classList.remove('active');
    mdEl.style.display = 'block'; rawEl.style.display = 'none';
  }
}

function setStatus(text, ok=true) {
  statusEl.textContent = text;
  statusEl.style.borderColor = ok ? '#1e2b44' : 'var(--err)';
  statusEl.style.color = ok ? 'var(--muted)' : '#ffdede';
}

// ======= UI Events =======
fileEl.addEventListener('change', () => {
  const f = fileEl.files && fileEl.files[0];
  if (!f) { preview.style.display = 'none'; return; }
  const url = URL.createObjectURL(f);
  preview.src = url;
  preview.style.display = 'block';
});

tabRaw.onclick = () => setTab('raw');
tabMd.onclick  = () => setTab('md');

runBtn.addEventListener('click', async () => {
  try {
    const f = fileEl.files && fileEl.files[0];
    if (!f) { alert('Please choose an image file first'); return; }

    const dataUri = await fileToDataURI(f);
    const preset  = presetText(presetEl.value);
    const custom  = (promptEl.value || '').trim();
    const textMsg = custom ? (preset + "\n\n" + custom) : preset;

    const body = {
      model: "deepseek-ocr",
      messages: [{
        role: "user",
        content: [
          { type: "text", text: textMsg },
          { type: "image_url", image_url: { url: dataUri } }
        ]
      }],
      // other OpenAI-style parameters can go here (currently ignored by the backend)
      // temperature: 0.2
    };

    setStatus('Recognizing…', true);
    runBtn.disabled = true;
    rawEl.textContent = '';
    mdEl.textContent = '';

    const t0 = performance.now();
    const resp = await fetch(endpoint(), {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });
    const t1 = performance.now();

    if (!resp.ok) {
      const errText = await resp.text();
      setStatus('Error', false);
      rawEl.textContent = `HTTP ${resp.status}\n${errText}`;
      setTab('raw');
      return;
    }

    const json = await resp.json();
    const content = json?.choices?.[0]?.message?.content ?? '';
    rawEl.textContent = content || '[empty response]';
    if (window.marked && content) {
      mdEl.innerHTML = marked.parse(content);
    } else {
      mdEl.textContent = content;
    }
    setStatus(`Done (${((t1 - t0)/1000).toFixed(2)}s)`, true);
  } catch (e) {
    setStatus('Error', false);
    rawEl.textContent = String(e?.stack || e);
    setTab('raw');
  } finally {
    runBtn.disabled = false;
  }
});
</script>
</body>
</html>
````