
[LLM] Building an "OpenAI-compatible" DeepSeek-OCR service with FastAPI + a minimal WebUI

Contents

  • 1. Feature overview
  • 2. Environment setup
  • 3. Suggested directory layout
  • 4. Starting the backend
  • 5. Request/response essentials (/v1/chat/completions)
  • 6. Frontend interaction logic
  • 7. Source code: the three files
    • (1) Backend service source: `app.py`
    • (2) Python client example (OpenAI SDK)
    • (3) Frontend single page: `static/ui.html`

Goal: deploy DeepSeek-OCR locally, expose /v1/chat/completions (OpenAI-protocol compatible), and serve a static web page where you can upload an image, type a prompt, and get the result directly.


1. Feature overview

  • Backend: FastAPI, exposing `GET /v1/models`, `POST /v1/chat/completions`, `GET /health`, and `POST /parserToText`.
  • Model: DeepSeek-OCR (transformers, trust_remote_code=True).
  • Image input: `data:` Base64 (recommended), local path / `file://`, or http(s) URL.
  • Frontend: a single file `static/ui.html`, with optional presets (Markdown / plain text / JSON structure) and a Markdown preview.
  • Static directory `/static`, plus `/ui` which redirects to `static/ui.html`.

2. Environment setup

  • Python 3.12+
  • Conda/venv recommended
  • Dependencies (example): `fastapi`, `uvicorn[standard]`, `transformers`, `torch`, `requests`
  • Optional: `flash-attn` (faster inference, lower VRAM)

```bash
conda create -n deepseekocr python=3.12.9
conda activate deepseekocr
pip install torch==2.6.0 transformers==4.46.3 tokenizers==0.20.3 einops addict easydict python-multipart uvicorn fastapi Pillow torchvision
```
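
After installing, a minimal sanity check (a sketch, nothing model-specific): it only confirms the stack imports and shows which precision the service will be able to use.

```python
# sanity_check.py - verify the core dependencies before loading DeepSeek-OCR
import torch
import transformers

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))
    # This decides the BF16 -> FP16 -> FP32 fallback used in app.py below.
    print("BF16 supported:", torch.cuda.is_bf16_supported())
```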

3. Suggested directory layout

```text
project/
├─ app.py                 # backend service (OpenAI-compatible + three image input modes)
├─ static/
│  └─ ui.html             # single-file frontend (upload image → data: URI → /v1/chat/completions)
└─ README.md
```

4. Starting the backend

```bash
python app.py
```

  • Listens on: http://0.0.0.0:8001
  • Health check: `/health`
  • Model list: `/v1/models` (always returns `deepseek-ocr`)
  • Inference: `/v1/chat/completions`
  • Form upload: `/parserToText`
  • Frontend: `/ui` (redirects to `/static/ui.html`)
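
Once the server is up, a quick smoke test against these endpoints (a sketch assuming same-host deployment on port 8001; `test.png` is a placeholder for any local image):

```python
# smoke_test.py - poke the running service
import requests

BASE = "http://127.0.0.1:8001"

print(requests.get(f"{BASE}/health", timeout=5).json())      # {"status": "healthy"}
print(requests.get(f"{BASE}/v1/models", timeout=5).json())   # lists "deepseek-ocr"

# The form endpoint takes a file upload plus a "content" prompt field.
with open("test.png", "rb") as f:
    r = requests.post(
        f"{BASE}/parserToText",
        files={"file": ("test.png", f, "image/png")},
        data={"content": "Extract all text from this image."},
        timeout=120,
    )
print(r.status_code, r.text[:500])
```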

5. Request/response essentials (/v1/chat/completions)

  • Request (core fields)

    • model: "deepseek-ocr"
    • messages: [{ role: "user", content: [ {type:"text", text:"your prompt"}, {type:"image_url", image_url:{url:"<image address>"}} ] }]
    • <image address>: a `data:` Base64 URI (recommended) / a local path (`file:///...` or an absolute path) / an http(s) URL
  • Response

    • choices[0].message.content: the model's text output (may be Markdown)
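
For reference, a complete round trip with plain `requests`, inlining a local image as a `data:` Base64 URI, the recommended input form (`test.png` is a placeholder):

```python
# chat_completions_demo.py - one request/response round trip
import base64
import requests

# Inline a local image as a data: URI so the server needs no filesystem access.
with open("test.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

payload = {
    "model": "deepseek-ocr",
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Return the OCR result as Markdown."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }],
}
resp = requests.post("http://127.0.0.1:8001/v1/chat/completions", json=payload, timeout=120)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["message"]["content"])   # the OCR text (possibly Markdown)
print(body["usage"])                              # approximate token counts
```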

6. Frontend interaction logic

  • Pick an image → read it as a `data:` Base64 URI (`FileReader.readAsDataURL`)
  • Assemble the messages (preset + custom prompt + `image_url`)
  • POST /v1/chat/completions → render the raw text & a Markdown preview

7. Source code: the three files

(1) Backend service source: `app.py`

```python
# python 3.12+
# pip install fastapi uvicorn transformers torch requests
import os
import time
import uuid
import base64
import tempfile
import mimetypes
import logging
from typing import Any, Dict, List, Optional, Tuple
from urllib.parse import urlparse

import requests
import torch
from fastapi import FastAPI, File, UploadFile, Form, Request, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from fastapi.responses import JSONResponse, HTMLResponse
from fastapi.staticfiles import StaticFiles
from transformers import AutoModel, AutoTokenizer

# ---------------- logging ----------------
logging.basicConfig(level=logging.INFO)
log = logging.getLogger("ocr-api")

# ---------------- app & CORS -------------
app = FastAPI(title="Transformers model service (OpenAI-Compatible)")
app.add_middleware(
    CORSMiddleware,
    allow_origins=["*"], allow_credentials=True,
    allow_methods=["*"], allow_headers=["*"],
)

# Static directory (where your ui.html lives)
STATIC_DIR = os.getenv("STATIC_DIR", "static")
os.makedirs(STATIC_DIR, exist_ok=True)
app.mount("/static", StaticFiles(directory=STATIC_DIR), name="static")

# Convenience entry: /ui -> /static/ui.html (optional)
@app.get("/ui")
async def ui_redirect():
    html = '<meta http-equiv="refresh" content="0; url=/static/ui.html" />'
    return HTMLResponse(content=html, status_code=200)

# ---------------- model load -------------
os.environ.setdefault("CUDA_VISIBLE_DEVICES", "0")
MODEL_NAME = os.getenv("DEEPSEEK_OCR_PATH", "/home/qwt/models/DeepSeek-OCR")  # or "deepseek-ai/DeepSeek-OCR"
OPENAI_MODEL_ID = "deepseek-ocr"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME, trust_remote_code=True)
model = AutoModel.from_pretrained(
    MODEL_NAME,
    trust_remote_code=True,
    use_safetensors=True,
    # _attn_implementation="flash_attention_2",  # enable once flash-attn is installed
)

# Pick precision and device gracefully: prefer bfloat16, fall back to float16, then float32 (CPU)
if torch.cuda.is_available():
    device = torch.device("cuda:0")
    model = model.eval().to(device)
    try:
        model = model.to(torch.bfloat16)
    except Exception:
        try:
            model = model.to(torch.float16)
            log.info("BF16 unavailable, fell back to FP16")
        except Exception:
            model = model.to(torch.float32)
            log.info("FP16 unavailable, fell back to FP32")
else:
    device = torch.device("cpu")
    model = model.eval().to(device)
    log.warning("No CUDA detected; inference will run on CPU.")

# ---------------- helpers ----------------
def _now_ts() -> int:
    return int(time.time())


def _gen_id(prefix: str) -> str:
    return f"{prefix}_{uuid.uuid4().hex[:24]}"


def _save_bytes_to_temp(data: bytes, suffix: str = "") -> str:
    tmp = tempfile.NamedTemporaryFile(delete=False, suffix=suffix)
    tmp.write(data)
    tmp.flush()
    tmp.close()
    return tmp.name


def _is_data_uri(url: str) -> bool:
    return isinstance(url, str) and url.startswith("data:")


def _is_local_like(s: str) -> bool:
    """Treat anything that is not http/https/data as local:
    absolute/relative paths, ~ expansion, file:// prefixes, Windows drive letters, etc.
    """
    if not isinstance(s, str):
        return False
    if s.startswith("file://"):
        return True
    parsed = urlparse(s)
    if parsed.scheme in ("http", "https", "data"):
        return False
    return True


def _to_local_path(s: str) -> str:
    if s.startswith("file://"):
        return s[7:]
    return os.path.expanduser(s)


def _download_to_temp(url: str) -> str:
    """Accepts three kinds of input:
    1) data:image/png;base64,xxx
    2) a local file path (relative/absolute) or a file:// prefix
       (no longer relies on exists() to decide whether something is local)
    3) an http(s) URL
    Returns: the path of the downloaded/copied temp file.
    """
    if not isinstance(url, str) or not url.strip():
        raise HTTPException(status_code=400, detail="Empty image url")

    # 1) data: URI
    if _is_data_uri(url):
        try:
            header, b64 = url.split(",", 1)
            ext = ".bin"
            if "image/png" in header:
                ext = ".png"
            elif "image/jpeg" in header or "image/jpg" in header:
                ext = ".jpg"
            elif "image/webp" in header:
                ext = ".webp"
            raw = base64.b64decode(b64)
            path = _save_bytes_to_temp(raw, suffix=ext)
            log.info(f"[image] data-uri -> {path}")
            return path
        except Exception as e:
            raise HTTPException(status_code=400, detail=f"Invalid data URI: {e}")

    # 2) local file
    if _is_local_like(url):
        p = _to_local_path(url)
        if not os.path.isabs(p):
            p = os.path.abspath(p)
        if not os.path.isfile(p):
            raise HTTPException(status_code=400, detail=f"Local file not found or not a file: {p}")
        ext = os.path.splitext(p)[1] or ".img"
        try:
            with open(p, "rb") as f:
                data = f.read()
        except Exception as e:
            raise HTTPException(status_code=400, detail=f"Read local file failed: {p} ({e})")
        path = _save_bytes_to_temp(data, suffix=ext)
        log.info(f"[image] local -> {p} -> {path}")
        return path

    # 3) http(s)
    try:
        log.info(f"[image] http(s) -> {url}")
        resp = requests.get(url, timeout=30)
        resp.raise_for_status()
        ctype = resp.headers.get("Content-Type", "")
        ext = mimetypes.guess_extension(ctype) or ".img"
        path = _save_bytes_to_temp(resp.content, suffix=ext)
        log.info(f"[image] http(s) saved -> {path}")
        return path
    except Exception as e:
        raise HTTPException(status_code=400, detail=f"Download image failed: {e}")


def _extract_text_and_first_image_from_messages(messages: List[Dict[str, Any]]) -> Tuple[str, Optional[str]]:
    """Parse OpenAI chat.completions-style messages:
    - content: str (text)
    - content: list[{type:"text"}, {type:"image_url", image_url:{url:...}|str}]
    Only the first image is used; all text fragments are concatenated.
    """
    all_text: List[str] = []
    image_path: Optional[str] = None
    for msg in messages:
        content = msg.get("content")
        if content is None:
            continue
        if isinstance(content, str):
            all_text.append(content)
            continue
        if isinstance(content, list):
            for part in content:
                ptype = part.get("type")
                if ptype in ("text", "input_text"):
                    txt = part.get("text", "")
                    if isinstance(txt, str) and txt.strip():
                        all_text.append(txt)
                elif ptype in ("image_url", "input_image"):
                    if image_path is None:
                        image_field = part.get("image_url") or part.get("image")
                        url = image_field.get("url") if isinstance(image_field, dict) else image_field
                        if not url or not isinstance(url, str):
                            raise HTTPException(status_code=400, detail="image_url is missing or invalid")
                        image_path = _download_to_temp(url)
    prompt = "\n".join([t for t in all_text if t.strip()]) if all_text else ""
    return prompt, image_path


def _run_ocr_infer(prompt: str, image_path: str) -> str:
    full_prompt = f"<image>\n{prompt}".strip()
    try:
        res = model.infer(
            tokenizer,
            prompt=full_prompt,
            image_file=image_path,
            output_path="./save",
            base_size=1024,
            image_size=640,
            crop_mode=True,
            save_results=False,
            test_compress=True,
            eval_mode=True,
        )
    except Exception as e:
        raise HTTPException(status_code=500, detail=f"Infer failed: {e}")
    if isinstance(res, dict):
        for key in ("text", "result", "output", "ocr_text"):
            if key in res and isinstance(res[key], str):
                return res[key]
        return str(res)
    if isinstance(res, (list, tuple)):
        return "\n".join(map(str, res))
    return str(res)


def _token_count_approx(text: str) -> int:
    try:
        return len(tokenizer.encode(text))
    except Exception:
        return max(1, len(text) // 4)


# ---------------- routes ----------------
@app.get("/health")
async def health_check():
    return {"status": "healthy"}


@app.post("/parserToText")
async def parser_to_text(file: UploadFile = File(...), content: str = Form(...)):
    file_bytes = await file.read()
    suffix = os.path.splitext(file.filename or "")[1] or ".img"
    tmp_path = _save_bytes_to_temp(file_bytes, suffix=suffix)
    prompt = "<image>\n" + (content or "")
    try:
        res = model.infer(
            tokenizer,
            prompt=prompt,
            image_file=tmp_path,
            output_path="./save",
            base_size=1024,
            image_size=640,
            crop_mode=True,
            save_results=False,
            test_compress=True,
            eval_mode=True,
        )
        return res
    except Exception as e:
        return {"status": "error", "message": str(e)}
    finally:
        if os.path.exists(tmp_path):
            try:
                os.unlink(tmp_path)
            except Exception:
                pass


@app.get("/v1/models")
async def list_models():
    return {
        "object": "list",
        "data": [{"id": OPENAI_MODEL_ID, "object": "model", "created": _now_ts(), "owned_by": "owner"}],
    }


@app.post("/v1/chat/completions")
async def chat_completions(request: Request):
    payload = await request.json()
    messages = payload.get("messages")
    if not isinstance(messages, list) or not messages:
        raise HTTPException(status_code=400, detail="`messages` must be a non-empty list")
    prompt_text, image_path = _extract_text_and_first_image_from_messages(messages)
    if not image_path:
        raise HTTPException(status_code=400, detail="No image found in messages. Provide content with type='image_url'.")
    try:
        answer = _run_ocr_infer(prompt_text, image_path)
    finally:
        if image_path and os.path.exists(image_path):
            try:
                os.unlink(image_path)
            except Exception:
                pass
    prompt_tokens = _token_count_approx(prompt_text)
    completion_tokens = _token_count_approx(answer)
    return JSONResponse({
        "id": _gen_id("chatcmpl"),
        "object": "chat.completion",
        "created": _now_ts(),
        "model": OPENAI_MODEL_ID,
        "choices": [{"index": 0, "message": {"role": "assistant", "content": answer}, "finish_reason": "stop"}],
        "usage": {"prompt_tokens": prompt_tokens, "completion_tokens": completion_tokens,
                  "total_tokens": prompt_tokens + completion_tokens},
    })


# root route
@app.get("/")
async def root():
    return {"service": "OpenAI-Compatible OCR Service", "model": OPENAI_MODEL_ID, "ui": "/static/ui.html"}


# ---------------- main ----------------
if __name__ == "__main__":
    import uvicorn
    # listen on 0.0.0.0:8001
    uvicorn.run(app, host="0.0.0.0", port=8001)
```

(2) Python client example (OpenAI SDK)

```python
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8001/v1", api_key="sk-x")

resp = client.chat.completions.create(
    model="deepseek-ocr",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe the contents of this image:"},
                {"type": "image_url", "image_url": {"url": "/home/qwt/projects/model_apis/test.png"}},
                # also works: "file:///home/qwt/images/test.png" or a relative path "test.png"
            ],
        }
    ],
)
print(resp.choices[0].message.content)
```
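
The local-path form above only works when client and server share a filesystem. For a remote client, the same call can inline the image as a `data:` Base64 URI instead (a sketch; the path is a placeholder):

```python
import base64
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8001/v1", api_key="sk-x")

# Encode the image client-side so the server never touches our filesystem.
with open("test.png", "rb") as f:
    data_uri = "data:image/png;base64," + base64.b64encode(f.read()).decode()

resp = client.chat.completions.create(
    model="deepseek-ocr",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Return the OCR result as plain text."},
            {"type": "image_url", "image_url": {"url": data_uri}},
        ],
    }],
)
print(resp.choices[0].message.content)
```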

(3) Frontend single page: `static/ui.html`

````html

<!doctype html>
<html lang="zh">
<head>
<meta charset="utf-8">
<title>DeepSeek-OCR • Web UI</title>
<meta name="viewport" content="width=device-width, initial-scale=1">
<style>
  :root { --bg:#0b1220; --fg:#e6edf3; --muted:#9aa4b2; --acc:#49b5ff; --card:#111a2e; --ok:#2ecc71; --err:#ff6b6b; }
  * { box-sizing: border-box; }
  body { margin:0; font-family: ui-sans-serif, system-ui, -apple-system, Segoe UI, Roboto, "Helvetica Neue", Arial; background:var(--bg); color:var(--fg); }
  .wrap { max-width: 1000px; margin: 32px auto; padding: 0 16px; }
  h1 { font-weight:700; margin: 0 0 6px; }
  p.desc { color:var(--muted); margin: 0 0 16px; }
  .card { background:var(--card); border-radius:16px; padding:16px; box-shadow: 0 10px 30px rgba(0,0,0,.25); margin-bottom:16px; }
  .row { display:flex; gap:16px; flex-wrap:wrap; }
  .col { flex:1 1 360px; min-width:320px; }
  label { font-size:14px; color:var(--muted); display:block; margin:6px 0; }
  input[type="text"], textarea, select {
    width:100%; background:#0e1627; color:var(--fg); border:1px solid #1e2b44; border-radius:12px;
    padding:10px 12px; outline:none; font-size:14px;
  }
  textarea { min-height:120px; resize:vertical; }
  .btn { background:var(--acc); color:#001224; border:none; border-radius:12px; padding:10px 16px; font-weight:700; cursor:pointer; }
  .btn:disabled { opacity:.6; cursor:not-allowed; }
  .pill { display:inline-block; background:#0e1627; border:1px dashed #1e2b44; color:var(--muted); border-radius:999px; padding:6px 10px; font-size:12px; }
  #preview { max-width:100%; max-height:260px; border-radius:12px; border:1px solid #1e2b44; display:none; margin-top:8px; }
  .out { white-space:pre-wrap; background:#0e1627; border:1px solid #1e2b44; border-radius:12px; padding:12px; min-height:140px; }
  .tabs { display:flex; gap:8px; margin-top:8px; }
  .tabs button { background:#0e1627; color:var(--muted); border:1px solid #1e2b44; border-radius:10px; padding:6px 10px; cursor:pointer; }
  .tabs button.active { color:var(--fg); border-color:var(--acc); }
  a { color: var(--acc); text-decoration: none; }
  .row-compact { display:flex; gap:8px; align-items:center; flex-wrap:wrap; }
  .muted { color:var(--muted); font-size:12px; }
</style>
</head>
<body>
<div class="wrap"><h1>DeepSeek-OCR Web UI</h1><p class="desc">上传图片 + 输入提示,直接调用后端 <code>/v1/chat/completions</code>。默认预设:<span class="pill">返回 Markdown 识别结果</span></p><div class="card"><div class="row"><div class="col"><label>图片文件</label><input id="file" type="file" accept="image/*"><img id="preview" alt="preview"><div class="muted" style="margin-top:6px;">前端会把图片转为 <code>data:</code> Base64 发送到后端。</div></div><div class="col"><label>预设指令</label><select id="preset"><option value="md" selected>返回 Markdown 识别结果(保留标题/列表/表格/代码块)</option><option value="plain">返回纯文本(仅文字内容,去版式)</option><option value="json">返回 JSON 结构:{title, paragraphs, tables[], figures[]}</option></select><label style="margin-top:10px;">自定义提示(可选,会拼接到预设后面)</label><textarea id="prompt" placeholder="例如:表格务必用标准 Markdown 表格语法;公式用 $...$;图片题注前缀用 Figure:"></textarea><div class="row-compact" style="margin-top:10px;"><button id="run" class="btn">识别并生成</button><span id="status" class="pill">就绪</span></div><div class="muted" style="margin-top:6px;">接口地址:<code id="ep">/v1/chat/completions</code>(同源部署可直接使用)</div></div></div></div><div class="card"><div class="tabs"><button id="tab-raw" class="active">原始文本</button><button id="tab-md">Markdown 预览</button></div><div id="raw" class="out" style="margin-top:8px;"></div><div id="md" class="out" style="margin-top:8px; display:none;"></div></div><div class="muted">API: <a href="/v1/models" target="_blank">/v1/models</a> · <a href="/health" target="_blank">/health</a></div>
</div><!-- Markdown 渲染(CDN,可离线移除,届时不显示预览) -->
<script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
<script>
// ======= DOM =======
const fileEl   = document.getElementById('file');
const preview  = document.getElementById('preview');
const presetEl = document.getElementById('preset');
const promptEl = document.getElementById('prompt');
const runBtn   = document.getElementById('run');
const statusEl = document.getElementById('status');
const rawEl    = document.getElementById('raw');
const mdEl     = document.getElementById('md');
const tabRaw   = document.getElementById('tab-raw');
const tabMd    = document.getElementById('tab-md');

// ======= Helpers =======
function endpoint() {
  // Same-origin deployment: a relative path is enough
  return '/v1/chat/completions';
}

function presetText(key) {
  if (key === 'plain') {
    return "Return the OCR result as plain text only; keep the text content and drop all layout and decorative symbols.";
  } else if (key === 'json') {
    return "Return the OCR result as JSON with fields {title, paragraphs, tables: [markdown_table], figures: [caption]}; no explanations.";
  }
  // default: md
  return "Return the OCR result as Markdown, preserving the layout as much as possible: use # headings, ordered/unordered lists, ``` code blocks, and standard Markdown table syntax; mark unrecognizable fragments with [UNCERTAIN].";
}

function fileToDataURI(file) {
  return new Promise((resolve, reject) => {
    const reader = new FileReader();
    reader.onerror = () => reject(new Error('Failed to read file'));
    reader.onload = () => resolve(reader.result);
    reader.readAsDataURL(file);
  });
}

function setTab(which) {
  if (which === 'raw') {
    tabRaw.classList.add('active'); tabMd.classList.remove('active');
    rawEl.style.display = 'block'; mdEl.style.display = 'none';
  } else {
    tabMd.classList.add('active'); tabRaw.classList.remove('active');
    mdEl.style.display = 'block'; rawEl.style.display = 'none';
  }
}

function setStatus(text, ok=true) {
  statusEl.textContent = text;
  statusEl.style.borderColor = ok ? '#1e2b44' : 'var(--err)';
  statusEl.style.color = ok ? 'var(--muted)' : '#ffdede';
}

// ======= UI Events =======
fileEl.addEventListener('change', () => {
  const f = fileEl.files && fileEl.files[0];
  if (!f) { preview.style.display = 'none'; return; }
  const url = URL.createObjectURL(f);
  preview.src = url;
  preview.style.display = 'block';
});

tabRaw.onclick = () => setTab('raw');
tabMd.onclick  = () => setTab('md');

runBtn.addEventListener('click', async () => {
  try {
    const f = fileEl.files && fileEl.files[0];
    if (!f) { alert('Please choose an image file first'); return; }

    const dataUri = await fileToDataURI(f);
    const preset  = presetText(presetEl.value);
    const custom  = (promptEl.value || '').trim();
    const textMsg = custom ? (preset + "\n\n" + custom) : preset;

    const body = {
      model: "deepseek-ocr",
      messages: [{
        role: "user",
        content: [
          { type: "text", text: textMsg },
          { type: "image_url", image_url: { url: dataUri } }
        ]
      }],
      // other OpenAI-style parameters can go here (currently ignored by the backend)
      // temperature: 0.2
    };

    setStatus('Recognizing…', true);
    runBtn.disabled = true;
    rawEl.textContent = '';
    mdEl.textContent = '';

    const t0 = performance.now();
    const resp = await fetch(endpoint(), {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify(body),
    });
    const t1 = performance.now();

    if (!resp.ok) {
      const errText = await resp.text();
      setStatus('Error', false);
      rawEl.textContent = `HTTP ${resp.status}\n${errText}`;
      setTab('raw');
      return;
    }

    const json = await resp.json();
    const content = json?.choices?.[0]?.message?.content ?? '';
    rawEl.textContent = content || '[empty response]';
    if (window.marked && content) {
      mdEl.innerHTML = marked.parse(content);
    } else {
      mdEl.textContent = content;
    }
    setStatus(`Done (${((t1 - t0)/1000).toFixed(2)}s)`, true);
  } catch (e) {
    setStatus('Error', false);
    rawEl.textContent = String(e?.stack || e);
    setTab('raw');
  } finally {
    runBtn.disabled = false;
  }
});
</script>
</body>
</html>
````