
LaTeX OCR - Math Formula Recognition System

news 2025/10/14 21:42:56

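Contents

- I. About LaTeX OCR
  - 1. Project overview
  - Architecture diagram
  - 2. Related links and resources
  - 3. Features
- II. Installation and configuration
  - Requirements
  - Linux installation
  - Mac installation
- III. Usage guide
  - 1. Quick training (small dataset)
  - 2. Full training (large dataset)
- IV. Visualization
  - Visualizing training
  - Visualizing prediction
- V. Model evaluation
- VI. Technical details
  - Data processing pipeline
  - Model architecture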

I. About LaTeX OCR

1. Project overview

A math formula recognition system built on a Seq2Seq + Attention + Beam Search architecture. It converts images of math formulas into LaTeX code.


Architecture diagram

[Figure: architecture diagram]

2. Related links and resources

GitHub: https://github.com/LinXueyuanStdio/LaTeX_OCR
Enhanced version: https://github.com/LinXueyuanStdio/LaTeX_OCR_PRO
Dataset source: im2latex-100k, arXiv:1609.04938
Reference papers:
- Show, Attend and Tell
- Harvard's paper and dataset
- Seq2Seq for LaTeX generation

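3. Features

1. Multi-platform support

Supports Linux, Mac, and Windows
Provides one-step install scripts

2. Training visualization

Integrates TensorBoard to visualize the training process
Supports visualizing the attention mechanism

3. Evaluation metrics

Supports four evaluation metrics: perplexity, EditDistance, BLEU-4, and ExactMatchScore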
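
To make two of these metrics concrete, here is a minimal sketch of an edit-distance score and an exact-match score over tokenized LaTeX. The project's own definitions are not given in this article, so the normalization used here (1 minus the Levenshtein distance divided by the longer sequence length) and the function names `edit_distance`, `edit_similarity`, and `exact_match_score` are assumptions made for illustration.

```python
def edit_distance(a, b):
    """Levenshtein distance between two token sequences."""
    prev = list(range(len(b) + 1))
    for i, tok_a in enumerate(a, start=1):
        curr = [i]
        for j, tok_b in enumerate(b, start=1):
            cost = 0 if tok_a == tok_b else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def edit_similarity(reference, hypothesis):
    """1 - normalized edit distance; 1.0 means identical (assumed convention)."""
    longest = max(len(reference), len(hypothesis))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(reference, hypothesis) / longest


def exact_match_score(references, hypotheses):
    """Fraction of predictions equal to their reference, token for token."""
    matches = sum(1 for r, h in zip(references, hypotheses) if r == h)
    return matches / len(references)


# Two reference formulas and two predictions, split on whitespace.
refs = [r"\frac { a } { b }".split(), r"x ^ { 2 }".split()]
hyps = [r"\frac { a } { c }".split(), r"x ^ { 2 }".split()]
print(exact_match_score(refs, hyps))
print(edit_similarity(refs[0], hyps[0]))
```

Exact match is strict: a single wrong token makes the whole formula count as a miss, while the edit-based score still gives partial credit.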

II. Installation and configuration

Requirements

Python 3.5 + TensorFlow 1.12.2
LaTeX (converts LaTeX to PDF)
Ghostscript (image processing)
ImageMagick (converts PDF to PNG)
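
Before installing anything, it can help to check which of these tools are already on the PATH. The following is a small sketch that uses only the standard library. The executable names (`pdflatex`/`latex`, `gs`, `magick`/`convert`) are assumptions about a typical installation, not something the project specifies.

```python
import shutil
import sys

# Assumed executable names; adjust them to match your installation.
REQUIRED_TOOLS = {
    "LaTeX": ["pdflatex", "latex"],
    "Ghostscript": ["gs"],
    "ImageMagick": ["magick", "convert"],
}


def find_tools(required=REQUIRED_TOOLS):
    """Map each dependency to the first executable found on PATH, or None."""
    found = {}
    for name, candidates in required.items():
        found[name] = next(
            (path for path in map(shutil.which, candidates) if path), None
        )
    return found


if __name__ == "__main__":
    print("Python", sys.version.split()[0])
    for name, path in find_tools().items():
        print("{}: {}".format(name, path or "NOT FOUND"))
```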

Linux installation

One-step install

make install-linux

Or install step by step

# Create the environment
virtualenv env35 --python=python3.5
source env35/bin/activate
pip install -r requirements.txt

# Install LaTeX (LaTeX to PDF)
sudo apt-get install texlive-latex-base texlive-latex-extra

# Install Ghostscript
sudo apt-get update && sudo apt-get install ghostscript libgs-dev

# Install ImageMagick (PDF to PNG)
wget http://www.imagemagick.org/download/ImageMagick.tar.gz
tar -xvf ImageMagick.tar.gz
cd ImageMagick-7.*
./configure --with-gslib=yes
make
sudo make install
sudo ldconfig /usr/local/lib
rm ImageMagick.tar.gz
rm -r ImageMagick-7.*

Mac installation

One-step install

make install-mac

Or install step by step

sudo pip install -r requirements.txt
wget http://www.imagemagick.org/download/ImageMagick.tar.gz
tar -xvf ImageMagick.tar.gz
cd ImageMagick-7.*
./configure --with-gslib=yes
make
sudo make install
rm ImageMagick.tar.gz
rm -r ImageMagick-7.*

III. Usage guide

1. Quick training (small dataset)

One-step training (about 2 minutes)

make small

Or run step by step

python build.py --data=configs/data_small.json --vocab=configs/vocab_small.json
python train.py --data=configs/data_small.json --vocab=configs/vocab_small.json --training=configs/training_small.json --model=configs/model.json --output=results/small/
python evaluate_txt.py --results=results/small/
python evaluate_img.py --results=results/small/

2. Full training (large dataset)

One-step training (2 to 3 hours)

make full

Or run step by step

python build.py --data=configs/data.json --vocab=configs/vocab.json
python train.py --data=configs/data.json --vocab=configs/vocab.json --training=configs/training.json --model=configs/model.json --output=results/full/
python evaluate_txt.py --results=results/full/
python evaluate_img.py --results=results/full/

IV. Visualization

Visualizing training

# Small dataset
cd results/small
tensorboard --logdir ./

# Large dataset
cd results/full
tensorboard --logdir ./

Visualizing prediction

python visualize_attention.py --image=data/images_test/6.png --vocab=configs/vocab.json --model=configs/model.json --output=results/full/
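
What an attention visualization shows is, for each decoding step, one weight per image position. The sketch below computes such weights for a toy 2x2 feature map. It uses plain dot-product scoring to stay short, which is a simplification and not necessarily the scoring function this project uses (Show, Attend and Tell scores positions with a small learned network). The names `softmax` and `attend` are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, features):
    """Dot-product attention: weights per position and the weighted context."""
    scores = [sum(q * f for q, f in zip(query, feat)) for feat in features]
    weights = softmax(scores)
    dim = len(features[0])
    context = [
        sum(w * feat[d] for w, feat in zip(weights, features))
        for d in range(dim)
    ]
    return weights, context


# Toy 2x2 feature map flattened to 4 positions, each a 2-d vector.
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
weights, context = attend([2.0, 0.0], features)

# Reshape the weights back to 2x2; this grid is what a heatmap would draw.
heatmap = [weights[0:2], weights[2:4]]
for row in heatmap:
    print(["%.2f" % w for w in row])
```

The weights sum to 1 at every step, so each heatmap can be read as where the decoder "looked" while emitting one token.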

V. Model evaluation

Metric            Training score   Test score
perplexity        1.39             1.44
EditDistance      81.68            80.45
BLEU-4            78.21            75.42
ExactMatchScore   13.93            12.44
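
As a reading aid for the perplexity row, here is a minimal sketch of the usual definition: the exponential of the mean negative log-likelihood per token. This assumes natural-log token probabilities and is not taken from the project's code. The log-probabilities in the example are made up.

```python
import math


def perplexity(token_log_probs):
    """exp of the mean negative log-likelihood (natural log) per token."""
    if not token_log_probs:
        raise ValueError("need at least one token")
    mean_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(mean_nll)


# Hypothetical log-probabilities the decoder assigned to the reference tokens.
log_probs = [math.log(0.9), math.log(0.8), math.log(0.5)]
print(round(perplexity(log_probs), 3))
```

Under this definition, a perplexity of 1.44 corresponds to a geometric-mean probability of about 1 / 1.44, roughly 0.69, assigned to each correct token.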

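VI. Technical details

Data processing pipeline

Obtain the LaTeX formula data
Normalize the formulas
Generate the image dataset
Build the vocabulary and mapping files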
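
The last step of the pipeline above can be sketched as follows: count the tokens in the normalized formulas, assign each an integer id, and encode formulas as id sequences. The special tokens (`<pad>`, `<unk>`, `<end>`) and the function names are hypothetical. The project's `build.py` may use different names and file formats.

```python
from collections import Counter

# Hypothetical special tokens; the project's own names may differ.
SPECIAL_TOKENS = ["<pad>", "<unk>", "<end>"]


def build_vocab(formulas, min_count=1):
    """Map each token seen at least min_count times to an integer id."""
    counts = Counter(tok for formula in formulas for tok in formula.split())
    tokens = sorted(tok for tok, n in counts.items() if n >= min_count)
    return {tok: idx for idx, tok in enumerate(SPECIAL_TOKENS + tokens)}


def encode(formula, vocab):
    """Turn a normalized formula into ids, ending with the end-of-sequence id."""
    unk = vocab["<unk>"]
    return [vocab.get(tok, unk) for tok in formula.split()] + [vocab["<end>"]]


# Normalized formulas are assumed to be whitespace-separated tokens.
formulas = [r"\frac { a } { b }", r"a ^ { 2 }"]
vocab = build_vocab(formulas)
print(encode(r"a ^ { 3 }", vocab))  # "3" is unseen, so it maps to <unk>
```

Raising `min_count` drops rare tokens into `<unk>`, which keeps the output layer small at the cost of never predicting those tokens.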

Model architecture

Encoder: CNN
Decoder: LSTM/GRU
Attention layer
Output strategy: Beam Search or Greedy
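
To illustrate the difference between the two output strategies, here is a minimal beam search over a toy next-token model. It is a sketch, not the project's decoder: `beam_search`, `toy_model`, and the probability table are invented for illustration, and a beam size of 1 reduces to greedy decoding.

```python
import math


def beam_search(step_fn, start, end, beam_size=3, max_len=10):
    """Return the highest-scoring token sequence found with a fixed-width beam.

    step_fn(prefix) must return {token: probability} for the next token.
    """
    beams = [([start], 0.0)]  # (tokens, summed log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beams:
            for tok, prob in step_fn(tokens).items():
                if prob > 0:
                    candidates.append((tokens + [tok], score + math.log(prob)))
        candidates.sort(key=lambda c: c[1], reverse=True)
        beams = []
        for tokens, score in candidates[:beam_size]:
            # Sequences that emitted the end token stop growing.
            (finished if tokens[-1] == end else beams).append((tokens, score))
        if not beams:
            break
    best = max(finished or beams, key=lambda c: c[1])
    return best[0]


def toy_model(prefix):
    """Made-up distribution where the greedy first choice leads to a worse path."""
    table = {
        ("<s>",): {"x": 0.6, "y": 0.4},
        ("<s>", "x"): {"^": 0.5, "_": 0.5},
        ("<s>", "y"): {"^": 0.9, "_": 0.1},
    }
    return table.get(tuple(prefix), {"</s>": 1.0})


print(beam_search(toy_model, "<s>", "</s>", beam_size=1))  # greedy
print(beam_search(toy_model, "<s>", "</s>", beam_size=2))
```

In this toy table, greedy commits to `x` because it is the most likely first token, while a beam of width 2 keeps `y` alive long enough to find the sequence with the higher overall probability.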

伊织 xAI 2025-05-18 (Sunday)
