Simple OCR Text Recognition in Python

news 2025/11/5 8:24:01

Ubuntu version: 22.04

Python version: 3.9

Install the dependencies:

# Install the Tesseract engine and development library
sudo apt update && sudo apt install tesseract-ocr libtesseract-dev
# Install the English and Simplified Chinese language packs
sudo apt install tesseract-ocr-eng tesseract-ocr-chi-sim
# Install the Python dependencies
pip install pytesseract pillow -i https://mirrors.aliyun.com/pypi/simple
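
Before running any OCR, it can help to confirm that Python can find the Tesseract binary and that both language packs are installed. The following is a minimal sketch, not part of the original script: the helper names `missing_languages` and `check_install` are hypothetical, and it assumes a pytesseract release recent enough to provide `get_languages`.

```python
def missing_languages(lang_spec, installed):
    """Return the codes from an 'eng+chi_sim' style spec that are not installed."""
    return [code for code in lang_spec.split("+") if code not in installed]


def check_install(lang_spec="eng+chi_sim"):
    """Print the Tesseract version and report any missing language packs."""
    # Imported here so that missing_languages() is usable without pytesseract.
    import pytesseract

    print("Tesseract version:", pytesseract.get_tesseract_version())
    installed = pytesseract.get_languages(config="")
    missing = missing_languages(lang_spec, installed)
    if missing:
        print("Missing language packs:", ", ".join(missing))
    else:
        print("All requested languages are available:", lang_spec)
    return missing


if __name__ == "__main__":
    try:
        check_install()
    except Exception as exc:  # e.g. pytesseract or the tesseract binary is absent
        print(f"Check failed: {exc}")
```

If a language is reported as missing, installing the matching `tesseract-ocr-<lang>` package (for example `tesseract-ocr-chi-sim`) should resolve it.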

Code (save it as ocr_demo.py):

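# Required dependencies (Ubuntu/Debian).
# Run these commands in a terminal first:
# sudo apt update && sudo apt install tesseract-ocr libtesseract-dev
# sudo apt install tesseract-ocr-chi-sim  # Chinese support (needed for lang='eng+chi_sim' below)
# pip3 install pytesseract pillow

from PIL import Image
import pytesseract
import sys
import os


def ocr_core(image_path):
    """Core OCR function.

    :param image_path: path to the image file
    :return: the recognized text, or an error message
    """
    try:
        if not os.path.exists(image_path):
            raise FileNotFoundError(f"File {image_path} does not exist")
        with Image.open(image_path) as img:
            # Multi-language recognition (English + Simplified Chinese)
            text = pytesseract.image_to_string(img, lang='eng+chi_sim')
        # Strip first so that whitespace-only output counts as "no text"
        text = text.strip()
        return text if text else "No text recognized"
    except Exception as e:
        return f"Error: {e}"


if __name__ == "__main__":
    # Take the image path from the command line, or prompt for it
    if len(sys.argv) > 1:
        image_path = sys.argv[1]
    else:
        image_path = input("Enter the image path: ").strip()

    print("\nRecognizing...")
    result = ocr_core(image_path)

    print("\nResult:")
    print("-" * 30)
    print(result)
    print("-" * 30)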
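
The script above handles one image per run. As an illustration of how `ocr_core` could be reused, the sketch below applies a recognizer function to every image in a folder. The names `find_images`, `ocr_folder`, and `IMAGE_SUFFIXES` are hypothetical and not part of the original post, and the list of suffixes is an assumption about which formats are of interest.

```python
from pathlib import Path

# Assumed set of image file extensions to process (illustrative, not exhaustive).
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp"}


def find_images(folder):
    """Return the image files directly inside `folder`, sorted by path."""
    return sorted(
        p for p in Path(folder).iterdir()
        if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
    )


def ocr_folder(folder, recognize):
    """Map each image file name to the text returned by `recognize(path)`.

    `recognize` is any callable that takes a path string and returns text,
    for example the ocr_core function defined in ocr_demo.py.
    """
    return {p.name: recognize(str(p)) for p in find_images(folder)}
```

Assuming the script is saved as ocr_demo.py, usage could look like `from ocr_demo import ocr_core` followed by `ocr_folder("scans", ocr_core)`, where "scans" is a placeholder folder name. Passing the recognizer as an argument keeps the folder logic independent of Tesseract.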

Download a test image and run the script:

# Download a test image (optional)
wget https://tesseract.projectnaptha.com/img/eng_bw.png -O test.png
# Run the recognition
python ocr_demo.py test.png
