
Tess-two: Text Recognition with Tess-two (Overview, Text Recognition, Additional Notes)

I. Tess-two Overview

  1. Tess-two is a wrapper library for the Tesseract OCR engine on Android, used for offline text recognition

  2. Tess-two GitHub repository: https://github.com/rmtheis/tess-two


II. Text Recognition with Tess-two

1. Demo
(1) Dependencies
  • Module-level build.gradle
implementation 'com.rmtheis:tess-two:9.1.0'
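(2) Tessdata
  1. Download the language data you need from the Tessdata repository at https://github.com/tesseract-ocr/tessdata

  2. For example, eng.traineddata is for English and chi_sim.traineddata is for Simplified Chinese

  3. Place the downloaded .traineddata files in the project's src/main/assets directory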

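(3) Manifest
  • AndroidManifest.xml
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
  • Note: since Android 4.4 (API 19), reading and writing the directory returned by getExternalFilesDir requires no storage permission, so these declarations matter only on older devices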
(4) Test
  • MainActivity.java
import android.Manifest;
import android.content.pm.PackageManager;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.util.Log;

import androidx.activity.EdgeToEdge;
import androidx.activity.result.contract.ActivityResultContracts;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.graphics.Insets;
import androidx.core.view.ViewCompat;
import androidx.core.view.WindowInsetsCompat;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

public class MainActivity extends AppCompatActivity {

    public static final String TAG = MainActivity.class.getSimpleName();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        EdgeToEdge.enable(this);
        setContentView(R.layout.activity_main);
        ViewCompat.setOnApplyWindowInsetsListener(findViewById(R.id.main), (v, insets) -> {
            Insets systemBars = insets.getInsets(WindowInsetsCompat.Type.systemBars());
            v.setPadding(systemBars.left, systemBars.top, systemBars.right, systemBars.bottom);
            return insets;
        });

        if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED
                || checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            registerForActivityResult(new ActivityResultContracts.RequestMultiplePermissions(), o -> {
                for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                    Log.i(TAG, entry.getKey() + " : " + entry.getValue());
                }
                boolean allGranted = true;
                for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                    if (!entry.getValue()) {
                        allGranted = false;
                        break;
                    }
                }
                if (allGranted) {
                    test();
                } else {
                    Log.i(TAG, "Not all permissions were granted");
                }
            }).launch(new String[]{
                    Manifest.permission.READ_EXTERNAL_STORAGE,
                    Manifest.permission.WRITE_EXTERNAL_STORAGE
            });
        } else {
            test();
        }
    }

    private void test() {
        copyTessDataToStorage("chi_sim.traineddata", "eng.traineddata");

        TessBaseAPI tessBaseAPI = new TessBaseAPI();
        String tesseractDirPath = getExternalFilesDir(null) + "/tesseract/";
        boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
        if (!initResult) {
            Log.i(TAG, "Failed to initialize Tesseract");
            return;
        }

        Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
        tessBaseAPI.setImage(bitmap);
        String result = tessBaseAPI.getUTF8Text();
        Log.i(TAG, "result: " + result);
    }

    public void copyTessDataToStorage(String... tessDataFiles) {
        String tessDataDirPath = getExternalFilesDir(null) + "/tesseract/tessdata/";
        File tessDataDir = new File(tessDataDirPath);
        if (!tessDataDir.exists()) {
            tessDataDir.mkdirs();
        }

        AssetManager assetManager = getAssets();
        for (String fileName : tessDataFiles) {
            File outFile = new File(tessDataDirPath + fileName);
            if (outFile.exists()) continue;
            try (InputStream in = assetManager.open(fileName);
                 OutputStream out = new FileOutputStream(outFile)) {
                byte[] buffer = new byte[1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}
# Output
result: 张 三
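2. Walkthrough
(1) Requesting permissions
  1. Use checkSelfPermission to check whether the permissions are already granted; if they are, run the test code

  2. If they are not, request them through the Activity Result API

  3. When the request completes, check whether every permission was granted; if so, run the test code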

// Check whether the permissions are already granted
if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED
        || checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
    // Not granted yet: request the permissions
    registerForActivityResult(new ActivityResultContracts.RequestMultiplePermissions(), o -> {
        for (Map.Entry<String, Boolean> entry : o.entrySet()) {
            Log.i(TAG, entry.getKey() + " : " + entry.getValue());
        }
        boolean allGranted = true;
        for (Map.Entry<String, Boolean> entry : o.entrySet()) {
            if (!entry.getValue()) {
                allGranted = false;
                break;
            }
        }
        // Check whether every permission was granted
        if (allGranted) {
            // All granted: run the test code
            test();
        } else {
            Log.i(TAG, "Not all permissions were granted");
        }
    }).launch(new String[]{
            Manifest.permission.READ_EXTERNAL_STORAGE,
            Manifest.permission.WRITE_EXTERNAL_STORAGE
    });
} else {
    // Already granted: run the test code
    test();
}
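The "all granted" check in the callback can be isolated into a small helper. The following is a minimal sketch written against a plain Map<String, Boolean>, so it runs on any JVM; the class name PermissionResults is hypothetical and is not part of Android or Tess-two.

```java
import java.util.Map;

/** Hypothetical helper for illustration; not part of Android or Tess-two. */
final class PermissionResults {
    private PermissionResults() {}

    /**
     * Returns true only if every entry in the result map is granted.
     * An empty map also yields true, so callers should request at least one permission.
     * A null value is treated as "not granted".
     */
    static boolean allGranted(Map<String, Boolean> results) {
        return results.values().stream().allMatch(Boolean.TRUE::equals);
    }
}
```

Inside the permission callback, the second loop could then be replaced by a single call such as PermissionResults.allGranted(o).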
(2) Copying Tessdata
  • Copy the .traineddata files from src/main/assets to files/tesseract/tessdata/ in the app's external private storage directory
public void copyTessDataToStorage(String... tessDataFiles) {
    // Create the target directory
    String tessDataDirPath = getExternalFilesDir(null) + "/tesseract/tessdata/";
    File tessDataDir = new File(tessDataDirPath);
    if (!tessDataDir.exists()) {
        tessDataDir.mkdirs();
    }

    AssetManager assetManager = getAssets();
    for (String fileName : tessDataFiles) {
        File outFile = new File(tessDataDirPath + fileName);
        if (outFile.exists()) continue; // Skip files that already exist
        try (InputStream in = assetManager.open(fileName);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
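One weakness of skipping files that already exist is that an interrupted copy leaves a truncated file that is never replaced. The sketch below shows one possible way around that: copy to a temporary name first, then rename. It takes a plain InputStream so it runs on any JVM; on Android the stream would come from assetManager.open(fileName), and java.nio.file is only available from API 26. The class name TessdataCopier and the ".part" suffix are assumptions made for illustration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

/** Hypothetical helper for illustration; not part of Tess-two. */
final class TessdataCopier {
    private TessdataCopier() {}

    /**
     * Copies the stream to targetDir/fileName unless that file already exists.
     * The data is written to a ".part" file first and renamed afterwards, so a
     * failed copy does not leave a truncated file under the final name.
     *
     * @return true if the file was copied, false if it was already present
     */
    static boolean copyIfAbsent(InputStream source, Path targetDir, String fileName) {
        Path target = targetDir.resolve(fileName);
        if (Files.exists(target)) {
            return false;
        }
        Path partial = targetDir.resolve(fileName + ".part");
        try {
            Files.createDirectories(targetDir);
            Files.copy(source, partial, StandardCopyOption.REPLACE_EXISTING);
            Files.move(partial, target, StandardCopyOption.REPLACE_EXISTING);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The caller is still responsible for closing the stream, for example with try-with-resources as in copyTessDataToStorage.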
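(3) Initialization and recognition
  • Call init to initialize Tesseract
  1. The first argument is the parent directory that contains the tessdata directory; since the data is in files/tesseract/tessdata/, the value here is files/tesseract/

  2. The second argument is the language code; several codes can be joined with a plus sign (+), so chi_sim+eng recognizes both Chinese and English

TessBaseAPI tessBaseAPI = new TessBaseAPI();
String tesseractDirPath = getExternalFilesDir(null) + "/tesseract/";
boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
if (!initResult) {
    Log.i(TAG, "Failed to initialize Tesseract");
    return;
}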
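When init fails, a common cause is a mismatch between the data path, the language string, and the files actually on disk. The following sketch derives the expected file names from a language string and reports which ones are missing under dataPath/tessdata/. It uses only java.io.File, and the class name TessdataLayout is hypothetical rather than part of Tess-two.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not part of Tess-two. */
final class TessdataLayout {
    private TessdataLayout() {}

    /** Splits a language string such as "chi_sim+eng" into the expected .traineddata file names. */
    static List<String> requiredFiles(String languages) {
        List<String> files = new ArrayList<>();
        for (String lang : languages.split("\\+")) {
            if (!lang.isEmpty()) {
                files.add(lang + ".traineddata");
            }
        }
        return files;
    }

    /** Returns the required files that do not exist under dataPath/tessdata/. */
    static List<String> missingFiles(File dataPath, String languages) {
        File tessdataDir = new File(dataPath, "tessdata");
        List<String> missing = new ArrayList<>();
        for (String name : requiredFiles(languages)) {
            if (!new File(tessdataDir, name).isFile()) {
                missing.add(name);
            }
        }
        return missing;
    }
}
```

Logging the result of TessdataLayout.missingFiles(new File(tesseractDirPath), "chi_sim+eng") before calling init would make a missing language file visible in Logcat.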
  • Call setImage to supply the image, then call getUTF8Text to run recognition and get the result
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
tessBaseAPI.setImage(bitmap);
String result = tessBaseAPI.getUTF8Text();
Log.i(TAG, "result: " + result);

III. Additional Notes

1. When the Bitmap fails to load
  • Here the Bitmap is decoded from a resource ID that does not exist, so decodeResource returns null and setImage throws
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), 1001);
Log.i(TAG, "bitmap: " + bitmap);
tessBaseAPI.setImage(bitmap);
String result = tessBaseAPI.getUTF8Text();
Log.i(TAG, "result: " + result);
# Output
bitmap: null
...
FATAL EXCEPTION: main
Process: com.my.ocr_tesseract, PID: 25149
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.my.ocr_tesseract/com.my.ocr_tesseract.MainActivity}: java.lang.RuntimeException: Failed to read bitmap
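A null check before setImage avoids this crash. The sketch below illustrates the guard with the Android types abstracted away so that it runs on a plain JVM: the generic parameter stands in for Bitmap, and the function stands in for the setImage plus getUTF8Text calls. The class name SafeOcr and its method are hypothetical.

```java
import java.util.Optional;
import java.util.function.Function;

/** Hypothetical helper for illustration; not part of Tess-two. */
final class SafeOcr {
    private SafeOcr() {}

    /**
     * Runs the recognizer only when an image is actually available.
     *
     * @param image      the decoded image, or null if decoding failed
     * @param recognizer stand-in for the OCR call (setImage followed by getUTF8Text)
     * @return the recognized text, or an empty Optional if there was no image or no text
     */
    static <I> Optional<String> recognize(I image, Function<I, String> recognizer) {
        if (image == null) {
            return Optional.empty();
        }
        return Optional.ofNullable(recognizer.apply(image));
    }
}
```

On Android, the same idea reduces to checking bitmap != null after decodeResource and logging an error instead of calling setImage.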
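2. Recognizing cursive handwriting
  • Tess-two has limited ability to recognize cursive handwriting; ML Kit Digital Ink Recognition is recommended instead
# Output
result: 
# Output
result: 锄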
3. Using the app's internal private storage directory
  • The app's internal private storage directory (getFilesDir) can be used instead; it needs no permission request either
private void test() {
    copyTessDataToStorage("chi_sim.traineddata", "eng.traineddata");

    TessBaseAPI tessBaseAPI = new TessBaseAPI();
    String tesseractDirPath = getFilesDir() + "/tesseract/";
    boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
    if (!initResult) {
        Log.i(TAG, "Failed to initialize Tesseract");
        return;
    }

    Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
    Log.i(TAG, "bitmap: " + bitmap);
    tessBaseAPI.setImage(bitmap);
    String result = tessBaseAPI.getUTF8Text();
    Log.i(TAG, "result: " + result);
}

public void copyTessDataToStorage(String... tessDataFiles) {
    String tessDataDirPath = getFilesDir() + "/tesseract/tessdata/";
    File tessDataDir = new File(tessDataDirPath);
    if (!tessDataDir.exists()) {
        tessDataDir.mkdirs();
    }

    AssetManager assetManager = getAssets();
    for (String fileName : tessDataFiles) {
        File outFile = new File(tessDataDirPath + fileName);
        if (outFile.exists()) continue;
        try (InputStream in = assetManager.open(fileName);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
