Tess-two - Tess-two Text Recognition (Tess-two Overview, Tess-two Text Recognition, Additional Cases)
I. Tess-two Overview
- Tess-two is a wrapper library for the Tesseract OCR engine on Android. It performs text recognition offline.
- Tess-two on GitHub:
https://github.com/rmtheis/tess-two
II. Tess-two Text Recognition
1. Demo
(1) Dependencies
- Module-level build.gradle
implementation 'com.rmtheis:tess-two:9.1.0'
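(2) Tessdata
- Download the language data files you need from the tessdata repository at https://github.com/tesseract-ocr/tessdata. For example, eng.traineddata covers English and chi_sim.traineddata covers Simplified Chinese.
- Put the downloaded .traineddata files in the project's src/main/assets directory.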
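The language argument passed to init() later in this demo (chi_sim+eng) consists of these file names without the .traineddata suffix, joined with +. The sketch below derives that argument from the same file list that gets copied, so the two cannot drift apart. TessLanguages and languagesFor are hypothetical names and are not part of tess-two.

```java
import java.util.ArrayList;
import java.util.List;

public class TessLanguages {
    private static final String SUFFIX = ".traineddata";

    // Hypothetical helper: "chi_sim.traineddata", "eng.traineddata" -> "chi_sim+eng"
    public static String languagesFor(String... tessDataFiles) {
        if (tessDataFiles.length == 0) {
            throw new IllegalArgumentException("At least one .traineddata file is required");
        }
        List<String> codes = new ArrayList<>();
        for (String file : tessDataFiles) {
            if (!file.endsWith(SUFFIX)) {
                throw new IllegalArgumentException("Not a .traineddata file: " + file);
            }
            // Strip the suffix to get the language code, keeping the given order
            codes.add(file.substring(0, file.length() - SUFFIX.length()));
        }
        return String.join("+", codes);
    }

    public static void main(String[] args) {
        System.out.println(languagesFor("chi_sim.traineddata", "eng.traineddata")); // chi_sim+eng
    }
}
```

In the activity, this would look like `String[] files = {"chi_sim.traineddata", "eng.traineddata"};`, followed by `copyTessDataToStorage(files);` and `tessBaseAPI.init(tesseractDirPath, TessLanguages.languagesFor(files));`.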
(3) Manifest
- AndroidManifest.xml
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
<uses-permission android:name="android.permission.WRITE_EXTERNAL_STORAGE" />
(4) Test
- MainActivity.java
package com.my.ocr_tesseract;

import android.Manifest;
import android.content.pm.PackageManager;
import android.content.res.AssetManager;
import android.graphics.Bitmap;
import android.graphics.BitmapFactory;
import android.os.Bundle;
import android.util.Log;

import androidx.activity.EdgeToEdge;
import androidx.activity.result.contract.ActivityResultContracts;
import androidx.appcompat.app.AppCompatActivity;
import androidx.core.graphics.Insets;
import androidx.core.view.ViewCompat;
import androidx.core.view.WindowInsetsCompat;

import com.googlecode.tesseract.android.TessBaseAPI;

import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Map;

public class MainActivity extends AppCompatActivity {
    public static final String TAG = MainActivity.class.getSimpleName();

    @Override
    protected void onCreate(Bundle savedInstanceState) {
        super.onCreate(savedInstanceState);
        EdgeToEdge.enable(this);
        setContentView(R.layout.activity_main);
        ViewCompat.setOnApplyWindowInsetsListener(findViewById(R.id.main), (v, insets) -> {
            Insets systemBars = insets.getInsets(WindowInsetsCompat.Type.systemBars());
            v.setPadding(systemBars.left, systemBars.top, systemBars.right, systemBars.bottom);
            return insets;
        });

        if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED
                || checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
            registerForActivityResult(new ActivityResultContracts.RequestMultiplePermissions(), o -> {
                for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                    Log.i(TAG, entry.getKey() + " : " + entry.getValue());
                }
                boolean allGranted = true;
                for (Map.Entry<String, Boolean> entry : o.entrySet()) {
                    if (!entry.getValue()) {
                        allGranted = false;
                        break;
                    }
                }
                if (allGranted) {
                    test();
                } else {
                    Log.i(TAG, "Not all permissions were granted");
                }
            }).launch(new String[]{
                    Manifest.permission.READ_EXTERNAL_STORAGE,
                    Manifest.permission.WRITE_EXTERNAL_STORAGE
            });
        } else {
            test();
        }
    }

    private void test() {
        copyTessDataToStorage("chi_sim.traineddata", "eng.traineddata");

        TessBaseAPI tessBaseAPI = new TessBaseAPI();
        try {
            String tesseractDirPath = getExternalFilesDir(null) + "/tesseract/";
            boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
            if (!initResult) {
                Log.i(TAG, "Failed to initialize Tesseract");
                return;
            }
            Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
            tessBaseAPI.setImage(bitmap);
            String result = tessBaseAPI.getUTF8Text();
            Log.i(TAG, "result: " + result);
        } finally {
            // Release the native Tesseract resources
            tessBaseAPI.end();
        }
    }

    public void copyTessDataToStorage(String... tessDataFiles) {
        String tessDataDirPath = getExternalFilesDir(null) + "/tesseract/tessdata/";
        File tessDataDir = new File(tessDataDirPath);
        if (!tessDataDir.exists()) {
            tessDataDir.mkdirs();
        }
        AssetManager assetManager = getAssets();
        for (String fileName : tessDataFiles) {
            File outFile = new File(tessDataDirPath + fileName);
            if (outFile.exists()) continue;
            try (InputStream in = assetManager.open(fileName);
                 OutputStream out = new FileOutputStream(outFile)) {
                byte[] buffer = new byte[1024];
                int read;
                while ((read = in.read(buffer)) != -1) {
                    out.write(buffer, 0, read);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
}

# Output
result: 张 三
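2. Explanation
(1) Requesting permissions
- Use checkSelfPermission to check whether the permissions are already granted. If they are, run the test code.
- If they are not, request them with the Activity Result API.
- When the request completes, check whether every permission was granted. If so, run the test code.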
// Check whether the permissions are already granted
if (checkSelfPermission(Manifest.permission.READ_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED
        || checkSelfPermission(Manifest.permission.WRITE_EXTERNAL_STORAGE) != PackageManager.PERMISSION_GRANTED) {
    // Not granted yet: request the permissions
    registerForActivityResult(new ActivityResultContracts.RequestMultiplePermissions(), o -> {
        for (Map.Entry<String, Boolean> entry : o.entrySet()) {
            Log.i(TAG, entry.getKey() + " : " + entry.getValue());
        }
        boolean allGranted = true;
        for (Map.Entry<String, Boolean> entry : o.entrySet()) {
            if (!entry.getValue()) {
                allGranted = false;
                break;
            }
        }
        // Check whether every permission was granted
        if (allGranted) {
            // All granted: run the test code
            test();
        } else {
            Log.i(TAG, "Not all permissions were granted");
        }
    }).launch(new String[]{
            Manifest.permission.READ_EXTERNAL_STORAGE,
            Manifest.permission.WRITE_EXTERNAL_STORAGE
    });
} else {
    // Already granted: run the test code
    test();
}
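The two loops in the callback above can be reduced to one small helper that answers "was everything granted?". The sketch below uses the hypothetical names PermissionResults and allGranted, which are not AndroidX APIs. It differs from the demo in one deliberate way. The demo's loop reports allGranted = true for an empty map, and as far as I know RequestMultiplePermissions can return an empty map when the request is interrupted. The sketch therefore treats an empty map as "not granted".

```java
import java.util.Map;

public class PermissionResults {
    // Hypothetical helper: collapses the result map (permission name -> granted)
    // delivered by RequestMultiplePermissions into a single yes/no answer
    public static boolean allGranted(Map<String, Boolean> results) {
        // Nothing in the map means nothing was actually granted
        if (results.isEmpty()) {
            return false;
        }
        for (Map.Entry<String, Boolean> entry : results.entrySet()) {
            // Boolean.TRUE.equals(...) also treats a null value as "not granted"
            if (!Boolean.TRUE.equals(entry.getValue())) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(allGranted(Map.of("READ", true, "WRITE", true)));  // true
        System.out.println(allGranted(Map.of("READ", true, "WRITE", false))); // false
        System.out.println(allGranted(Map.of()));                             // false
    }
}
```

Inside the callback, the second loop and the flag then become `if (PermissionResults.allGranted(o)) { test(); } else { Log.i(TAG, "Not all permissions were granted"); }`.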
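(2) Copying the Tessdata
- Copy the .traineddata files from the src/main/assets directory to files/tesseract/tessdata/ under the app-specific external storage directory (getExternalFilesDir(null)). The copy is needed because Tesseract reads its language data from a directory on the file system, while assets are packaged inside the APK and can only be read as streams.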
public void copyTessDataToStorage(String... tessDataFiles) {
    // Create the target directory
    String tessDataDirPath = getExternalFilesDir(null) + "/tesseract/tessdata/";
    File tessDataDir = new File(tessDataDirPath);
    if (!tessDataDir.exists()) {
        tessDataDir.mkdirs();
    }
    AssetManager assetManager = getAssets();
    for (String fileName : tessDataFiles) {
        File outFile = new File(tessDataDirPath + fileName);
        if (outFile.exists()) continue; // Skip files that have already been copied
        try (InputStream in = assetManager.open(fileName);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
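Because the method skips any file that already exists, an interrupted copy leaves a truncated .traineddata file that will never be replaced, and init() may then fail on every later launch. One common fix is to write to a temporary file and give it the real name only after the copy finishes. The sketch below shows that approach. SafeCopy and copyAtomically are hypothetical names.

```java
import java.io.ByteArrayInputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;

public class SafeCopy {
    // Hypothetical helper: copies a stream to target through a ".tmp" file
    public static void copyAtomically(InputStream in, File target) throws IOException {
        File tmp = new File(target.getPath() + ".tmp");
        try (OutputStream out = new FileOutputStream(tmp)) {
            byte[] buffer = new byte[8192];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            // Don't leave a half-written temp file behind
            tmp.delete();
            throw e;
        }
        // Only a fully written file gets the real name, so a later
        // outFile.exists() check never skips a truncated copy
        if (!tmp.renameTo(target)) {
            tmp.delete();
            throw new IOException("Could not rename " + tmp + " to " + target);
        }
    }

    public static void main(String[] args) throws IOException {
        File dir = Files.createTempDirectory("tessdata").toFile();
        File target = new File(dir, "eng.traineddata");
        copyAtomically(new ByteArrayInputStream(new byte[]{1, 2, 3}), target);
        System.out.println(target.length()); // 3
    }
}
```

Inside copyTessDataToStorage, the inner try block would then become `try (InputStream in = assetManager.open(fileName)) { SafeCopy.copyAtomically(in, outFile); }`. The existing `if (outFile.exists()) continue;` check stays as it is.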
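(3) Initialization and recognition
- Call init to initialize Tesseract.
  - The first argument is the parent directory of the tessdata directory. Here tessdata is at files/tesseract/tessdata/, so the argument is files/tesseract/.
  - The second argument is the language code. Multiple languages are joined with +, so chi_sim+eng means "recognize Simplified Chinese and English".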
TessBaseAPI tessBaseAPI = new TessBaseAPI();
String tesseractDirPath = getExternalFilesDir(null) + "/tesseract/";
boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
if (!initResult) {
    Log.i(TAG, "Failed to initialize Tesseract");
    return;
}
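- Call setImage to pass in the image, then call getUTF8Text, which runs the recognition and returns the text. Call end() when you are done to release the native resources.
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
tessBaseAPI.setImage(bitmap);
String result = tessBaseAPI.getUTF8Text();
Log.i(TAG, "result: " + result);
// Release the native Tesseract resources once recognition is done
tessBaseAPI.end();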
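getUTF8Text() blocks until recognition finishes, and the demo calls it from onCreate(), which runs on the main thread. A large image can therefore freeze the UI. The sketch below moves the work onto one background thread. OcrExecutor is a hypothetical class, and the recognition is passed in as a Supplier so the sketch runs on a plain JVM. One worker thread is used on the assumption that a single TessBaseAPI instance should not be used from several threads at once. CompletableFuture needs API 24+ on Android.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

public class OcrExecutor implements AutoCloseable {
    // A single worker thread: recognition jobs run one at a time, always on the
    // same thread, and never on the caller's (UI) thread
    private final ExecutorService worker = Executors.newSingleThreadExecutor();

    public CompletableFuture<String> recognizeAsync(Supplier<String> recognition) {
        return CompletableFuture.supplyAsync(recognition, worker);
    }

    @Override
    public void close() {
        // Lets queued jobs finish, then stops the worker thread
        worker.shutdown();
    }

    public static void main(String[] args) throws Exception {
        try (OcrExecutor ocr = new OcrExecutor()) {
            // Stand-in for the real setImage()/getUTF8Text() calls
            String text = ocr.recognizeAsync(() -> "fake OCR result").get();
            System.out.println(text);
        }
    }
}
```

In the activity, the Supplier would wrap the real calls, for example `ocr.recognizeAsync(() -> { tessBaseAPI.setImage(bitmap); return tessBaseAPI.getUTF8Text(); }).thenAccept(text -> runOnUiThread(() -> Log.i(TAG, "result: " + text)));`. The call to end() would move into the same background job, or run after the future completes.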
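III. Additional Cases
1. When the Bitmap cannot be obtained
- Here the Bitmap is decoded from a resource ID that does not exist (1001). decodeResource returns null instead of throwing, and passing that null to setImage then crashes the activity.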
Bitmap bitmap = BitmapFactory.decodeResource(getResources(), 1001);
Log.i(TAG, "bitmap: " + bitmap);
tessBaseAPI.setImage(bitmap);
String result = tessBaseAPI.getUTF8Text();
Log.i(TAG, "result: " + result);
# Output
bitmap: null
...
FATAL EXCEPTION: main
Process: com.my.ocr_tesseract, PID: 25149
java.lang.RuntimeException: Unable to start activity ComponentInfo{com.my.ocr_tesseract/com.my.ocr_tesseract.MainActivity}: java.lang.RuntimeException: Failed to read bitmap
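Since the failure surfaces only as a null return, the fix is to check for null before calling setImage. The sketch below is a minimal version of that guard. DecodeGuard and recognizeIfDecoded are hypothetical names. The type parameter B stands in for android.graphics.Bitmap so the sketch compiles on a plain JVM, and the ocr function stands in for the setImage() plus getUTF8Text() calls.

```java
import java.util.Optional;
import java.util.function.Function;

public class DecodeGuard {
    // Hypothetical helper: runs OCR only when decoding actually produced a bitmap
    public static <B> Optional<String> recognizeIfDecoded(B bitmap, Function<B, String> ocr) {
        if (bitmap == null) {
            // decodeResource() returned null: skip OCR instead of letting
            // setImage(null) throw "Failed to read bitmap"
            return Optional.empty();
        }
        return Optional.ofNullable(ocr.apply(bitmap));
    }

    public static void main(String[] args) {
        System.out.println(recognizeIfDecoded(null, b -> "never called"));     // Optional.empty
        System.out.println(recognizeIfDecoded("img", b -> "text from " + b)); // Optional[text from img]
    }
}
```

In the activity, the call would look like `recognizeIfDecoded(bitmap, b -> { tessBaseAPI.setImage(b); return tessBaseAPI.getUTF8Text(); })`. An empty result can then be logged instead of crashing the activity.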
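2. Recognizing cursive handwriting
- Tess-two has limited ability to recognize cursive (joined-up) handwriting. ML Kit Digital Ink Recognition is a better fit for this case. Note that it recognizes handwriting from the stroke data captured as the user writes, not from images.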

# Output
result:

# Output
result: 锄
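3. Using the app's internal private storage directory
- You can also use the app-specific internal storage directory (getFilesDir()), which needs no permission request. Strictly speaking, the app-specific external directory returned by getExternalFilesDir() has not required storage permissions since Android 4.4 (API 19) either. On recent Android versions with a recent targetSdk, the legacy READ_EXTERNAL_STORAGE / WRITE_EXTERNAL_STORAGE permissions may simply be denied, in which case the demo's permission check fails and test() never runs.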
private void test() {
    copyTessDataToStorage("chi_sim.traineddata", "eng.traineddata");

    TessBaseAPI tessBaseAPI = new TessBaseAPI();
    try {
        String tesseractDirPath = getFilesDir() + "/tesseract/";
        boolean initResult = tessBaseAPI.init(tesseractDirPath, "chi_sim+eng");
        if (!initResult) {
            Log.i(TAG, "Failed to initialize Tesseract");
            return;
        }
        Bitmap bitmap = BitmapFactory.decodeResource(getResources(), R.drawable.test_img);
        Log.i(TAG, "bitmap: " + bitmap);
        tessBaseAPI.setImage(bitmap);
        String result = tessBaseAPI.getUTF8Text();
        Log.i(TAG, "result: " + result);
    } finally {
        // Release the native Tesseract resources
        tessBaseAPI.end();
    }
}

public void copyTessDataToStorage(String... tessDataFiles) {
    String tessDataDirPath = getFilesDir() + "/tesseract/tessdata/";
    File tessDataDir = new File(tessDataDirPath);
    if (!tessDataDir.exists()) {
        tessDataDir.mkdirs();
    }
    AssetManager assetManager = getAssets();
    for (String fileName : tessDataFiles) {
        File outFile = new File(tessDataDirPath + fileName);
        if (outFile.exists()) continue;
        try (InputStream in = assetManager.open(fileName);
             OutputStream out = new FileOutputStream(outFile)) {
            byte[] buffer = new byte[1024];
            int read;
            while ((read = in.read(buffer)) != -1) {
                out.write(buffer, 0, read);
            }
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
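This version differs from the demo only in its base directory: getFilesDir() replaces getExternalFilesDir(null). The tesseract/tessdata layout stays the same, and init() still receives the parent of tessdata/. The sketch below derives both paths from a single base directory, so switching storage locations becomes a one-line change. TessPaths is a hypothetical class, and "/tmp/app-files" in main is only a stand-in for getFilesDir().

```java
import java.io.File;

public class TessPaths {
    private final File dataPath;    // parent of tessdata/, passed to init()
    private final File tessDataDir; // where the .traineddata files are copied

    // baseDir would be getFilesDir() or getExternalFilesDir(null)
    public TessPaths(File baseDir) {
        this.dataPath = new File(baseDir, "tesseract");
        this.tessDataDir = new File(dataPath, "tessdata");
    }

    // Trailing separator kept to match the "/tesseract/" string in the demo
    public String dataPathForInit() {
        return dataPath.getPath() + File.separator;
    }

    public File tessDataDir() {
        return tessDataDir;
    }

    public File trainedDataFile(String fileName) {
        return new File(tessDataDir, fileName);
    }

    public static void main(String[] args) {
        TessPaths paths = new TessPaths(new File("/tmp/app-files"));
        System.out.println(paths.dataPathForInit());
        System.out.println(paths.trainedDataFile("eng.traineddata"));
    }
}
```

In the activity, you would create `TessPaths paths = new TessPaths(getFilesDir());` and then call `tessBaseAPI.init(paths.dataPathForInit(), "chi_sim+eng")`. The copy method would use `paths.tessDataDir().mkdirs()` and `paths.trainedDataFile(fileName)`.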