
A Guide to Truncating Strings by Byte Length in Java

news 2025/10/31 17:24:07

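A Java string may contain characters that encode to more than one byte (Chinese characters, for example), so cutting the encoded bytes at a fixed length can split a character, producing garbled text or a result of the wrong length. The methods below truncate a string to a given number of bytes.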
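To make the problem concrete, here is a minimal sketch. The class name `ByteLengthDemo` and its helpers `utf8Length` and `naiveCut` are made up for illustration. It measures how many bytes a string occupies in UTF-8 and shows what a raw cut through the byte array does to 你好. The string literals use Unicode escapes only so that the example does not depend on the source file's encoding.

```java
import java.nio.charset.StandardCharsets;

final class ByteLengthDemo {

    /** Number of bytes the text occupies when encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    /** Naive truncation: cut the encoded bytes and decode whatever is left. */
    static String naiveCut(String text, int byteLength) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        int n = Math.min(Math.max(byteLength, 0), bytes.length);
        return new String(bytes, 0, n, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String hello = "\u4F60\u597D"; // two CJK characters, U+4F60 and U+597D

        System.out.println(utf8Length("Java")); // 4: one byte per ASCII letter
        System.out.println(utf8Length(hello));  // 6: three bytes per character

        // Four bytes cover the first character plus one stray byte of the
        // second, so the decoded result ends with a replacement character.
        String cut = naiveCut(hello, 4);
        System.out.println(cut.indexOf('\uFFFD') >= 0); // true
    }
}
```

The character count (`String.length()`) and the byte count diverge as soon as the text leaves ASCII, which is why the methods below walk the string character by character instead of slicing the byte array.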

Method 1: Using String's getBytes method
java
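public static String substringByBytes(String str, int byteLength) {
    if (str == null || str.isEmpty() || byteLength <= 0) {
        return "";
    }

    // Fast path. getBytes() with no argument uses the platform default charset
    // (UTF-8 since Java 18), so this length can be larger than the estimate below.
    byte[] bytes = str.getBytes();
    if (byteLength >= bytes.length) {
        return str;
    }

    // Walk the characters and stop before the one that would exceed the limit,
    // so that a multi-byte character is never cut in half.
    int len = 0;
    for (int i = 0; i < str.length(); i++) {
        char c = str.charAt(i);
        // Rough estimate: ASCII takes 1 byte, everything else 2 bytes. This matches
        // GBK but not UTF-8, where most CJK characters take 3 bytes.
        len += (c < 0x80) ? 1 : 2;

        if (len > byteLength) {
            return str.substring(0, i);
        } else if (len == byteLength) {
            return str.substring(0, i + 1);
        }
    }
    return str;
}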
Method 2: Specifying the character encoding
java
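// Requires: import java.io.UnsupportedEncodingException;
public static String substringByBytes(String str, int byteLength, String charsetName)
        throws UnsupportedEncodingException {
    if (str == null || str.isEmpty() || byteLength <= 0) {
        return "";
    }

    byte[] bytes = str.getBytes(charsetName);
    if (byteLength >= bytes.length) {
        return str;
    }

    // Decode the first byteLength bytes with the same charset.
    // Caution: if the cut falls inside a multi-byte character, the incomplete
    // tail is typically decoded as the replacement character U+FFFD.
    return new String(bytes, 0, byteLength, charsetName);
}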
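Because Method 2 cuts the byte array at an arbitrary offset, the result can end in a broken character. One possible way around this, sketched here on the assumption that silently dropping the incomplete tail is acceptable, is to decode the truncated bytes with a `CharsetDecoder` configured to ignore malformed input. The names `DecoderTruncate` and `truncate` are hypothetical, and the sketch takes a `Charset` rather than a charset name to avoid the checked exception.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class DecoderTruncate {

    /** Cuts the encoded bytes at byteLength and drops a trailing partial character. */
    static String truncate(String str, int byteLength, Charset charset) {
        if (str == null || str.isEmpty() || byteLength <= 0) {
            return "";
        }
        byte[] bytes = str.getBytes(charset);
        if (byteLength >= bytes.length) {
            return str;
        }

        // IGNORE makes the decoder skip bytes that do not form a complete,
        // valid character instead of emitting U+FFFD for them.
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.IGNORE)
                .onUnmappableCharacter(CodingErrorAction.IGNORE);
        try {
            return decoder.decode(ByteBuffer.wrap(bytes, 0, byteLength)).toString();
        } catch (CharacterCodingException e) {
            // Not expected when both error actions are IGNORE.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String hello = "\u4F60\u597D"; // two CJK characters, 3 bytes each in UTF-8
        // Four bytes hold the first character and one stray byte, which is dropped.
        System.out.println(truncate(hello, 4, StandardCharsets.UTF_8).length()); // 1
    }
}
```

The trade-off is that the result may be shorter than the requested byte length by up to one character's worth of bytes, which is usually what a "fit into N bytes" limit calls for.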
Method 3: More precise handling of the character encoding
java
// Requires: import java.io.UnsupportedEncodingException;
public static String substringByBytes(String str, int maxBytes, String charsetName)
        throws UnsupportedEncodingException {
    if (str == null || charsetName == null || charsetName.isEmpty()) {
        return str;
    }

    byte[] bytes = str.getBytes(charsetName);
    if (bytes.length <= maxBytes) {
        return str;
    }

    // Walk the string one code point at a time and measure its encoded size,
    // so that neither a multi-byte character nor a surrogate pair is split.
    int nBytes = 0;
    int i = 0;
    while (i < str.length()) {
        int charCount = Character.charCount(str.codePointAt(i));
        int charBytes = str.substring(i, i + charCount).getBytes(charsetName).length;
        if (nBytes + charBytes > maxBytes) {
            break;
        }
        nBytes += charBytes;
        i += charCount;
    }

    return str.substring(0, i);
}
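Method 3 encodes every character separately to measure it. An alternative, shown here only as a sketch, is to let a `CharsetEncoder` write into a `ByteBuffer` whose capacity is the byte limit: the encoder stops when the next character no longer fits, and the input buffer's position tells how many chars were consumed. The names `EncoderTruncate` and `truncate` are hypothetical, and the sketch assumes a stateless encoding such as UTF-8 or GBK, since it never flushes the encoder.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

final class EncoderTruncate {

    /** Returns the longest prefix of str that encodes to at most maxBytes bytes. */
    static String truncate(String str, int maxBytes, Charset charset) {
        if (str == null) {
            return null;
        }
        if (maxBytes <= 0) {
            return "";
        }
        // REPLACE mirrors String.getBytes: unencodable input becomes the
        // charset's replacement bytes instead of stopping the encoder.
        CharsetEncoder encoder = charset.newEncoder()
                .onMalformedInput(CodingErrorAction.REPLACE)
                .onUnmappableCharacter(CodingErrorAction.REPLACE);

        CharBuffer in = CharBuffer.wrap(str);
        ByteBuffer out = ByteBuffer.allocate(maxBytes);

        // Encoding stops either at the end of the input or when the next
        // character does not fit; in.position() is the number of chars consumed.
        encoder.encode(in, out, true);
        return str.substring(0, in.position());
    }

    public static void main(String[] args) {
        // Three full-width CJK characters (9 bytes in UTF-8) followed by "Java".
        String text = "\u4F60\u597D\uFF0CJava";
        System.out.println(truncate(text, 10, StandardCharsets.UTF_8).length()); // 4
    }
}
```

This encodes the string at most once and stops early, which may matter for long inputs with a small byte limit.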
Usage example
java
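public static void main(String[] args) {
    String testStr = "你好，Java世界！Hello World!";

    // Methods 2 and 3 share a signature, so a class can contain only one of them.
    // The outputs for the three-argument calls below assume Method 3.
    try {
        // Method 1 counts every non-ASCII character as 2 bytes: 6 + 4 = 10.
        System.out.println(substringByBytes(testStr, 10)); // prints: 你好，Java
        // UTF-8: 你好， is 9 bytes and Java is 4; 世 would need 3 more.
        System.out.println(substringByBytes(testStr, 15, "UTF-8")); // prints: 你好，Java
        // GBK: 你好，Java世界！ is 16 bytes, leaving room for 4 ASCII letters.
        System.out.println(substringByBytes(testStr, 20, "GBK")); // prints: 你好，Java世界！Hell
    } catch (UnsupportedEncodingException e) {
        e.printStackTrace();
    }
}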
Notes
The number of bytes a character occupies depends on the encoding: