Detecting a File's Encoding Type in C++

news Source: original 2025/5/31 20:46:32

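Many text editors insert a few special bytes at the beginning of a text file, known as a byte order mark (BOM), so that an editor opening the file can determine its character encoding.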

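Notepad is one example.

It currently supports the following character set types, three of which are marked by a BOM:

Unicode (UTF-16 little-endian), UTF8, UnicodeBIG (UTF-16 big-endian), and CP_ACP (the system default code page, which on Simplified Chinese systems is usually GBK)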
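To make the byte prefixes concrete, here is a minimal sketch that maps each encoding listed above to the BOM an editor would typically write at the start of the file. The enum `TextEncoding` and the function `BomFor` are hypothetical names chosen for illustration and do not belong to any library. CP_ACP (ANSI) text carries no BOM, so the sketch returns an empty prefix for it.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical enum for illustration; it mirrors the encodings listed above.
enum class TextEncoding { Ansi, Utf8, Utf16LE, Utf16BE };

// Returns the byte order mark (BOM) conventionally written at the start of a
// file in the given encoding. ANSI (CP_ACP) text has no BOM.
inline std::vector<std::uint8_t> BomFor(TextEncoding encoding) {
    switch (encoding) {
    case TextEncoding::Utf8:    return { 0xEF, 0xBB, 0xBF };
    case TextEncoding::Utf16LE: return { 0xFF, 0xFE };  // "Unicode"
    case TextEncoding::Utf16BE: return { 0xFE, 0xFF };  // "UnicodeBIG"
    case TextEncoding::Ansi:    break;
    }
    return {};
}
```

Note that the UTF-16 marks are two bytes long and the UTF-8 mark is three bytes long, so a detector has to skip a different number of bytes depending on which one it finds.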

        int File::GetEncoding(const void* p, int length, int& offset) noexcept {
            offset = 0;
            if (NULL == p || length < 2) {
                return ppp::text::Encoding::ASCII;
            }

            // BOM signatures:
            //   UTF8       = { 0xEF, 0xBB, 0xBF }
            //   UnicodeBIG = { 0xFE, 0xFF }  (UTF-16 big-endian)
            //   Unicode    = { 0xFF, 0xFE }  (UTF-16 little-endian)
            // The UTF-16 marks are only two bytes long; the third byte of the
            // file is already text, so it must not be compared or skipped.
            const Byte* s = (const Byte*)p;
            if (length >= 3 && s[0] == 0xEF && s[1] == 0xBB && s[2] == 0xBF) {
                offset += 3;
                return ppp::text::Encoding::UTF8;
            }
            else if (s[0] == 0xFE && s[1] == 0xFF) {
                offset += 2;
                return ppp::text::Encoding::BigEndianUnicode;
            }
            else if (s[0] == 0xFF && s[1] == 0xFE) {
                offset += 2;
                return ppp::text::Encoding::Unicode;
            }
            else {
                // No BOM found: fall back to the default (ASCII / CP_ACP).
                return ppp::text::Encoding::ASCII;
            }
        }
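The function above depends on project types (`File`, `Byte`, `ppp::text::Encoding`) that are not shown here. As a self-contained sketch of the same idea, the following uses only the standard library. The names `DetectedEncoding`, `BomResult`, `DetectBom`, and `DetectFileBom` are hypothetical and introduced only for this example. It compares just the two BOM bytes for UTF-16, since the mark itself is two bytes long, and it does not attempt to recognize UTF-32 or BOM-less files.

```cpp
#include <cstddef>
#include <fstream>
#include <string>

// Hypothetical names for illustration; this is not the ppp::text::Encoding type.
enum class DetectedEncoding { Unknown, Utf8, Utf16LE, Utf16BE };

struct BomResult {
    DetectedEncoding encoding;  // Unknown means no recognized BOM was found
    std::size_t offset;         // number of BOM bytes to skip before the text
};

// Inspects the leading bytes of a buffer for a UTF-8 or UTF-16 BOM.
inline BomResult DetectBom(const unsigned char* data, std::size_t size) {
    if (data == nullptr) return { DetectedEncoding::Unknown, 0 };
    if (size >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return { DetectedEncoding::Utf8, 3 };
    if (size >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return { DetectedEncoding::Utf16BE, 2 };
    if (size >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return { DetectedEncoding::Utf16LE, 2 };
    return { DetectedEncoding::Unknown, 0 };
}

// Reads at most the first three bytes of a file and checks them for a BOM.
inline BomResult DetectFileBom(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    unsigned char head[3] = { 0, 0, 0 };
    in.read(reinterpret_cast<char*>(head), sizeof(head));
    // gcount() is the number of bytes actually read (0 if the open failed).
    return DetectBom(head, static_cast<std::size_t>(in.gcount()));
}
```

A caller would typically invoke `DetectFileBom(path)`, then reopen or seek the file to `offset` and decode the remaining bytes according to `encoding`, treating `Unknown` as the system default code page.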