当前位置: 首页 > news >正文

第3章 乱码的前世今生-字符集和比较规则

文章目录

  • MySQL中支持的字符集和排序规则
    • MySQL中的utf8和utf8mb4
    • 比较规则的查看
  • 各级别的字符集和比较规则
    • 服务器级别
    • 数据库级别
    • 表级别
    • 列级别
    • 仅修改字符集或仅修改比较规则
  • 各级别字符集和比较规则小结
  • Ubuntu 操作系统三个环境变量的值决定了当前使用的字符集
  • C/S字符集不同则鸡同鸭讲
  • 服务器处理请求
  • 比较规则的应用
  • 总结
    • 从发送请求到接收结果中发生的字符集转换
    • 在这个过程中各个系统指标的含义如下:

MySQL中支持的字符集和排序规则

MySQL中的utf8和utf8mb4

mysql> SHOW CHARSET;
+----------+---------------------------------+---------------------+--------+
| Charset  | Description                     | Default collation   | Maxlen |
+----------+---------------------------------+---------------------+--------+
| armscii8 | ARMSCII-8 Armenian              | armscii8_general_ci |      1 |
| ascii    | US ASCII                        | ascii_general_ci    |      1 |
| big5     | Big5 Traditional Chinese        | big5_chinese_ci     |      2 |
| binary   | Binary pseudo charset           | binary              |      1 |
| cp1250   | Windows Central European        | cp1250_general_ci   |      1 |
| cp1251   | Windows Cyrillic                | cp1251_general_ci   |      1 |
| cp1256   | Windows Arabic                  | cp1256_general_ci   |      1 |
| cp1257   | Windows Baltic                  | cp1257_general_ci   |      1 |
| cp850    | DOS West European               | cp850_general_ci    |      1 |
| cp852    | DOS Central European            | cp852_general_ci    |      1 |
| cp866    | DOS Russian                     | cp866_general_ci    |      1 |
| cp932    | SJIS for Windows Japanese       | cp932_japanese_ci   |      2 |
| dec8     | DEC West European               | dec8_swedish_ci     |      1 |
| eucjpms  | UJIS for Windows Japanese       | eucjpms_japanese_ci |      3 |
| euckr    | EUC-KR Korean                   | euckr_korean_ci     |      2 |
| gb18030  | China National Standard GB18030 | gb18030_chinese_ci  |      4 |
| gb2312   | GB2312 Simplified Chinese       | gb2312_chinese_ci   |      2 |
| gbk      | GBK Simplified Chinese          | gbk_chinese_ci      |      2 |
| geostd8  | GEOSTD8 Georgian                | geostd8_general_ci  |      1 |
| greek    | ISO 8859-7 Greek                | greek_general_ci    |      1 |
| hebrew   | ISO 8859-8 Hebrew               | hebrew_general_ci   |      1 |
| hp8      | HP West European                | hp8_english_ci      |      1 |
| keybcs2  | DOS Kamenicky Czech-Slovak      | keybcs2_general_ci  |      1 |
| koi8r    | KOI8-R Relcom Russian           | koi8r_general_ci    |      1 |
| koi8u    | KOI8-U Ukrainian                | koi8u_general_ci    |      1 |
| latin1   | cp1252 West European            | latin1_swedish_ci   |      1 |
| latin2   | ISO 8859-2 Central European     | latin2_general_ci   |      1 |
| latin5   | ISO 8859-9 Turkish              | latin5_turkish_ci   |      1 |
| latin7   | ISO 8859-13 Baltic              | latin7_general_ci   |      1 |
| macce    | Mac Central European            | macce_general_ci    |      1 |
| macroman | Mac West European               | macroman_general_ci |      1 |
| sjis     | Shift-JIS Japanese              | sjis_japanese_ci    |      2 |
| swe7     | 7bit Swedish                    | swe7_swedish_ci     |      1 |
| tis620   | TIS620 Thai                     | tis620_thai_ci      |      1 |
| ucs2     | UCS-2 Unicode                   | ucs2_general_ci     |      2 |
| ujis     | EUC-JP Japanese                 | ujis_japanese_ci    |      3 |
| utf16    | UTF-16 Unicode                  | utf16_general_ci    |      4 |
| utf16le  | UTF-16LE Unicode                | utf16le_general_ci  |      4 |
| utf32    | UTF-32 Unicode                  | utf32_general_ci    |      4 |
| utf8mb3  | UTF-8 Unicode                   | utf8mb3_general_ci  |      3 |
| utf8mb4  | UTF-8 Unicode                   | utf8mb4_0900_ai_ci  |      4 |
+----------+---------------------------------+---------------------+--------+
41 rows in set (0.05 sec)

比较规则的查看

mysql> SHOW COLLATION LIKE 'utf8_%';
+-----------------------------+---------+-----+---------+----------+---------+---------------+
| Collation                   | Charset | Id  | Default | Compiled | Sortlen | Pad_attribute |
+-----------------------------+---------+-----+---------+----------+---------+---------------+
| utf8mb3_bin                 | utf8mb3 |  83 |         | Yes      |       1 | PAD SPACE     |
| utf8mb3_croatian_ci         | utf8mb3 | 213 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_czech_ci            | utf8mb3 | 202 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_danish_ci           | utf8mb3 | 203 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_esperanto_ci        | utf8mb3 | 209 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_estonian_ci         | utf8mb3 | 198 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_general_ci          | utf8mb3 |  33 | Yes     | Yes      |       1 | PAD SPACE     |
| utf8mb3_general_mysql500_ci | utf8mb3 | 223 |         | Yes      |       1 | PAD SPACE     |
| utf8mb3_german2_ci          | utf8mb3 | 212 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_hungarian_ci        | utf8mb3 | 210 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_icelandic_ci        | utf8mb3 | 193 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_latvian_ci          | utf8mb3 | 194 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_lithuanian_ci       | utf8mb3 | 204 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_persian_ci          | utf8mb3 | 208 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_polish_ci           | utf8mb3 | 197 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_romanian_ci         | utf8mb3 | 195 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_roman_ci            | utf8mb3 | 207 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_sinhala_ci          | utf8mb3 | 211 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_slovak_ci           | utf8mb3 | 205 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_slovenian_ci        | utf8mb3 | 196 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_spanish2_ci         | utf8mb3 | 206 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_spanish_ci          | utf8mb3 | 199 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_swedish_ci          | utf8mb3 | 200 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_tolower_ci          | utf8mb3 |  76 |         | Yes      |       1 | PAD SPACE     |
| utf8mb3_turkish_ci          | utf8mb3 | 201 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_unicode_520_ci      | utf8mb3 | 214 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_unicode_ci          | utf8mb3 | 192 |         | Yes      |       8 | PAD SPACE     |
| utf8mb3_vietnamese_ci       | utf8mb3 | 215 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_0900_ai_ci          | utf8mb4 | 255 | Yes     | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_ci          | utf8mb4 | 305 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_as_cs          | utf8mb4 | 278 |         | Yes      |       0 | NO PAD        |
| utf8mb4_0900_bin            | utf8mb4 | 309 |         | Yes      |       1 | NO PAD        |
| utf8mb4_bg_0900_ai_ci       | utf8mb4 | 318 |         | Yes      |       0 | NO PAD        |
| utf8mb4_bg_0900_as_cs       | utf8mb4 | 319 |         | Yes      |       0 | NO PAD        |
| utf8mb4_bin                 | utf8mb4 |  46 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_bs_0900_ai_ci       | utf8mb4 | 316 |         | Yes      |       0 | NO PAD        |
| utf8mb4_bs_0900_as_cs       | utf8mb4 | 317 |         | Yes      |       0 | NO PAD        |
| utf8mb4_croatian_ci         | utf8mb4 | 245 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_cs_0900_ai_ci       | utf8mb4 | 266 |         | Yes      |       0 | NO PAD        |
| utf8mb4_cs_0900_as_cs       | utf8mb4 | 289 |         | Yes      |       0 | NO PAD        |
| utf8mb4_czech_ci            | utf8mb4 | 234 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_danish_ci           | utf8mb4 | 235 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_da_0900_ai_ci       | utf8mb4 | 267 |         | Yes      |       0 | NO PAD        |
| utf8mb4_da_0900_as_cs       | utf8mb4 | 290 |         | Yes      |       0 | NO PAD        |
| utf8mb4_de_pb_0900_ai_ci    | utf8mb4 | 256 |         | Yes      |       0 | NO PAD        |
| utf8mb4_de_pb_0900_as_cs    | utf8mb4 | 279 |         | Yes      |       0 | NO PAD        |
| utf8mb4_eo_0900_ai_ci       | utf8mb4 | 273 |         | Yes      |       0 | NO PAD        |
| utf8mb4_eo_0900_as_cs       | utf8mb4 | 296 |         | Yes      |       0 | NO PAD        |
| utf8mb4_esperanto_ci        | utf8mb4 | 241 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_estonian_ci         | utf8mb4 | 230 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_es_0900_ai_ci       | utf8mb4 | 263 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_0900_as_cs       | utf8mb4 | 286 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_trad_0900_ai_ci  | utf8mb4 | 270 |         | Yes      |       0 | NO PAD        |
| utf8mb4_es_trad_0900_as_cs  | utf8mb4 | 293 |         | Yes      |       0 | NO PAD        |
| utf8mb4_et_0900_ai_ci       | utf8mb4 | 262 |         | Yes      |       0 | NO PAD        |
| utf8mb4_et_0900_as_cs       | utf8mb4 | 285 |         | Yes      |       0 | NO PAD        |
| utf8mb4_general_ci          | utf8mb4 |  45 |         | Yes      |       1 | PAD SPACE     |
| utf8mb4_german2_ci          | utf8mb4 | 244 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_gl_0900_ai_ci       | utf8mb4 | 320 |         | Yes      |       0 | NO PAD        |
| utf8mb4_gl_0900_as_cs       | utf8mb4 | 321 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hr_0900_ai_ci       | utf8mb4 | 275 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hr_0900_as_cs       | utf8mb4 | 298 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hungarian_ci        | utf8mb4 | 242 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_hu_0900_ai_ci       | utf8mb4 | 274 |         | Yes      |       0 | NO PAD        |
| utf8mb4_hu_0900_as_cs       | utf8mb4 | 297 |         | Yes      |       0 | NO PAD        |
| utf8mb4_icelandic_ci        | utf8mb4 | 225 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_is_0900_ai_ci       | utf8mb4 | 257 |         | Yes      |       0 | NO PAD        |
| utf8mb4_is_0900_as_cs       | utf8mb4 | 280 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs       | utf8mb4 | 303 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ja_0900_as_cs_ks    | utf8mb4 | 304 |         | Yes      |      24 | NO PAD        |
| utf8mb4_latvian_ci          | utf8mb4 | 226 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_la_0900_ai_ci       | utf8mb4 | 271 |         | Yes      |       0 | NO PAD        |
| utf8mb4_la_0900_as_cs       | utf8mb4 | 294 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lithuanian_ci       | utf8mb4 | 236 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_lt_0900_ai_ci       | utf8mb4 | 268 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lt_0900_as_cs       | utf8mb4 | 291 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lv_0900_ai_ci       | utf8mb4 | 258 |         | Yes      |       0 | NO PAD        |
| utf8mb4_lv_0900_as_cs       | utf8mb4 | 281 |         | Yes      |       0 | NO PAD        |
| utf8mb4_mn_cyrl_0900_ai_ci  | utf8mb4 | 322 |         | Yes      |       0 | NO PAD        |
| utf8mb4_mn_cyrl_0900_as_cs  | utf8mb4 | 323 |         | Yes      |       0 | NO PAD        |
| utf8mb4_nb_0900_ai_ci       | utf8mb4 | 310 |         | Yes      |       0 | NO PAD        |
| utf8mb4_nb_0900_as_cs       | utf8mb4 | 311 |         | Yes      |       0 | NO PAD        |
| utf8mb4_nn_0900_ai_ci       | utf8mb4 | 312 |         | Yes      |       0 | NO PAD        |
| utf8mb4_nn_0900_as_cs       | utf8mb4 | 313 |         | Yes      |       0 | NO PAD        |
| utf8mb4_persian_ci          | utf8mb4 | 240 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_pl_0900_ai_ci       | utf8mb4 | 261 |         | Yes      |       0 | NO PAD        |
| utf8mb4_pl_0900_as_cs       | utf8mb4 | 284 |         | Yes      |       0 | NO PAD        |
| utf8mb4_polish_ci           | utf8mb4 | 229 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_romanian_ci         | utf8mb4 | 227 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_roman_ci            | utf8mb4 | 239 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_ro_0900_ai_ci       | utf8mb4 | 259 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ro_0900_as_cs       | utf8mb4 | 282 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ru_0900_ai_ci       | utf8mb4 | 306 |         | Yes      |       0 | NO PAD        |
| utf8mb4_ru_0900_as_cs       | utf8mb4 | 307 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sinhala_ci          | utf8mb4 | 243 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sk_0900_ai_ci       | utf8mb4 | 269 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sk_0900_as_cs       | utf8mb4 | 292 |         | Yes      |       0 | NO PAD        |
| utf8mb4_slovak_ci           | utf8mb4 | 237 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_slovenian_ci        | utf8mb4 | 228 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sl_0900_ai_ci       | utf8mb4 | 260 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sl_0900_as_cs       | utf8mb4 | 283 |         | Yes      |       0 | NO PAD        |
| utf8mb4_spanish2_ci         | utf8mb4 | 238 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_spanish_ci          | utf8mb4 | 231 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_sr_latn_0900_ai_ci  | utf8mb4 | 314 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sr_latn_0900_as_cs  | utf8mb4 | 315 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sv_0900_ai_ci       | utf8mb4 | 264 |         | Yes      |       0 | NO PAD        |
| utf8mb4_sv_0900_as_cs       | utf8mb4 | 287 |         | Yes      |       0 | NO PAD        |
| utf8mb4_swedish_ci          | utf8mb4 | 232 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_tr_0900_ai_ci       | utf8mb4 | 265 |         | Yes      |       0 | NO PAD        |
| utf8mb4_tr_0900_as_cs       | utf8mb4 | 288 |         | Yes      |       0 | NO PAD        |
| utf8mb4_turkish_ci          | utf8mb4 | 233 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_unicode_520_ci      | utf8mb4 | 246 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_unicode_ci          | utf8mb4 | 224 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_vietnamese_ci       | utf8mb4 | 247 |         | Yes      |       8 | PAD SPACE     |
| utf8mb4_vi_0900_ai_ci       | utf8mb4 | 277 |         | Yes      |       0 | NO PAD        |
| utf8mb4_vi_0900_as_cs       | utf8mb4 | 300 |         | Yes      |       0 | NO PAD        |
| utf8mb4_zh_0900_as_cs       | utf8mb4 | 308 |         | Yes      |       0 | NO PAD        |
+-----------------------------+---------+-----+---------+----------+---------+---------------+
117 rows in set (0.00 sec)

在这里插入图片描述

各级别的字符集和比较规则

服务器级别

在这里插入图片描述

mysql> SHOW VARIABLES LIKE 'character_set_server';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb4 |
+----------------------+---------+
1 row in set (0.00 sec)
mysql> SHOW VARIABLES LIKE 'collation_server';
+------------------+--------------------+
| Variable_name    | Value              |
+------------------+--------------------+
| collation_server | utf8mb4_0900_ai_ci |
+------------------+--------------------+
1 row in set (0.00 sec)

数据库级别

mysql> CREATE DATABASE charset_demo_db-> CHARACTER SET gb2312-> COLLATE gb2312_chinese_ci;
Query OK, 1 row affected (0.06 sec)mysql> USE charset_demo_db;
Database changed
mysql> SHOW VARIABLES LIKE 'character_set_database';
+------------------------+--------+
| Variable_name          | Value  |
+------------------------+--------+
| character_set_database | gb2312 |
+------------------------+--------+
1 row in set (0.01 sec)mysql> SHOW VARIABLES LIKE 'collation_database';
+--------------------+-------------------+
| Variable_name      | Value             |
+--------------------+-------------------+
| collation_database | gb2312_chinese_ci |
+--------------------+-------------------+
1 row in set (0.00 sec)

表级别

mysql> CREATE TABLE t(-> col VARCHAR(10)-> ) CHARACTER SET utf8 COLLATE utf8_general_ci;
Query OK, 0 rows affected, 2 warnings (0.15 sec)

列级别

mysql> ALTER TABLE t MODIFY col VARCHAR(10) CHARACTER SET gbk COLLATE gbk_chinese_ci;
Query OK, 0 rows affected (0.13 sec)
Records: 0  Duplicates: 0  Warnings: 0

仅修改字符集或仅修改比较规则

mysql> SET character_set_server = gb2312;
Query OK, 0 rows affected (0.00 sec)mysql> SHOW VARIABLES LIKE 'character_set_server';
+----------------------+--------+
| Variable_name        | Value  |
+----------------------+--------+
| character_set_server | gb2312 |
+----------------------+--------+
1 row in set (0.00 sec)mysql> SHOW VARIABLES LIKE 'collation_server';
+------------------+-------------------+
| Variable_name    | Value             |
+------------------+-------------------+
| collation_server | gb2312_chinese_ci |
+------------------+-------------------+
1 row in set (0.01 sec)
mysql> SET collation_server=utf8_general_ci;
Query OK, 0 rows affected, 1 warning (0.00 sec)mysql> SHOW VARIABLES LIKE 'character_set_server';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb3 |
+----------------------+---------+
1 row in set (0.00 sec)mysql> SHOW VARIABLES LIKE 'collation_server';
+------------------+--------------------+
| Variable_name    | Value              |
+------------------+--------------------+
| collation_server | utf8mb3_general_ci |
+------------------+--------------------+
1 row in set (0.00 sec)

各级别字符集和比较规则小结

知道了这些规则之后,对于给定的表,我们应该知道它的各个列的字符集和比较规则是什么,从而根据这个列的类型来确定存储数据时每个列的实际数据占用的存储空间大小了。比方说我们向表t中插入一条记录:

向名为 t 的表中插入一条数据,具体来说,它会向表 t 中的 col 列插入一个值 ‘哈哈’

mysql> SHOW TABLES;
+---------------------------+
| Tables_in_charset_demo_db |
+---------------------------+
| t                         |
+---------------------------+
1 row in set (0.01 sec)mysql> INSERT INTO t(col) VALUES('哈哈');
Query OK, 1 row affected (0.02 sec)mysql> SELECT * FROM t;
+--------+
| col    |
+--------+
| 哈哈   |
+--------+
1 row in set (0.00 sec)

Ubuntu 操作系统三个环境变量的值决定了当前使用的字符集

在这里插入图片描述

C/S字符集不同则鸡同鸭讲

mysql> SHOW VARIABLES LIKE 'character_set_server';
+----------------------+---------+
| Variable_name        | Value   |
+----------------------+---------+
| character_set_server | utf8mb3 |
+----------------------+---------+
1 row in set (0.01 sec)mysql> SET character_set_client = ascii;
Query OK, 0 rows affected (0.01 sec)mysql> SELECT '我';
+-----+
| ??? |
+-----+
| ??? |
+-----+
1 row in set, 1 warning (0.00 sec)mysql> SHOW WARNINGS\G
*************************** 1. row ***************************Level: WarningCode: 1300
Message: Cannot convert string '\xE6\x88\x91' from ascii to utf8mb4
1 row in set (0.00 sec)

SELECT ‘我’; 是一条不从任何表取数据的查询,它只是返回一个常量字符串。这类写法常用来测试连接/字符集、构造常量列,或与其他表达式一起返回

服务器处理请求

在客户端向服务器发送请求时,请求的内容实际上是以字节流的形式传输的。服务器在接收到这些字节流时,会首先根据 character_set_client 设置来确定客户端发送过来的字节流所使用的字符集(即编码格式)。然后,服务器会将这些字节流从 character_set_client 字符集转换为服务器内部使用的 character_set_connection 字符集进行处理和存储。

character_set_client:这个字符集设置定义了客户端发送请求时,数据是以什么字符集(编码格式)进行编码的。比如,如果客户端是以 UTF-8 编码发送的请求数据,那么这个设置就会是 UTF-8。

character_set_connection:这是服务器在接收到客户端数据后,用来将字节流转换成字符的字符集。它通常在服务器端处理数据时使用,比如查询和插入数据时,服务器会使用这个字符集来处理收到的字节流。

在这里插入图片描述

mysql> SET character_set_connection=gbk;
Query OK, 0 rows affected (0.00 sec)mysql> SET collation_connection=gbk_chinese_ci;
Query OK, 0 rows affected (0.00 sec)mysql> SELECT 'a' = 'A';
+-----------+
| 'a' = 'A' |
+-----------+
|         1 |
+-----------+
1 row in set (0.01 sec)

SET collation_connection=gbk_bin;
这条语句设置了连接的排序规则为 gbk_bin,这是一种二进制排序规则。二进制排序规则的特点是区分大小写,即字符的比较是逐个字节进行的,两个字母 ‘a’ 和 ‘A’ 会被视为不同的字符,因为它们在字节上有不同的值。

mysql> SET character_set_connection=gbk;
Query OK, 0 rows affected (0.00 sec)mysql> SET collation_connection=gbk_bin;
Query OK, 0 rows affected (0.00 sec)mysql> SELECT 'a' = 'A';
+-----------+
| 'a' = 'A' |
+-----------+
|         0 |
+-----------+
1 row in set (0.00 sec)
mysql> SHOW databases;
+--------------------+
| Database           |
+--------------------+
| charset_demo_db    |
| demo               |
| information_schema |
| mysql              |
| performance_schema |
| sys                |
+--------------------+
6 rows in set (0.00 sec)mysql> use charset_demo_db;
Database changed
mysql> SHOW TABLES;
+---------------------------+
| Tables_in_charset_demo_db |
+---------------------------+
| t                         |
+---------------------------+
1 row in set (0.00 sec)mysql> CREATE TABLE tt(-> c VARCHAR(100)-> ) ENGINE=INNODB CHARSET=utf8;
Query OK, 0 rows affected, 1 warning (0.30 sec)
mysql> INSERT INTO tt (c) VALUES('我');
Query OK, 1 row affected (0.01 sec)mysql> SELECT* FROM tt;
+------+
| c    |
+------+
||
+------+
1 row in set (0.00 sec)

小贴士:如果某个列使用的字符集和character_set_connection代表的字符集不一致的话,还需要进行一次字符集转换。

在这里插入图片描述
在这里插入图片描述

比较规则的应用

比较规则的作用通常表示比较字符串大小的表达式以及对某个字符串结束排序中,所以有时也称为排序规则。比方说表t的列col使用的字符集是gbk,使用的比较规则是gbk_chinese_ci

mysql> SHOW CREATE TABLE t;
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                         |
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
| t     | CREATE TABLE `t` (`col` varchar(10) CHARACTER SET gbk COLLATE gbk_chinese_ci DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=utf8mb3 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)mysql> INSERT INTO t(col) VALUES('a'), ('b'), ('A'), ('B');
Query OK, 4 rows affected (0.00 sec)
Records: 4  Duplicates: 0  Warnings: 0mysql> SELECT * FROM t ORDER BY col;
+--------+
| col    |
+--------+
| a      |
| A      |
| b      |
| B      |
| 哈哈   |
+--------+
5 rows in set (0.00 sec)

可以看到在默认的比较规则gbk_chinese_ci中是不区分大小写的,我们现在把列col的比较规则修改为gbk_bin,由于gbk_bin是直接比较字符的编码,所以是区分大小写的:

mysql> ALTER TABLE t MODIFY col VARCHAR(10) COLLATE gbk_bin;
Query OK, 5 rows affected (0.07 sec)
Records: 5  Duplicates: 0  Warnings: 0mysql> SELECT * FROM t ORDER BY col;
+--------+
| col    |
+--------+
| A      |
| B      |
| a      |
| b      |
| 哈哈   |
+--------+
5 rows in set (0.00 sec)

总结

从发送请求到接收结果中发生的字符集转换

在这里插入图片描述

在这个过程中各个系统指标的含义如下:

在这里插入图片描述

之后我会持续更新,如果喜欢我的文章,请记得一键三连哦,点赞关注收藏,你的每一个赞每一份关注每一次收藏都将是我前进路上的无限动力 !!!↖(▔▽▔)↗感谢支持!

http://www.dtcms.com/a/360402.html

相关文章:

  • 部署在windows的docker中的dify知识库存储位置
  • 常见线程池的创建方式及应用场景
  • Cookie、Session 和 JWT
  • 【K8s-Day 22】深入解析 Kubernetes Deployment:现代应用部署的基石与滚动更新的艺术
  • 服装管理软件与工厂计件系统精选
  • 【OpenGL】LearnOpenGL学习笔记18 - Uniform缓冲对象UBO
  • [每周一更]-(第158期):构建高性能数据库:MySQL 与 PostgreSQL 系统化问题管理与优化指南
  • XPlayer播放器APP:安卓平台上的全能视频播放器
  • 网络代理协议深度对比
  • Linux/UNIX系统编程手册笔记:系统和进程信息、文件I/O缓冲、系统编程概念以及文件属性
  • Multi-Head RAG: Solving Multi-Aspect Problems with LLMs
  • ST-2110概述
  • MySQL专题Day(1)————事务
  • postman 用于接口测试,举例
  • Linux shell 脚本基础 003
  • centos7安装jdk17
  • c++程序员日常超实用工具(长期记录更新)
  • PS自由变换
  • 【人工智能99问】LLaMA中的RoPE是什么?(35/99)
  • 学习Python中Selenium模块的基本用法(12:操作Cookie)
  • 【系统分析师】高分论文:论大数据架构的应用
  • 写一个 RTX 5080 上的 cuda gemm fp16
  • 使用yt-dlp下载网页视频
  • synchronized的锁对象 和 wait,notify的调用者之间的关系
  • Wi-Fi技术——初识
  • Flink NettyBufferPool
  • Docker中使用Compose配置现有网络
  • C语言————深入理解指针1(通俗易懂)
  • Linux 网络编程:深入理解套接字与通信机制
  • 【MySQL自学】SQL语法全解(上篇)