
Notes on importing multiple tables in one pass with DataX

news 2025/8/14 8:56:21

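I have been using DataX for table data migration for a long time, and it has worked very well. Today, however, the development team handed over a large number of tables to migrate. Writing a separate DataX JSON job file for each table by hand would be very inefficient, so I wrote a script that generates the job files from a template and imports the tables in batch.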

1. Write the JSON file

# Write the JSON template
[worker@cs-nll sync_data]$ vim template.json 
                                                                                                             
{
  "job": {
    "setting": {
      "speed": {
        "channel": 5
      }
    },
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "{{src_username}}",
            "password": "{{src_password}}",
            "column": ["*"],
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://{{src_host}}:{{src_port}}/{{src_db}}"],
                "table": ["{{src_table}}"]
              }
            ],
            "where": "{{where}}"
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "batchSize": "2048",
            "batchByteSize": "33554432",
            "username": "{{dest_username}}",
            "password": "{{dest_password}}",
            "column": ["*"],
            "writeMode": "insert",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://{{dest_host}}:{{dest_port}}/{{dest_db}}",
                "table": ["{{dest_table}}"]
              }
            ]
          }
        }
      }
    ]
  }
}
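
Every `{{name}}` token in the template is a placeholder that has to be substituted for each table. A token that is left behind is still valid JSON, so DataX would receive it literally as a user name, table name, or condition. The following is a minimal sketch of a guard against that; the helper name `check_job_file` is hypothetical and not part of DataX, and it only assumes that `{{` never appears in real values.

```shell
# Hypothetical helper: fail if a generated job file still contains a "{{" token.
check_job_file() {
  job_file=$1
  # grep -F treats "{{" as a fixed string; -n prints the offending line numbers
  if grep -n -F '{{' "$job_file" >&2; then
    echo "unresolved placeholder in $job_file" >&2
    return 1
  fi
  return 0
}
```

It could be called on each generated file, for example `check_job_file "job_test2025.json" || exit 1`, before the job is handed to DataX.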

2. Write the shell script

[worker@cs-nll sync_data]$ cat sync_data.sh 
#!/bin/bash
                                                                                                             
set -e
# Source database
#src_jdbcUrl="jdbc:mysql://172.18.45.28:53306/cqshzl"
src_host="172.18.45.28"
src_port="53306"
src_db="cqshzl"
src_username="root"
src_password="LKfAtBW\&Oq0Y^H%M"
# Destination database
#dest_jdbcUrl="jdbc:mysql://172.18.45.28:53306/cqshzl"
dest_host="172.18.45.28"
dest_port="53306"
dest_db="cqshzl"
dest_username="root"
dest_password="LKfAtBW\&Oq0Y^H%M"
#TABLES=("user" "order" "product" "log")
# Filter condition
year="2025"
where="YEAR(follow_end_time) ='$year'"
# Tables to migrate (format: source_table:destination_table)
tables=(
        "test:test$year"
        "personnel_manage_follow:personnel_manage_follow_$year"
      )
                                                                                                             
# Loop over the table pairs and generate one job file per pair
for table_pair in "${tables[@]}"; do
  # Split the pair into source and destination table names
  IFS=':' read -ra TABLE <<< "$table_pair"
  src_table="${TABLE[0]}"
  dest_table="${TABLE[1]}"
  # Substitute the placeholders in the template
  sed  -e "s/{{src_table}}/$src_table/g"  \
       -e "s/{{src_username}}/$src_username/g" \
       -e "s/{{src_password}}/$src_password/g" \
       -e "s/{{src_host}}/$src_host/g" \
       -e "s/{{src_port}}/$src_port/g" \
       -e "s/{{src_db}}/$src_db/g" \
       -e "s/{{dest_table}}/$dest_table/g" \
       -e "s/{{dest_username}}/$dest_username/g" \
       -e "s/{{dest_password}}/$dest_password/g" \
       -e "s/{{dest_host}}/$dest_host/g" \
       -e "s/{{dest_port}}/$dest_port/g" \
       -e "s/{{dest_db}}/$dest_db/g" \
       -e "s/{{where}}/$where/g"  \
       template.json > "job_${dest_table}.json"
                                                                                                             
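  # Run the DataX job in the background and remember its PID
  echo "Migrating table: $src_table => $dest_table ..."
  python /data/software/datax/bin/datax.py "job_${dest_table}.json" > "${dest_table}.log" 2>&1 &
  pids+=("$!")
  dests+=("$dest_table")
done

echo "All table sync jobs have been started; waiting for them to finish ..."

# $? right after "&" only says the job was launched, so wait on each PID
# to get the real exit status of that DataX run
failed=0
for i in "${!pids[@]}"; do
  if wait "${pids[$i]}"; then
    echo "Sync succeeded, see log ${dests[$i]}.log"
  else
    echo "Sync failed, see log ${dests[$i]}.log"
    failed=1
  fi
done
exit "$failed"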
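
The passwords in the script are written with `\&` because an unescaped `&` in a sed replacement stands for the matched text, and a `/` would end the `s/…/…/` expression early. As a minimal sketch, assuming a hypothetical helper named `sed_escape`, the escaping can be done automatically so that values are stored unescaped. It covers only `&`, `/` and `\`; it does not handle newlines, and it does not make a value safe inside a JSON string (a `"` in a password would still break the job file).

```shell
# Hypothetical helper: escape a value for use as the replacement part of s/.../.../
# The bracket expression [&/\] matches "&", "/" and "\"; "\\&" emits a backslash
# followed by the matched character. "," is used as the delimiter so that "/"
# can appear inside the pattern.
sed_escape() {
  printf '%s' "$1" | sed 's,[&/\],\\&,g'
}
```

With this, a line such as `-e "s/{{src_password}}/$(sed_escape "$src_password")/g"` would work with the raw password, without hand-written backslashes.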

# Run the script
[worker@cs-nll sync_data]$ chmod +x sync_data.sh 
[worker@cs-nll sync_data]$ ./sync_data.sh

With that, batch table synchronization with DataX is solved, and the number of tables is easy to control by editing the list.
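
If the table list grows long, it may be more convenient to keep it outside the script. The following is a sketch under assumptions, not part of the original script: the helper name `list_table_pairs` and the file layout (one `source:destination` entry per line, `#` for comments, destination defaulting to the source name) are both hypothetical.

```shell
# Hypothetical helper: print "source destination" for every entry in a list file.
list_table_pairs() {
  # "|| [ -n "$line" ]" also picks up a last line without a trailing newline
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      '' | '#'*) continue ;;   # skip blank lines and comments
    esac
    # ${line%%:*} is the part before the first ":", ${line#*:} the part after it;
    # without a ":" both expand to the whole line, so the names are identical
    printf '%s %s\n' "${line%%:*}" "${line#*:}"
  done < "$1"
}
```

The output can then drive the same loop, for example `list_table_pairs tables.txt | while read -r src_table dest_table; do …; done`, where `tables.txt` is an assumed file name.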
