
Running DataX and datax-web locally on Windows 10

Setting up DataX

Prerequisites

  • JDK (1.8 or later; 1.8 recommended)
  • Python (2 or 3 both work)
  • Apache Maven 3.x (only needed to compile DataX from source)
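
Before going further it can help to confirm that these tools are actually on PATH. The following is a minimal Python 3 sketch, not part of DataX; the executable names java, python and mvn are assumptions, so adjust them if your installation uses different names (for example python3).

```python
import shutil

# Executable names are assumptions; change them to match your machine.
REQUIRED_TOOLS = ("java", "python", "mvn")


def find_tools(names=REQUIRED_TOOLS):
    """Map each tool name to its full path, or None when it is not on PATH."""
    return {name: shutil.which(name) for name in names}


for tool, path in find_tools().items():
    print(f"{tool}: {path or 'NOT FOUND'}")
```

Maven showing as NOT FOUND is fine when you use the prebuilt package, since it is only needed for compiling.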

Download the prebuilt DataX package

https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz

Go to the download directory and open PowerShell there.

Run the extract command:

tar -xzf datax.tar.gz

This produces a datax directory.
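
If the tar command is not available in your shell, the same extraction can be sketched with Python's standard tarfile module. The helper name extract_archive is my own, and the sketch assumes the archive comes from a source you trust.

```python
import tarfile
from pathlib import Path


def extract_archive(archive, dest="."):
    """Extract a .tar.gz archive into dest and return its top-level names."""
    dest = Path(dest)
    with tarfile.open(archive, "r:gz") as tar:
        # Collect the first path component of every member, e.g. "datax".
        names = sorted({member.name.split("/")[0] for member in tar.getmembers()})
        tar.extractall(dest)
    return names
```

Calling extract_archive("datax.tar.gz") from the download directory should leave the same datax directory that the tar command creates.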

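Configure the Windows command line for UTF-8

Without this setting, Chinese text in the output of the sync command is garbled.

Two simple steps make the garbled log readable:

  1. Change the encoding of the current window temporarily
    Open cmd and first run
chcp 65001

Then run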

python datax.py C:\Users\longz\Downloads\1.json

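The log is then output as UTF-8 and Chinese text displays correctly.

  2. Make it permanent (optional)
    On Windows 10/11 you can switch the system-wide console encoding to UTF-8:
    • Settings → Time & Language → Region → Related settings → Administrative language settings → Change system locale → check "Beta: Use Unicode UTF-8 for worldwide language support" → restart.
      After that, every cmd/PowerShell window defaults to UTF-8, and boxes or garbled characters no longer appear.

Alternatively, open the same dialog directly from Control Panel, tick that checkbox, and restart.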
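
To see why the text gets garbled in the first place, here is a small Python sketch of the mismatch. It assumes the log is written as UTF-8 while the console decodes it with the GBK code page (936), which is the usual default on a Chinese-language Windows install; your default code page may differ.

```python
def show_as(data: bytes, encoding: str) -> str:
    """Decode raw bytes the way a console using the given encoding would."""
    return data.decode(encoding, errors="replace")


# One field name from the DataX job summary, encoded as UTF-8 bytes.
log_bytes = "任务启动时刻".encode("utf-8")

garbled = show_as(log_bytes, "gbk")     # what a GBK console displays
readable = show_as(log_bytes, "utf-8")  # what a UTF-8 console displays
```

Here readable matches the original text, while garbled is a run of unrelated characters. chcp 65001 fixes this by switching the console to UTF-8.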

Prepare test data

Create two databases, test1 and test2, each with a table of the same name, user:

CREATE TABLE IF NOT EXISTS `user` (
  `id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  `username` VARCHAR(50) NOT NULL,
  `password` VARCHAR(100) NOT NULL,
  `email` VARCHAR(100) NOT NULL,
  `created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

Insert test data into the test1 database.

I use a stored procedure here; calling it generates the test data.

DELIMITER $$

CREATE PROCEDURE GenerateUserTestData(IN num_records INT)
BEGIN
  DECLARE i INT DEFAULT 1;
  DECLARE v_username VARCHAR(50);
  DECLARE v_password VARCHAR(100);
  DECLARE v_email VARCHAR(100);

  WHILE i <= num_records DO
    -- Generate a random username (e.g. User_12345)
    SET v_username = CONCAT('User_', FLOOR(10000 + RAND() * 90000));
    -- Generate a random password (simplified example; encrypt it in real use)
    SET v_password = CONCAT('Pass_', FLOOR(100000 + RAND() * 900000));
    -- Generate a random email (e.g. user12345@example.com)
    SET v_email = CONCAT('user', FLOOR(10000 + RAND() * 90000), '@example.com');

    INSERT INTO `user` (username, password, email)
    VALUES (v_username, v_password, v_email);

    -- Commit every 1000 rows to keep the transaction from growing too large
    IF i % 1000 = 0 THEN
      COMMIT;
    END IF;

    SET i = i + 1;
  END WHILE;
END$$

DELIMITER ;

-- Example call: generate 100 rows (the sync run in this article reads 100 records)
CALL GenerateUserTestData(100);
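
If you would rather build the rows outside MySQL, the same value ranges can be sketched in Python. The function names are hypothetical and this only produces tuples. Inserting them (for example with a DB-API cursor's executemany) is left to whichever driver you use.

```python
import random


def make_user_row(rng):
    """Build one (username, password, email) tuple like the stored procedure does."""
    username = f"User_{rng.randint(10000, 99999)}"
    password = f"Pass_{rng.randint(100000, 999999)}"  # plain text, test data only
    email = f"user{rng.randint(10000, 99999)}@example.com"
    return username, password, email


def make_user_rows(count, seed=None):
    """Return count rows; pass a seed to get the same rows on every run."""
    rng = random.Random(seed)
    return [make_user_row(rng) for _ in range(count)]


rows = make_user_rows(100, seed=1)
print(len(rows))
```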

Once the data is generated, the goal is to use DataX to sync the data from the test1 database to the test2 database.

Create the DataX sync job file

{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/test1"],
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "splitPk": ""
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://localhost:3306/test2",
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
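
DataX logs a warning when the column list is "*", because the job can break if the table changes. As a hedged variation, the sketch below builds the same job in Python with the columns spelled out. The column names come from the CREATE TABLE statement above, the function name build_job is my own, and the credentials are the same local test values.

```python
import json

# Column names taken from the user table definition.
COLUMNS = ["id", "username", "password", "email", "created_at"]


def build_job(columns=COLUMNS, channel=1):
    """Return the test1 -> test2 sync job as a dict with explicit columns."""
    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": "root",
            "password": "root",
            # The reader takes a list of JDBC URLs.
            "connection": [{"jdbcUrl": ["jdbc:mysql://localhost:3306/test1"],
                            "table": ["user"]}],
            "column": list(columns),
            "splitPk": "",
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "username": "root",
            "password": "root",
            # The writer takes a single JDBC URL string.
            "connection": [{"jdbcUrl": "jdbc:mysql://localhost:3306/test2",
                            "table": ["user"]}],
            "column": list(columns),
            "writeMode": "insert",
        },
    }
    return {"job": {"content": [{"reader": reader, "writer": writer}],
                    "setting": {"speed": {"channel": channel}}}}


print(json.dumps(build_job(), indent=2))
```

Redirect the printed JSON into a file such as test-user.json to use it as the job file.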

Run the sync command from the datax\bin directory:

python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json
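
When the job needs to be started from another script, the command above can be wrapped in a few lines of Python. This is only a sketch: the helper names are hypothetical, and treating a nonzero exit code as failure is my assumption about how datax.py reports errors.

```python
import subprocess
import sys
from pathlib import Path


def build_datax_command(datax_home, job_file):
    """Build the argument list equivalent to 'python datax.py <job file>'."""
    launcher = Path(datax_home) / "bin" / "datax.py"
    return [sys.executable, str(launcher), str(job_file)]


def run_job(datax_home, job_file):
    """Run the job with the current interpreter and return the exit code."""
    completed = subprocess.run(build_datax_command(datax_home, job_file))
    return completed.returncode
```

For example, run_job(r"D:\Information_Technology\worksapce_tool\datax-new\datax", r"D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json") would launch the same job as the command line above.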

Output of the completed run:

D:\Information_Technology\worksapce_tool\datax-new\datax\bin>python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json

DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2025-08-08 14:00:36.524 [main] INFO  MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2025-08-08 14:00:36.526 [main] INFO  MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2025-08-08 14:00:36.536 [main] INFO  VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2025-08-08 14:00:36.541 [main] INFO  Engine - the machine info  =>

    osInfo: Windows 10 amd64 10.0
    jvmInfo:        Oracle Corporation 1.8 25.341-b10
    cpu num:        16

    totalPhysicalMemory:    -0.00G
    freePhysicalMemory:     -0.00G
    maxFileDescriptorCount: -1
    currentOpenFileDescriptorCount: -1

    GC Names        [PS MarkSweep, PS Scavenge]

    MEMORY_NAME                    | allocation_size                | init_size
    PS Eden Space                  | 256.00MB                       | 256.00MB
    Code Cache                     | 240.00MB                       | 2.44MB
    Compressed Class Space         | 1,024.00MB                     | 0.00MB
    PS Survivor Space              | 42.50MB                        | 42.50MB
    PS Old Gen                     | 683.00MB                       | 683.00MB
    Metaspace                      | -0.00MB                        | 0.00MB

2025-08-08 14:00:36.552 [main] INFO  Engine -
{"content":[{"reader":{"name":"mysqlreader","parameter":{"username":"root","password":"****","connection":[{"jdbcUrl":["jdbc:mysql://localhost:3306/test1"],"table":["user"]}],"column":["*"],"splitPk":""}},"writer":{"name":"mysqlwriter","parameter":{"username":"root","password":"****","connection":[{"jdbcUrl":"jdbc:mysql://localhost:3306/test2","table":["user"]}],"column":["*"],"writeMode":"insert"}}}],"setting":{"speed":{"channel":1}}}

2025-08-08 14:00:36.565 [main] INFO  PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2025-08-08 14:00:36.566 [main] INFO  JobContainer - DataX jobContainer starts job.
2025-08-08 14:00:36.566 [main] INFO  JobContainer - Set jobId = 0
Fri Aug 08 14:00:36 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:42.909 [job-0] INFO  OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2025-08-08 14:00:42.910 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.091 [job-0] INFO  OriginalConfPretreatmentUtil - table:[user] all columns:[
id,username,password,email,created_at
].
2025-08-08 14:00:43.091 [job-0] WARN  OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2025-08-08 14:00:43.092 [job-0] INFO  OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,username,password,email,created_at) VALUES(?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://localhost:3306/test2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2025-08-08 14:00:43.093 [job-0] INFO  JobContainer - jobContainer starts to do prepare ...
2025-08-08 14:00:43.093 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2025-08-08 14:00:43.093 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2025-08-08 14:00:43.094 [job-0] INFO  JobContainer - jobContainer starts to do split ...
2025-08-08 14:00:43.094 [job-0] INFO  JobContainer - Job set Channel-Number to 1 channels.
2025-08-08 14:00:43.097 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2025-08-08 14:00:43.097 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2025-08-08 14:00:43.115 [job-0] INFO  JobContainer - jobContainer starts to do schedule ...
2025-08-08 14:00:43.117 [job-0] INFO  JobContainer - Scheduler starts [1] taskGroups.
2025-08-08 14:00:43.119 [job-0] INFO  JobContainer - Running by standalone Mode.
2025-08-08 14:00:43.124 [taskGroup-0] INFO  TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2025-08-08 14:00:43.126 [taskGroup-0] INFO  Channel - Channel set byte_speed_limit to -1, No bps activated.
2025-08-08 14:00:43.127 [taskGroup-0] INFO  Channel - Channel set record_speed_limit to -1, No tps activated.
2025-08-08 14:00:43.136 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2025-08-08 14:00:43.140 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Begin to read record by Sql: [select * from user
] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.179 [0-0-0-reader] INFO  CommonRdbmsReader$Task - Finished read record by Sql: [select * from user
] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2025-08-08 14:00:43.456 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[321]ms
2025-08-08 14:00:43.456 [taskGroup-0] INFO  TaskGroupContainer - taskGroup[0] completed it's tasks.
2025-08-08 14:00:53.142 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.142 [job-0] INFO  AbstractScheduler - Scheduler accomplished all tasks.
2025-08-08 14:00:53.142 [job-0] INFO  JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2025-08-08 14:00:53.143 [job-0] INFO  JobContainer - DataX Reader.Job [mysqlreader] do post work.
2025-08-08 14:00:53.143 [job-0] INFO  JobContainer - DataX jobId [0] completed successfully.
2025-08-08 14:00:53.144 [job-0] INFO  HookInvoker - No hook invoked, because base dir not exists or is a file: D:\Information_Technology\worksapce_tool\datax-new\datax\hook
2025-08-08 14:00:53.144 [job-0] INFO  JobContainer -
    [total cpu info] =>
        averageCpu                     | maxDeltaCpu                    | minDeltaCpu
        -1.00%                         | -1.00%                         | -1.00%

    [total gc info] =>
        NAME                 | totalGCCount       | maxDeltaGCCount    | minDeltaGCCount    | totalGCTime        | maxDeltaGCTime     | minDeltaGCTime
        PS MarkSweep         | 1                  | 1                  | 1                  | 0.016s             | 0.016s             | 0.016s
        PS Scavenge          | 1                  | 1                  | 1                  | 0.008s             | 0.008s             | 0.008s

2025-08-08 14:00:53.144 [job-0] INFO  JobContainer - PerfTrace not enable!
2025-08-08 14:00:53.145 [job-0] INFO  StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s |  All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.146 [job-0] INFO  JobContainer -
任务启动时刻                    : 2025-08-08 14:00:36
任务结束时刻                    : 2025-08-08 14:00:53
任务总计耗时                    :                 16s
任务平均流量                    :              519B/s
记录写入速度                    :             10rec/s
读出记录总数                    :                 100
读写失败总数                    :                   0
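
To check a run from a script instead of reading the log by eye, the "Total ... | Error ..." line can be parsed with a regular expression. The sketch below assumes the line keeps the exact shape shown in the log above; the function name parse_summary is my own.

```python
import re

# Matches the StandAloneJobContainerCommunicator progress line.
SUMMARY = re.compile(
    r"Total (\d+) records, (\d+) bytes \|.*?Error (\d+) records, (\d+) bytes"
)


def parse_summary(line):
    """Return record and byte counts from a progress line, or None if absent."""
    match = SUMMARY.search(line)
    if match is None:
        return None
    total, total_bytes, errors, error_bytes = map(int, match.groups())
    return {"records": total, "bytes": total_bytes,
            "error_records": errors, "error_bytes": error_bytes}


line = ("Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | "
        "Error 0 records, 0 bytes |  All Task WaitWriterTime 0.000s")
print(parse_summary(line))
```

For the run above this reports 100 records, 5192 bytes and 0 error records, which matches the summary table.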

Setting up datax-web

Download the source code

git clone https://github.com/WeiYe-Jing/datax-web.git

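Create the database

Run the datax_web.sql file under bin/db (note that the update statements in older versions specify a database name).

Modify the project configuration

1. Edit resources/application.yml under datax_admin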

  # Data source
  datasource:
    username: root
    password: root
    url: jdbc:mysql://localhost:3306/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
    driver-class-name: com.mysql.jdbc.Driver

Adjust the data source settings; only MySQL is supported at the moment.

# Configure mybatis-plus SQL logging
logging:
  level:
    com.wugui.datax.admin.mapper: error
  path: ./data/applogs/admin

Change the log path (path).

  # datax-web email
  mail:
    host: smtp.qq.com
    port: 25
    username: xxx@qq.com
    password: xxx
    properties:
      mail:
        smtp:
          auth: true
          starttls:
            enable: true
            required: true
        socketFactory:
          class: javax.net.ssl.SSLSocketFactory

Adjust the mail settings (leave them unchanged if you do not need email).

2. Edit resources/application.yml under datax_executor

# log config
logging:
  config: classpath:logback.xml
  path: ./data/applogs/executor/jobhandler

Change the log path (path).

datax:
  job:
    admin:
      ### datax-web admin address
      addresses: http://127.0.0.1:8080
    executor:
      appname: datax-executor
      ip:
      port: 9999
      ### job log path
      logpath: ./data/applogs/executor/jobhandler
      ### job log retention days
      logretentiondays: 30
  executor:
    jsonpath: D:\\Information_Technology\\worksapce_tool\\tmp\\executor\\json\\
  pypath: D:\\Information_Technology\\worksapce_tool\\datax-new\\datax\\bin\\datax.py

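Adjust the datax.job settings

  • admin.addresses: the address where datax_admin is deployed. If the scheduling center is deployed as a cluster with several addresses, separate them with commas. The executor uses this address for "executor heartbeat registration" and "task result callback".
  • executor.appname: the executor AppName, the unique identifier of each executor cluster and the basis for grouping executor heartbeat registrations.
  • executor.ip: empty by default, which means the IP is detected automatically. With multiple network cards you can set a specific IP by hand. This IP is not bound to the host and is used only for communication; the address is used for "executor registration" and for "scheduling center requests that trigger tasks".
  • executor.port: the executor server port, 9999 by default. When several executors run on one machine, give each a different port.
  • executor.logpath: the disk path where executor run logs are stored; read and write permission on this path is required.
  • executor.logretentiondays: the number of days executor log files are kept before expired logs are cleaned up automatically. It takes effect only when the value is 3 or more; otherwise (for example -1) automatic cleanup is disabled.
  • executor.jsonpath: the path where temporary DataX JSON files are saved.
  • pypath: the location of the DataX launch script, for example xxx/datax/bin/datax.py (the launch script that already exists from setting up DataX above).
    If the DataX environment variable (DATAX_HOME) is configured, logpath, jsonpath and pypath can be left unset; log files and temporary JSON files are then stored under the environment variable's path.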
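
A wrong jsonpath or pypath only shows up later when a task runs, so it can be worth checking both before starting the executor. The following is a minimal sketch with a hypothetical helper name. Note that it creates the jsonpath directory if it is missing, which is my own choice rather than something datax-web requires.

```python
from pathlib import Path


def check_executor_paths(jsonpath, pypath):
    """Return a list of problems; an empty list means both paths look usable."""
    problems = []

    # The temporary-JSON directory is created on demand.
    json_dir = Path(jsonpath)
    json_dir.mkdir(parents=True, exist_ok=True)
    if not json_dir.is_dir():
        problems.append(f"jsonpath is not a directory: {json_dir}")

    # pypath must point at the existing DataX launch script.
    launcher = Path(pypath)
    if not launcher.is_file():
        problems.append(f"pypath does not exist: {launcher}")

    return problems
```

Pass it the two values from application.yml, written with single backslashes in a raw Python string, and fix anything it reports before starting DataXExecutorApplication.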

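Start the project

Local IDEA development environment
  • 1. Run DataXAdminApplication under datax_admin
  • 2. Run DataXExecutorApplication under datax_executor

After admin starts successfully, the log prints three addresses: two API documentation URLs and one front-end page URL.

Startup succeeded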

After a successful start, open the page (default administrator username: admin, password: 123456)
http://localhost:8080/index.html#/dashboard
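
If the page does not load, a quick check is whether anything is listening on the two ports. The sketch below assumes the ports from the configuration above, 8080 for admin and 9999 for the executor; the function name is_listening is my own.

```python
import socket


def is_listening(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


# Ports assumed from application.yml: admin on 8080, executor on 9999.
for name, port in (("datax_admin", 8080), ("datax_executor", 9999)):
    state = "up" if is_listening("127.0.0.1", port) else "not reachable"
    print(f"{name} (port {port}): {state}")
```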

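Hands-on

Project management: add a project

Configure a data source

Create an executor

Build a task to generate the sync JSON

Click Build to generate the DataX sync script automatically, copy the JSON, and go back to the menu.

Create the task

Create the task, fill in the form, paste the JSON, and change the value of byte to 0.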
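
Editing the pasted JSON by hand is easy to get wrong, so the byte change can also be sketched in Python. I am assuming the generated script keeps byte under job.setting.speed, next to channel; check your own generated JSON before relying on this. The helper name and the sample values are hypothetical.

```python
import json


def set_speed_byte(job_json, value=0):
    """Return the job JSON text with job.setting.speed.byte set to value."""
    job = json.loads(job_json)
    speed = job.setdefault("job", {}).setdefault("setting", {}).setdefault("speed", {})
    speed["byte"] = value
    return json.dumps(job, indent=2)


# Hypothetical fragment shaped like a generated script; the numbers are placeholders.
generated = '{"job": {"setting": {"speed": {"channel": 3, "byte": 1048576}}, "content": []}}'
print(set_speed_byte(generated))
```

The other keys, including channel and content, are left untouched.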

Run succeeded
