Running DataX and datax-web locally on Windows 10
Setting up DataX
Prerequisites
- JDK (1.8 or later; 1.8 recommended)
- Python (2 or 3)
- Apache Maven 3.x (to compile DataX)
Download the prebuilt DataX package
https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202309/datax.tar.gz
Go to the download directory and open PowerShell there.
Run the extract command:
tar -xzf datax.tar.gz
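This produces the datax directory.
Configure the Windows command line for UTF-8
Without this setting, Chinese text in the sync command's output is garbled.
Either of the following two options makes the log readable:
- Change the code page of the current window only (temporary)
Open cmd and first run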
chcp 65001
Then run
python datax.py C:\Users\longz\Downloads\1.json
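The console then uses UTF-8 and the Chinese log text displays correctly.
- Make the change permanent (optional)
On Windows 10/11, the system-wide console encoding can be switched to UTF-8: Settings → Time & Language → Region → Related settings → Administrative language settings → Change system locale → tick "Beta: Use Unicode UTF-8 for worldwide language support" → restart.
After that, every cmd/PowerShell window defaults to UTF-8, and the boxes and garbled characters no longer appear.
Alternatively, open the same dialog directly from Control Panel, tick that checkbox, and restart.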
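To see why the log text turns into mojibake in the first place, the short Python sketch below encodes a string as UTF-8 and then decodes the bytes with a different codec. It assumes the console was using GBK (code page 936, the usual default on Simplified Chinese Windows) while the log bytes are UTF-8; neither detail is stated above, and `garble` is a name made up for this illustration.

```python
def garble(text: str, wrong_codec: str = "gbk") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec."""
    raw = text.encode("utf-8")
    # errors="replace" keeps the demo from raising on byte sequences
    # that are invalid in the wrong codec.
    return raw.decode(wrong_codec, errors="replace")


sample = "任务启动时刻"

# Decoding UTF-8 bytes as GBK does not give the original text back.
print(garble(sample) == sample)

# Decoding with the matching codec does.
print(sample.encode("utf-8").decode("utf-8") == sample)
```

Pure ASCII text survives the mismatch because both encodings agree on those bytes, which is why only the Chinese parts of the log look broken.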
Prepare test data
Create two databases, test1 and test2, each containing a table with the same name, user:
CREATE TABLE IF NOT EXISTS `user` (
    `id` INT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
    `username` VARCHAR(50) NOT NULL,
    `password` VARCHAR(100) NOT NULL,
    `email` VARCHAR(100) NOT NULL,
    `created_at` TIMESTAMP DEFAULT CURRENT_TIMESTAMP
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;
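Add test data to the test1 database.
I use a stored procedure here; calling it, for example with CALL GenerateUserTestData(100);, generates the test data.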
DELIMITER $$
CREATE PROCEDURE GenerateUserTestData(IN num_records INT)
BEGIN
    DECLARE i INT DEFAULT 1;
    DECLARE v_username VARCHAR(50);
    DECLARE v_password VARCHAR(100);
    DECLARE v_email VARCHAR(100);
    WHILE i <= num_records DO
        -- Generate a random username (e.g. User_12345)
        SET v_username = CONCAT('User_', FLOOR(10000 + RAND() * 90000));
        -- Generate a random password (simplified example; encrypt it in real use)
        SET v_password = CONCAT('Pass_', FLOOR(100000 + RAND() * 900000));
        -- Generate a random email (e.g. user12345@example.com)
        SET v_email = CONCAT('user', FLOOR(10000 + RAND() * 90000), '@example.com');
        INSERT INTO `user` (username, password, email)
        VALUES (v_username, v_password, v_email);
        -- Commit every 1000 rows to keep the transaction from growing too large
        IF i % 1000 = 0 THEN
            COMMIT;
        END IF;
        SET i = i + 1;
    END WHILE;
END$$
DELIMITER ;
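If creating a stored procedure is not convenient, rows of the same shape can be produced outside the database. The following is only a sketch of that alternative, not part of the original setup: `make_user_rows` and `to_insert_sql` are hypothetical helper names, and the output is a plain multi-row INSERT that you would run against test1 yourself.

```python
import random


def make_user_rows(num_records, seed=None):
    """Return (username, password, email) tuples shaped like the procedure's rows."""
    rng = random.Random(seed)
    rows = []
    for _ in range(num_records):
        # Same value ranges as FLOOR(10000 + RAND() * 90000) and
        # FLOOR(100000 + RAND() * 900000) in the procedure.
        username = f"User_{rng.randint(10000, 99999)}"
        password = f"Pass_{rng.randint(100000, 999999)}"
        email = f"user{rng.randint(10000, 99999)}@example.com"
        rows.append((username, password, email))
    return rows


def to_insert_sql(rows):
    """Render a non-empty list of rows as one multi-row INSERT for `user`."""
    values = ",\n".join(f"('{u}', '{p}', '{e}')" for u, p, e in rows)
    return f"INSERT INTO `user` (username, password, email) VALUES\n{values};"


if __name__ == "__main__":
    print(to_insert_sql(make_user_rows(3, seed=1)))
```

The generated values contain only letters, digits, underscores, and the fixed email suffix, so no SQL escaping is needed here; arbitrary input would require parameterized statements instead.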
Once the data is generated, the goal is to use DataX to sync the data from the test1 database to the test2 database.
Create the DataX sync job file
{
  "job": {
    "content": [
      {
        "reader": {
          "name": "mysqlreader",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": ["jdbc:mysql://localhost:3306/test1"],
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "splitPk": ""
          }
        },
        "writer": {
          "name": "mysqlwriter",
          "parameter": {
            "username": "root",
            "password": "root",
            "connection": [
              {
                "jdbcUrl": "jdbc:mysql://localhost:3306/test2",
                "table": ["user"]
              }
            ],
            "column": ["*"],
            "writeMode": "insert"
          }
        }
      }
    ],
    "setting": {
      "speed": {
        "channel": 1
      }
    }
  }
}
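DataX warns in its log when the columns are given as `*`, because later changes to the table can then break the job. The sketch below builds the same job structure with an explicit column list taken from the CREATE TABLE statement. `build_job` and its parameters are hypothetical names for this illustration; the keys and values mirror the job file above.

```python
import json

# Column names taken from the CREATE TABLE statement for `user`.
COLUMNS = ["id", "username", "password", "email", "created_at"]


def build_job(src_db, dst_db, table, columns, user="root", password="root",
              host="localhost", port=3306, channel=1):
    """Build a mysqlreader -> mysqlwriter job dict in the layout shown above."""
    def jdbc_url(db):
        return f"jdbc:mysql://{host}:{port}/{db}"

    reader = {
        "name": "mysqlreader",
        "parameter": {
            "username": user,
            "password": password,
            # As in the job above, the reader's jdbcUrl is a list.
            "connection": [{"jdbcUrl": [jdbc_url(src_db)], "table": [table]}],
            "column": list(columns),
            "splitPk": "",
        },
    }
    writer = {
        "name": "mysqlwriter",
        "parameter": {
            "username": user,
            "password": password,
            # As in the job above, the writer's jdbcUrl is a plain string.
            "connection": [{"jdbcUrl": jdbc_url(dst_db), "table": [table]}],
            "column": list(columns),
            "writeMode": "insert",
        },
    }
    return {
        "job": {
            "content": [{"reader": reader, "writer": writer}],
            "setting": {"speed": {"channel": channel}},
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_job("test1", "test2", "user", COLUMNS), indent=2))
```

Redirecting the printed JSON into a file gives a job that can be passed to datax.py in the same way as the hand-written one.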
Run the sync command
python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json
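When the same job has to be launched repeatedly, the command can be assembled and run from a small script. This is a minimal sketch under the assumption that DataX was extracted to a directory containing bin\datax.py, as in the path above; `build_datax_command` and `run_job` are hypothetical helper names, not part of DataX.

```python
import subprocess
from pathlib import PureWindowsPath


def build_datax_command(datax_home, job_file, python="python"):
    """Return the argument list for: python <datax_home>\\bin\\datax.py <job_file>."""
    launcher = PureWindowsPath(datax_home) / "bin" / "datax.py"
    return [python, str(launcher), job_file]


def run_job(datax_home, job_file):
    """Run the job and return the process exit code (needs a real DataX install)."""
    return subprocess.run(build_datax_command(datax_home, job_file)).returncode
```

`run_job` simply forwards the exit code of the launcher process, so a calling script can decide for itself how to treat a non-zero result.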
Output of the completed run
D:\Information_Technology\worksapce_tool\datax-new\datax\bin>python datax.py D:\Information_Technology\worksapce_tool\datax-new\datax\job\test-user.json
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.
2025-08-08 14:00:36.524 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2025-08-08 14:00:36.526 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2025-08-08 14:00:36.536 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2025-08-08 14:00:36.541 [main] INFO Engine - the machine info =>
osInfo: Windows 10 amd64 10.0
jvmInfo: Oracle Corporation 1.8 25.341-b10
cpu num: 16
totalPhysicalMemory: -0.00G
freePhysicalMemory: -0.00G
maxFileDescriptorCount: -1
currentOpenFileDescriptorCount: -1
GC Names [PS MarkSweep, PS Scavenge]
MEMORY_NAME | allocation_size | init_size
PS Eden Space | 256.00MB | 256.00MB
Code Cache | 240.00MB | 2.44MB
Compressed Class Space | 1,024.00MB | 0.00MB
PS Survivor Space | 42.50MB | 42.50MB
PS Old Gen | 683.00MB | 683.00MB
Metaspace | -0.00MB | 0.00MB
2025-08-08 14:00:36.552 [main] INFO Engine -
{"content":[{"reader":{"name":"mysqlreader","parameter":{"username":"root","password":"****","connection":[{"jdbcUrl":["jdbc:mysql://localhost:3306/test1"],"table":["user"]}],"column":["*"],"splitPk":""}},"writer":{"name":"mysqlwriter","parameter":{"username":"root","password":"****","connection":[{"jdbcUrl":"jdbc:mysql://localhost:3306/test2","table":["user"]}],"column":["*"],"writeMode":"insert"}}}],"setting":{"speed":{"channel":1}}
}
2025-08-08 14:00:36.565 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2025-08-08 14:00:36.566 [main] INFO JobContainer - DataX jobContainer starts job.
2025-08-08 14:00:36.566 [main] INFO JobContainer - Set jobId = 0
Fri Aug 08 14:00:36 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:42.909 [job-0] INFO OriginalConfPretreatmentUtil - Available jdbcUrl:jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true.
2025-08-08 14:00:42.910 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置存在一定的风险. 因为您未配置读取数据库表的列,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.091 [job-0] INFO OriginalConfPretreatmentUtil - table:[user] all columns:[
id,username,password,email,created_at
].
2025-08-08 14:00:43.091 [job-0] WARN OriginalConfPretreatmentUtil - 您的配置文件中的列配置信息存在风险. 因为您配置的写入数据库表的列为*,当您的表字段个数、类型有变动时,可能影响任务正确性甚至会运行出错。请检查您的配置并作出修改.
2025-08-08 14:00:43.092 [job-0] INFO OriginalConfPretreatmentUtil - Write data [
insert INTO %s (id,username,password,email,created_at) VALUES(?,?,?,?,?)
], which jdbcUrl like:[jdbc:mysql://localhost:3306/test2?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&rewriteBatchedStatements=true&tinyInt1isBit=false]
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do prepare work .
2025-08-08 14:00:43.093 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do prepare work .
2025-08-08 14:00:43.094 [job-0] INFO JobContainer - jobContainer starts to do split ...
2025-08-08 14:00:43.094 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2025-08-08 14:00:43.097 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] splits to [1] tasks.
2025-08-08 14:00:43.097 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] splits to [1] tasks.
2025-08-08 14:00:43.115 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2025-08-08 14:00:43.117 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2025-08-08 14:00:43.119 [job-0] INFO JobContainer - Running by standalone Mode.
2025-08-08 14:00:43.124 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2025-08-08 14:00:43.126 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2025-08-08 14:00:43.127 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2025-08-08 14:00:43.136 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2025-08-08 14:00:43.140 [0-0-0-reader] INFO CommonRdbmsReader$Task - Begin to read record by Sql: [select * from user
] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
Fri Aug 08 14:00:43 GMT+08:00 2025 WARN: Establishing SSL connection without server's identity verification is not recommended. According to MySQL 5.5.45+, 5.6.26+ and 5.7.6+ requirements SSL connection must be established by default if explicit option isn't set. For compliance with existing applications not using SSL the verifyServerCertificate property is set to 'false'. You need either to explicitly disable SSL by setting useSSL=false, or set useSSL=true and provide truststore for server certificate verification.
2025-08-08 14:00:43.179 [0-0-0-reader] INFO CommonRdbmsReader$Task - Finished read record by Sql: [select * from user
] jdbcUrl:[jdbc:mysql://localhost:3306/test1?yearIsDateType=false&zeroDateTimeBehavior=convertToNull&tinyInt1isBit=false&rewriteBatchedStatements=true].
2025-08-08 14:00:43.456 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[321]ms
2025-08-08 14:00:43.456 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2025-08-08 14:00:53.142 [job-0] INFO StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.142 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2025-08-08 14:00:53.142 [job-0] INFO JobContainer - DataX Writer.Job [mysqlwriter] do post work.
2025-08-08 14:00:53.143 [job-0] INFO JobContainer - DataX Reader.Job [mysqlreader] do post work.
2025-08-08 14:00:53.143 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2025-08-08 14:00:53.144 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: D:\Information_Technology\worksapce_tool\datax-new\datax\hook
2025-08-08 14:00:53.144 [job-0] INFO JobContainer -
[total cpu info] =>
averageCpu | maxDeltaCpu | minDeltaCpu
-1.00% | -1.00% | -1.00%
[total gc info] =>
NAME | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
PS MarkSweep | 1 | 1 | 1 | 0.016s | 0.016s | 0.016s
PS Scavenge | 1 | 1 | 1 | 0.008s | 0.008s | 0.008s
2025-08-08 14:00:53.144 [job-0] INFO JobContainer - PerfTrace not enable!
2025-08-08 14:00:53.145 [job-0] INFO StandAloneJobContainerCommunicator - Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | All Task WaitReaderTime 0.000s | Percentage 100.00%
2025-08-08 14:00:53.146 [job-0] INFO JobContainer -
任务启动时刻 : 2025-08-08 14:00:36
任务结束时刻 : 2025-08-08 14:00:53
任务总计耗时 : 16s
任务平均流量 : 519B/s
记录写入速度 : 10rec/s
读出记录总数 : 100
读写失败总数 : 0
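The StandAloneJobContainerCommunicator line in this log carries the figures that matter most after a run: total records, total bytes, and the error count. The sketch below pulls them out with a regular expression. The pattern is derived only from the sample line above and may not match other DataX versions; `parse_stats` is a hypothetical helper name.

```python
import re

# Pattern written against the "Total ... | Speed ... | Error ..." line shown above.
STAT_PATTERN = re.compile(
    r"Total (?P<records>\d+) records, (?P<bytes>\d+) bytes \| "
    r"Speed (?P<speed>[^,]+), (?P<rps>\d+) records/s \| "
    r"Error (?P<err_records>\d+) records, (?P<err_bytes>\d+) bytes"
)


def parse_stats(line):
    """Return the run statistics found in a log line, or None if absent."""
    match = STAT_PATTERN.search(line)
    if match is None:
        return None
    return {
        "records": int(match.group("records")),
        "bytes": int(match.group("bytes")),
        "speed": match.group("speed"),
        "records_per_second": int(match.group("rps")),
        "error_records": int(match.group("err_records")),
        "error_bytes": int(match.group("err_bytes")),
    }


if __name__ == "__main__":
    sample = (
        "2025-08-08 14:00:53.145 [job-0] INFO StandAloneJobContainerCommunicator - "
        "Total 100 records, 5192 bytes | Speed 519B/s, 10 records/s | "
        "Error 0 records, 0 bytes | All Task WaitWriterTime 0.000s | "
        "All Task WaitReaderTime 0.000s | Percentage 100.00%"
    )
    print(parse_stats(sample))
```

A script that wraps the sync command could use `error_records` to decide whether the run needs attention.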
Setting up datax-web
Download the source code
git clone https://github.com/WeiYe-Jing/datax-web.git
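Create the database
Run the datax_web.sql file under bin/db (note that the update statements for older versions specify a database name).
Modify the project configuration
1. Edit resources/application.yml under datax_admin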
# Data source
datasource:
  username: root
  password: root
  url: jdbc:mysql://localhost:3306/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
  driver-class-name: com.mysql.jdbc.Driver
Update the data source settings; only MySQL is currently supported.
# Configure mybatis-plus SQL logging
logging:
  level:
    com.wugui.datax.admin.mapper: error
  path: ./data/applogs/admin
Change the log path (path).
# datax-web email
mail:
  host: smtp.qq.com
  port: 25
  username: xxx@qq.com
  password: xxx
  properties:
    mail:
      smtp:
        auth: true
        starttls:
          enable: true
          required: true
      socketFactory:
        class: javax.net.ssl.SSLSocketFactory
Update the mail settings (leave them unchanged if you do not need email).
2. Edit resources/application.yml under datax_executor
# log config
logging:
  config: classpath:logback.xml
  path: ./data/applogs/executor/jobhandler
Change the log path (path).
datax:
  job:
    admin:
      ### datax-web admin address
      addresses: http://127.0.0.1:8080
    executor:
      appname: datax-executor
      ip:
      port: 9999
      ### job log path
      logpath: ./data/applogs/executor/jobhandler
      ### job log retention days
      logretentiondays: 30
  executor:
    jsonpath: D:\\Information_Technology\\worksapce_tool\\tmp\\executor\\json\\
    pypath: D:\\Information_Technology\\worksapce_tool\\datax-new\\datax\\bin\\datax.py
Update the datax.job settings:
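- admin.addresses: the address where datax_admin is deployed. If the scheduling center is deployed as a cluster with several addresses, separate them with commas. The executor uses this address for executor heartbeat registration and task result callbacks.
- executor.appname: the executor's AppName, the unique identifier of each executor cluster and the basis for grouping heartbeat registrations.
- executor.ip: empty by default, which means the IP is detected automatically. With multiple network interfaces you can set a specific IP manually; this IP is not bound to the host and is used only for communication. The address information is used for executor registration and for the scheduling center to request and trigger tasks.
- executor.port: the executor server port, 9999 by default. When deploying several executors on one machine, give each executor a different port.
- executor.logpath: the disk path where the executor stores its run logs; read and write permission on this path is required.
- executor.logretentiondays: the number of days executor log files are kept before expired logs are cleaned up automatically. It takes effect only when the value is 3 or greater; any other value, such as -1, disables automatic cleanup.
- executor.jsonpath: the path where temporary DataX JSON files are saved.
- pypath: the path to the DataX launch script, for example xxx/datax/bin/datax.py (the launch script created in the DataX setup above).
If the DataX environment variable (DATAX_HOME) is configured, logpath, jsonpath, and pypath can be omitted; log files and temporary JSON files are then stored under the path given by the environment variable.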
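The fallback described in the last sentence can be pictured with a few lines of Python. This is only an illustration of the idea, not datax-web's actual lookup code: the precedence (configured value first, then DATAX_HOME) and the bin\datax.py location under DATAX_HOME are assumptions, and `resolve_pypath` is a hypothetical name.

```python
import os
from pathlib import PureWindowsPath


def resolve_pypath(configured, env=None):
    """Pick the launcher path: the configured pypath if set, else DATAX_HOME\\bin\\datax.py."""
    env = os.environ if env is None else env
    if configured:
        return configured
    home = env.get("DATAX_HOME")
    if not home:
        raise ValueError("neither pypath nor DATAX_HOME is set")
    # Assumed layout: the launcher sits in the bin directory of the DataX home.
    return str(PureWindowsPath(home) / "bin" / "datax.py")
```

Passing the environment as a plain dict, as in `resolve_pypath("", {"DATAX_HOME": r"D:\datax"})`, makes it easy to try the function without touching real environment variables.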
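Start the project
Local IDEA development environment
- 1. Run DataXAdminApplication under datax_admin
- 2. Run DataXExecutorApplication under datax_executor
Once admin starts successfully, the log prints three addresses: two API documentation addresses and one front-end page address.
Startup succeeded
After a successful start, open the page (default administrator username: admin, password: 123456)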
http://localhost:8080/index.html#/dashboard
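If the page does not load, a quick first check is whether both services are listening at all. The sketch below probes the two ports used in this setup, 8080 for datax_admin and 9999 for the executor, on 127.0.0.1. `is_port_open` is a hypothetical helper, and a successful TCP connection only shows that something is listening on the port, not that it is datax-web.

```python
import socket


def is_port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, port in (("datax_admin", 8080), ("datax_executor", 9999)):
        state = "listening" if is_port_open("127.0.0.1", port) else "not reachable"
        print(f"{name} on port {port}: {state}")
```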
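Hands-on walkthrough
Project management: add a project
Configure the data source
Create an executor
Build a task to generate the sync JSON
Click Build to generate the DataX sync script automatically, copy the JSON, and return to the menu.
Create the task
Create the task, fill in the fields, paste the JSON in, and change the byte value to 0.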
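The byte change can also be made programmatically before pasting. The sketch below assumes the generated JSON keeps the value under job.setting.speed.byte, which is where DataX job files place speed settings; the sample input, including its byte and channel values, is made up for illustration, and `set_byte_limit` is a hypothetical helper name.

```python
import json


def set_byte_limit(job_json, value=0):
    """Return the job JSON with job.setting.speed.byte set to the given value."""
    job = json.loads(job_json)
    # Create the nested objects if the generated JSON lacks them.
    speed = job.setdefault("job", {}).setdefault("setting", {}).setdefault("speed", {})
    speed["byte"] = value
    return json.dumps(job, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # Made-up input standing in for the JSON copied from the build page.
    generated = '{"job": {"setting": {"speed": {"channel": 3, "byte": 1048576}}, "content": []}}'
    print(set_byte_limit(generated))
```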