Fixing a Multi-Table Join Bug in JoyAgent Data Q&A
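We tested JoyAgent's data Q&A (natural-language querying, "问数") against a single table. Overall results were better than expected, and it handled fairly complex data processing on its own. For example, we imported an employee information table with fields such as ID card number and birth date, then asked "获取身份证号码中的出生日期和登记的出生日期有差异的员工" (find employees whose birth date encoded in the ID card number differs from the registered birth date). The system first extracted the birth date from the ID card number, converted the date format, and then compared the two values, returning the correct rows.
Real analysis, however, rarely stays on a single table; it usually joins data across several. JoyAgent's source code and samples neither demonstrate nor test questions that require multi-table joins. After reviewing the source code we judged that JoyAgent should be able to handle multi-table joins, but the source provides no authoring guideline or sample for describing foreign-key relationships in the knowledge base. We therefore designed a multi-table scenario and ran an engineering test against it.
Once the scenario was designed and the test data imported, the system generated the join SQL automatically, but the table-name replacement stage failed. Below we walk through the query scenario, the system bug we hit in it, and the fix that brought the result to what we expected.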
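The check described above can be sketched outside the system as follows. This is only an illustration of the logic, not JoyAgent's generated SQL. It assumes 18-character ID numbers (birth date in characters 7 to 14 as YYYYMMDD) and a registered birth date stored as 'YYYY-MM-DD'; the sample rows and function names are hypothetical.

```python
from datetime import date, datetime


def birth_date_from_id_card(id_card: str) -> date:
    """Return the birth date encoded in characters 7-14 (YYYYMMDD) of an 18-character ID number."""
    if len(id_card) != 18:
        raise ValueError("expected an 18-character ID number")
    return datetime.strptime(id_card[6:14], "%Y%m%d").date()


def has_birth_date_mismatch(id_card: str, registered: str) -> bool:
    # Assumption: the registered birth date is stored as 'YYYY-MM-DD'.
    registered_date = datetime.strptime(registered, "%Y-%m-%d").date()
    return birth_date_from_id_card(id_card) != registered_date


# Hypothetical rows: (employee_id, id_card, birth_date). Checksums are not validated.
employees = [
    ("E001", "110101199003070011", "1990-03-07"),
    ("E002", "110101198512250022", "1985-12-26"),
]

mismatched = [
    emp_id
    for emp_id, id_card, birth in employees
    if has_birth_date_mismatch(id_card, birth)
]
print(mismatched)
```

Only the second row is reported, because its ID number encodes 1985-12-25 while the registered date is 1985-12-26.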
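1. Data scenario
We imported two tables as the basis for the queries: an employee information table and an employee attendance table. The column comments are in Chinese, the same language as the questions we asked. The CREATE TABLE statements are: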
CREATE TABLE employee_info (
    employee_id       VARCHAR(20)   PRIMARY KEY COMMENT '员工ID(主键)',
    full_name         VARCHAR(50)   NOT NULL COMMENT '员工全名',
    gender            VARCHAR(50)   COMMENT '性别:男或女',
    nationality       VARCHAR(30)   COMMENT '国籍',
    id_card           VARCHAR(20)   COMMENT '身份证号',
    birth_date        VARCHAR(200)  NOT NULL COMMENT '出生日期',
    department        VARCHAR(50)   NOT NULL COMMENT '所属部门',
    marital_status    VARCHAR(20)   COMMENT '婚姻状况(未婚,已婚,离异)',
    education         VARCHAR(20)   COMMENT '最高学历(高中,专科,本科,硕士,博士)',
    contact_phone     VARCHAR(15)   COMMENT '联系电话',
    emergency_contact VARCHAR(15)   COMMENT '紧急联系人电话',
    address           VARCHAR(1000) COMMENT '现居住地址',
    hire_date         VARCHAR(100)  NOT NULL COMMENT '入职日期'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='员工基础信息表';

CREATE TABLE employee_attendance (
    attendance_id  INT          PRIMARY KEY COMMENT '行 ID',
    employee_id    VARCHAR(20)  COMMENT '员工ID',
    clock_in_time  VARCHAR(20)  COMMENT '上班打卡时间',
    clock_out_time VARCHAR(20)  COMMENT '下班打卡时间',
    clock_in_type  VARCHAR(100) COMMENT '打卡方式(指纹,人脸识别, IC卡, 手机APP, 手动补录)'
) COMMENT='员工打卡信息表';
Data model configuration:
- name: 员工基础信息表
  id: t_employeeinfo
  type: table
  content: employee_info
  remark: 员工基础信息表
  business-prompt:
  ignore-fields:
  default-recall-fields:
  analyze-suggest-fields:
  analyze-forbid-fields: employee_id
  sync-value-fields:
  column-alias-map: ''
- name: 员工考勤明细表
  id: t_employeeattendance
  type: table
  content: employee_attendance
  remark: 包含员工的考勤打卡记录
  business-prompt:
  ignore-fields:
  default-recall-fields:
  analyze-suggest-fields:
  analyze-forbid-fields: attendance_id
  sync-value-fields:
  column-alias-map: ''
Test data is omitted here; you can ask DeepSeek to generate a few rows that follow this schema.
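As a rough way to sanity-check what the join should return before involving JoyAgent, the scenario can be reproduced in miniature. The sketch below makes several assumptions: SQLite stands in for the real data source, the schema is trimmed to the columns the query needs, timestamps are stored as 'yyyy-MM-dd HH:mm:ss', work starts at 09:00:00, and all rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Trimmed-down versions of the two tables (SQLite syntax, no column comments).
conn.executescript("""
CREATE TABLE employee_info (
    employee_id TEXT PRIMARY KEY,
    full_name   TEXT NOT NULL,
    department  TEXT NOT NULL
);
CREATE TABLE employee_attendance (
    attendance_id  INTEGER PRIMARY KEY,
    employee_id    TEXT,
    clock_in_time  TEXT,
    clock_out_time TEXT
);
""")

# Hypothetical test rows.
conn.executemany(
    "INSERT INTO employee_info VALUES (?, ?, ?)",
    [("E001", "Zhang San", "Finance"), ("E002", "Li Si", "Engineering")],
)
conn.executemany(
    "INSERT INTO employee_attendance VALUES (?, ?, ?, ?)",
    [
        (1, "E001", "2025-10-16 08:55:00", "2025-10-16 18:02:00"),
        (2, "E002", "2025-10-16 09:12:30", "2025-10-16 18:30:00"),
        (3, "E002", "2025-10-17 08:59:59", "2025-10-17 18:05:00"),
    ],
)

# Employees with at least one clock-in later than 09:00:00 on the same day.
late_sql = """
SELECT DISTINCT e.full_name, e.department
FROM employee_info AS e
JOIN employee_attendance AS a ON e.employee_id = a.employee_id
WHERE time(a.clock_in_time) > '09:00:00'
"""
late_rows = conn.execute(late_sql).fetchall()
print(late_rows)
```

With these rows only Li Si is returned, since 09:12:30 is the single clock-in after 09:00:00.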
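2. Analysis of the system error
When we ask "获取考勤打卡迟到员工的姓名" (get the names of employees who clocked in late), the pseudo-SQL is generated correctly. The table names in it are the knowledge-base IDs from the configuration, not the physical table names: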
2025-10-17T03:25:44.730Z INFO 47 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e SSE数据结果:{"code": 200, "data": [{"query": "通过员工信息表和考勤记录表,获取每个员工的迟到情况", "nl2sql": "SELECT t_employeeinfo
.EMPLOYEE_ID
, t_employeeinfo
.FULL_NAME
, COUNT(CASE WHEN CAST(t_employeeattendance
.CLOCK_IN_TIME
AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(t_employeeattendance
.CLOCK_IN_TIME
AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count
FROM t_employeeinfo
JOIN t_employeeattendance
ON t_employeeinfo
.EMPLOYEE_ID
= t_employeeattendance
.EMPLOYEE_ID
GROUP BY t_employeeinfo
.EMPLOYEE_ID
, t_employeeinfo
.FULL_NAME
"}], "request_id": "b44bf49f-e299-4c17-8e72-90b991325a6e", "status": "data", "error_msg": ""}
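After the table-name replacement step, however, every source table had been replaced with the same physical table. The resulting SQL joins employee_info to itself and references columns that table does not have, so execution failed and the system replied that it could not answer the question: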
2025-10-17T03:25:46.329Z INFO 47 --- [genie-backend] [ exe-pool-1] com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e 执行sql:SELECT employee_info.EMPLOYEE_ID
, employee_info.FULL_NAME
, COUNT(CASE WHEN CAST(employee_info.CLOCK_IN_TIME
AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(employee_info.CLOCK_IN_TIME
AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count
FROM employee_info JOIN employee_info ON employee_info.EMPLOYEE_ID
= employee_info.EMPLOYEE_ID
GROUP BY employee_info.EMPLOYEE_ID
, employee_info.FULL_NAME
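Root cause: in com.jd.genie.service.Nl2SqlService, the tableName read by the code is never updated inside the replacement loop, so every model ID is replaced with the same table.
Fix: update tableName on each iteration of the replacement loop, so that each model ID is replaced with its own table.
Result after the fix (the "***" lines are the trace of the replacement loop, one model at a time):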
2025-10-17T08:31:30.005Z INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE数据结果:{"code": 200, "data": [{"query": "获取考勤打卡迟到员工的姓名和部门", "nl2sql": "SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')"}], "request_id": "86f76794-e83c-42b3-9958-53904409e6f0", "status": "data", "error_msg": ""}
2025-10-17T08:31:30.018Z INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE 连接关闭
2025-10-17T08:31:30.025Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : 86f76794-e83c-42b3-9958-53904409e6f0 sse event count:208
2025-10-17T08:31:30.090Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 开始替换modelname:共有model[5]个
2025-10-17T08:31:30.091Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 开始匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 完成匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 开始匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 完成匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 开始匹配model[t_uegarulwipfivhutcvyawaoex=(select `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.093Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 完成匹配model[t_uegarulwipfivhutcvyawaoex=(select `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.111Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 开始匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.112Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 完成匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 开始匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : *** 完成匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z INFO 46 --- [genie-backend] [ exe-pool-2] c.j.g.d.j.c.JdbcConnectionPools : 从缓存获取连接池 poolId genie-datasource
2025-10-17T08:31:30.113Z INFO 46 --- [genie-backend] [ exe-pool-2] c.j.g.d.provider.jdbc.JdbcDataProvider : jdbc执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.114Z INFO 46 --- [genie-backend] [ exe-pool-2] c.j.g.d.j.connection.ConnectionWrapper : 获取数据库链接成功 poolId:genie-datasource
2025-10-17T08:31:30.373Z INFO 46 --- [genie-backend] [ exe-pool-2] com.jd.genie.service.Nl2SqlService : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 查询sql结果大小:5