
Fixing a Multi-Table Join Bug in JoyAgent Data Querying

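        We tested JoyAgent's natural-language data querying on single tables. The results exceeded our expectations: it handled complex data processing automatically. For example, we imported an employee information table with fields such as ID card number and birth date, then asked: "Get the employees whose birth date embedded in the ID card number differs from the registered birth date." The system first extracted the birth date from the ID card number, converted the date format, and then compared the two values, returning the correct rows.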
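The comparison described above can be sketched outside the database as follows. This is a minimal illustration, not JoyAgent's generated SQL: it assumes 18-character resident ID numbers, whose characters 7 to 14 hold the birth date as YYYYMMDD, and it assumes the registered birth date is stored as a `YYYY-MM-DD` string. The function names and sample rows are hypothetical.

```python
from datetime import date, datetime


def birth_date_from_id(id_card: str) -> date:
    """Return the birth date encoded in characters 7-14 (YYYYMMDD) of an 18-character ID."""
    if len(id_card) != 18:
        raise ValueError("expected an 18-character ID number")
    return datetime.strptime(id_card[6:14], "%Y%m%d").date()


def find_mismatches(employees):
    """Return the IDs of employees whose registered birth date differs from the ID number."""
    mismatched = []
    for emp in employees:
        # Assumption: the registered birth date is stored as 'YYYY-MM-DD'.
        recorded = datetime.strptime(emp["birth_date"], "%Y-%m-%d").date()
        if birth_date_from_id(emp["id_card"]) != recorded:
            mismatched.append(emp["employee_id"])
    return mismatched


# Hypothetical sample rows (not taken from the original test data).
employees = [
    {"employee_id": "E001", "id_card": "110101199003070011", "birth_date": "1990-03-07"},
    {"employee_id": "E002", "id_card": "110101198512250022", "birth_date": "1985-12-26"},
]

print(find_mismatches(employees))
```

Only the second row is reported, because its ID number encodes 1985-12-25 while the registered date is one day later.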

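        However, real analysis rarely stays on a single table; it usually joins data across several tables. JoyAgent's source code and samples neither demonstrate nor test queries that require multi-table joins. After reviewing the source code, we judged that JoyAgent can handle multi-table joins, but the source offers no conventions or examples for describing foreign-key relationships in the knowledge base. We therefore designed a multi-table data scenario and ran an engineering test on it.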

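        After we designed the scenario and imported the test data, the system automatically generated SQL that joins the two tables, but the table-name replacement stage failed. Below we describe, in order, the query scenario, the system bug we hit in it, and the fix that produced the expected result.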

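1. Data scenario

We imported two tables as the basis for querying: an employee information table and an employee attendance table. The CREATE TABLE statements are: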

CREATE TABLE employee_info (
    employee_id       VARCHAR(20)   PRIMARY KEY COMMENT 'Employee ID (primary key)',
    full_name         VARCHAR(50)   NOT NULL COMMENT 'Full name',
    gender            VARCHAR(50)   COMMENT 'Gender: male or female',
    nationality       VARCHAR(30)   COMMENT 'Nationality',
    id_card           VARCHAR(20)   COMMENT 'ID card number',
    birth_date        VARCHAR(200)  NOT NULL COMMENT 'Birth date',
    department        VARCHAR(50)   NOT NULL COMMENT 'Department',
    marital_status    VARCHAR(20)   COMMENT 'Marital status (unmarried, married, divorced)',
    education         VARCHAR(20)   COMMENT 'Highest education (high school, junior college, bachelor, master, doctorate)',
    contact_phone     VARCHAR(15)   COMMENT 'Contact phone',
    emergency_contact VARCHAR(15)   COMMENT 'Emergency contact phone',
    address           VARCHAR(1000) COMMENT 'Current residential address',
    hire_date         VARCHAR(100)  NOT NULL COMMENT 'Hire date'
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COMMENT='Employee basic information table';

CREATE TABLE employee_attendance (
    attendance_id  INT          PRIMARY KEY COMMENT 'Row ID',
    employee_id    VARCHAR(20)  COMMENT 'Employee ID',
    clock_in_time  VARCHAR(20)  COMMENT 'Clock-in time',
    clock_out_time VARCHAR(20)  COMMENT 'Clock-out time',
    clock_in_type  VARCHAR(100) COMMENT 'Clock-in method (fingerprint, face recognition, IC card, mobile app, manual entry)'
) COMMENT='Employee clock-in information table';

Data model configuration:

      - name: Employee basic information table
        id: t_employeeinfo
        type: table
        content: employee_info
        remark: Employee basic information table
        business-prompt:
        ignore-fields:
        default-recall-fields:
        analyze-suggest-fields:
        analyze-forbid-fields: employee_id
        sync-value-fields:
        column-alias-map: ''
      - name: Employee attendance detail table
        id: t_employeeattendance
        type: table
        content: employee_attendance
        remark: Contains the employees' attendance clock-in records
        business-prompt:
        ignore-fields:
        default-recall-fields:
        analyze-suggest-fields:
        analyze-forbid-fields: attendance_id
        sync-value-fields:
        column-alias-map: ''

The test data is omitted here; DeepSeek can generate a few rows that follow this schema.
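For readers who want to try the scenario locally, a minimal sketch with hand-written rows may be enough. Everything in it is an assumption made for illustration: it uses an in-memory SQLite database rather than the article's backend, keeps only the columns needed for the join, invents the sample rows, and assumes clock-in times are stored as `yyyy-MM-dd HH:mm:ss` strings with a 09:00:00 cutoff for lateness.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_info (
    employee_id VARCHAR(20) PRIMARY KEY,
    full_name   VARCHAR(50) NOT NULL,
    department  VARCHAR(50) NOT NULL
);
CREATE TABLE employee_attendance (
    attendance_id  INT PRIMARY KEY,
    employee_id    VARCHAR(20),
    clock_in_time  VARCHAR(20),
    clock_out_time VARCHAR(20),
    clock_in_type  VARCHAR(100)
);
""")

# Hypothetical rows; the article's real test data is not published.
conn.executemany(
    "INSERT INTO employee_info VALUES (?, ?, ?)",
    [("E001", "Alice", "Finance"),
     ("E002", "Bob", "Engineering"),
     ("E003", "Carol", "Engineering")],
)
conn.executemany(
    "INSERT INTO employee_attendance VALUES (?, ?, ?, ?, ?)",
    [(1, "E001", "2025-10-16 08:55:10", "2025-10-16 18:02:00", "fingerprint"),
     (2, "E002", "2025-10-16 09:12:00", "2025-10-16 18:30:00", "mobile app"),
     (3, "E003", "2025-10-16 09:00:00", "2025-10-16 18:00:00", "IC card"),
     (4, "E003", "2025-10-17 10:01:30", "2025-10-17 19:00:00", "face recognition")],
)

# Late means the time-of-day part (from character 12 onward) is after 09:00:00.
LATE_SQL = """
SELECT DISTINCT e.full_name
FROM employee_info AS e
JOIN employee_attendance AS a ON e.employee_id = a.employee_id
WHERE substr(a.clock_in_time, 12) > '09:00:00'
ORDER BY e.full_name
"""
late = [row[0] for row in conn.execute(LATE_SQL)]
print(late)
```

With these rows the query returns Bob and Carol; Alice clocked in early and Carol's 09:00:00 entry is not counted as late because the comparison is strict.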

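2. Analysis of the system error

        When we asked "Get the names of employees who clocked in late" (original prompt: "获取考勤打卡迟到员工的姓名"), the intermediate SQL was generated correctly. The table names in this SQL are the knowledge-base IDs from the configuration: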

2025-10-17T03:25:44.730Z INFO 47 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e SSE数据结果:{"code": 200, "data": [{"query": "通过员工信息表和考勤记录表,获取每个员工的迟到情况", "nl2sql": "SELECT t_employeeinfo.EMPLOYEE_ID, t_employeeinfo.FULL_NAME, COUNT(CASE WHEN CAST(t_employeeattendance.CLOCK_IN_TIME AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(t_employeeattendance.CLOCK_IN_TIME AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count FROM t_employeeinfo JOIN t_employeeattendance ON t_employeeinfo.EMPLOYEE_ID = t_employeeattendance.EMPLOYEE_ID GROUP BY t_employeeinfo.EMPLOYEE_ID, t_employeeinfo.FULL_NAME"}], "request_id": "b44bf49f-e299-4c17-8e72-90b991325a6e", "status": "data", "error_msg": ""}

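During table-name replacement, however, every knowledge-base ID was replaced with the same physical table. The resulting SQL joins employee_info to itself and references attendance columns on it, so execution failed and the system reported that it could not answer the question (in the log, "执行sql" means "executing SQL"):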

2025-10-17T03:25:46.329Z INFO 47 --- [genie-backend] [ exe-pool-1] com.jd.genie.service.Nl2SqlService : b44bf49f-e299-4c17-8e72-90b991325a6e,b44bf49f-e299-4c17-8e72-90b991325a6e 执行sql:SELECT employee_info.EMPLOYEE_ID, employee_info.FULL_NAME, COUNT(CASE WHEN CAST(employee_info.CLOCK_IN_TIME AS TIMESTAMP) > CAST(DATE_FORMAT(CAST(employee_info.CLOCK_IN_TIME AS TIMESTAMP), 'yyyy-MM-dd') || ' 09:00:00' AS TIMESTAMP) THEN 1 END) AS late_count FROM employee_info JOIN employee_info ON employee_info.EMPLOYEE_ID = employee_info.EMPLOYEE_ID GROUP BY employee_info.EMPLOYEE_ID, employee_info.FULL_NAME
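Why the collapsed statement cannot run can be checked with a small experiment. The sketch below is only an illustration under stated assumptions: SQLite stands in for the article's database, the schema is reduced to the columns involved, the queries are simplified versions of the logged SQL, and `is_executable` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee_info (employee_id TEXT PRIMARY KEY, full_name TEXT);
CREATE TABLE employee_attendance (
    attendance_id INTEGER PRIMARY KEY, employee_id TEXT, clock_in_time TEXT
);
""")

# Both knowledge-base IDs were replaced with employee_info (the bug).
collapsed_sql = (
    "SELECT employee_info.full_name "
    "FROM employee_info JOIN employee_info "
    "ON employee_info.employee_id = employee_info.employee_id "
    "WHERE employee_info.clock_in_time > '09:00:00'"
)

# Each knowledge-base ID was replaced with its own table (the intended result).
correct_sql = (
    "SELECT employee_info.full_name "
    "FROM employee_info JOIN employee_attendance "
    "ON employee_info.employee_id = employee_attendance.employee_id "
    "WHERE employee_attendance.clock_in_time > '09:00:00'"
)


def is_executable(sql: str) -> bool:
    """Return True if the database accepts and runs the statement."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.OperationalError:
        return False


print(is_executable(collapsed_sql), is_executable(correct_sql))
```

The collapsed statement is rejected because employee_info has no clock_in_time column and the unaliased self-join makes the column references unresolvable, while the correctly rewritten statement runs.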

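Root cause: in com.jd.genie.service.Nl2SqlService, the tableName obtained in the code was not updated between models, so every knowledge-base ID was replaced with the same table name.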

The fix is to update tableName on every iteration of the replacement loop.
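The article does not show the Java code of Nl2SqlService, so the following Python sketch is only a guess at the shape of the bug and of the fix. The function names, the `MODELS` list, and the regular expression are all hypothetical; only the ID-to-table pairs come from the data model configuration.

```python
import re

# Knowledge-base ID -> physical table, as in the data model configuration.
MODELS = [
    {"id": "t_employeeinfo", "content": "employee_info"},
    {"id": "t_employeeattendance", "content": "employee_attendance"},
]


def replace_id(sql: str, model_id: str, table_name: str) -> str:
    """Replace one knowledge-base ID (with or without backticks) as a whole word."""
    pattern = re.compile(r"`?\b" + re.escape(model_id) + r"\b`?")
    # A function replacement avoids backslash interpretation in table_name.
    return pattern.sub(lambda _match: table_name, sql)


def rewrite_buggy(sql: str, models) -> str:
    """Assumed buggy shape: table_name is resolved once and never updated."""
    table_name = models[0]["content"]
    for model in models:
        sql = replace_id(sql, model["id"], table_name)
    return sql


def rewrite_fixed(sql: str, models) -> str:
    """Fixed shape: table_name is refreshed for the model being processed."""
    for model in models:
        table_name = model["content"]
        sql = replace_id(sql, model["id"], table_name)
    return sql


generated = (
    "SELECT t_employeeinfo.FULL_NAME FROM t_employeeinfo "
    "JOIN t_employeeattendance "
    "ON t_employeeinfo.EMPLOYEE_ID = t_employeeattendance.EMPLOYEE_ID"
)
print(rewrite_buggy(generated, MODELS))
print(rewrite_fixed(generated, MODELS))
```

The buggy variant reproduces the symptom in the failing log (both sides of the join become employee_info), while the fixed variant maps each ID to its own table. Matching on word boundaries is a design choice of this sketch, meant to keep one ID from being replaced inside a longer identifier.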

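Result after the fix. The log is reproduced verbatim: "开始替换modelname:共有model[5]个" means "start replacing model names, 5 models in total", "开始匹配model" means "start matching model", "完成匹配model" means "finished matching model", and "查询sql结果大小:5" reports 5 result rows. The "finished matching" lines show the SQL after that model's replacement, even though their label still reads "匹配前SQL" (SQL before matching). Each knowledge-base ID is now replaced with its own table, and the final query joins employee_info with employee_attendance: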

2025-10-17T08:31:30.005Z  INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE数据结果:{"code": 200, "data": [{"query": "获取考勤打卡迟到员工的姓名和部门", "nl2sql": "SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')"}], "request_id": "86f76794-e83c-42b3-9958-53904409e6f0", "status": "data", "error_msg": ""}
2025-10-17T08:31:30.018Z  INFO 46 --- [genie-backend] [48.237:1601/...] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 SSE 连接关闭
2025-10-17T08:31:30.025Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0 sse event count:208
2025-10-17T08:31:30.090Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始替换modelname:共有model[5]个
2025-10-17T08:31:30.091Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT `t_employeeinfo`.`FULL_NAME`, `t_employeeinfo`.`DEPARTMENT` FROM `t_employeeinfo` INNER JOIN `t_employeeattendance` ON `t_employeeinfo`.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_employeeinfo=employee_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_salaryinfo=salary_info],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.092Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_uegarulwipfivhutcvyawaoex=(select  `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.093Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_uegarulwipfivhutcvyawaoex=(select  `order_date` as `采购日期`,product_id as `商品ID`,product_name as `商品名称`,category as `商品类型`,sub_category as `商品子类型` , quantity as `采购量` , sales as `采购总价格` from sales_data) t],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.111Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN `t_employeeattendance` ON employee_info.`EMPLOYEE_ID` = `t_employeeattendance`.`EMPLOYEE_ID` WHERE PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(`t_employeeattendance`.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.112Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_employeeattendance=employee_attendance],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 开始匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : *** 完成匹配model[t_qtpbgamccmrctthlurauclckq=sales_data],匹配前SQL:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.j.c.JdbcConnectionPools          : 从缓存获取连接池 poolId genie-datasource
2025-10-17T08:31:30.113Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.provider.jdbc.JdbcDataProvider   : jdbc执行sql:SELECT DISTINCT employee_info.`FULL_NAME`, employee_info.`DEPARTMENT` FROM employee_info INNER JOIN employee_attendance ON employee_info.`EMPLOYEE_ID` = employee_attendance.`EMPLOYEE_ID` WHERE PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss') > PARSEDATETIME(CONCAT(FORMATDATETIME(PARSEDATETIME(employee_attendance.`CLOCK_IN_TIME`, 'yyyy-MM-dd HH:mm:ss'), 'yyyy-MM-dd'), ' 09:00:00'), 'yyyy-MM-dd HH:mm:ss')
2025-10-17T08:31:30.114Z  INFO 46 --- [genie-backend] [     exe-pool-2] c.j.g.d.j.connection.ConnectionWrapper   : 获取数据库链接成功 poolId:genie-datasource
2025-10-17T08:31:30.373Z  INFO 46 --- [genie-backend] [     exe-pool-2] com.jd.genie.service.Nl2SqlService       : 86f76794-e83c-42b3-9958-53904409e6f0,86f76794-e83c-42b3-9958-53904409e6f0 查询sql结果大小:5
