当前位置: 首页 > news >正文

AI调研 | Omnisql模型家族调研与实测

一. 一句话总结

中国人民大学团队开源OmniSQL模型家族及SynSQL-2.5M数据集,以裸用模型为手段,在BIRD-SQL榜单上,Omnisql-14B超过了Deepseek-v3和claude3.7的成绩。

modelBIRD-Dev
Omnisql-14B57.17
Claude3.7 No Extended Thinking55.08
Deepseek-v354.82

链接:https://github.com/RUCKBReasoning/OmniSQL
arXiv:为https://arxiv.org/abs/2503.02240

二.概述

中国人民大学团队推出的OmniSQL及SynSQL-2.5M,阐述其功能特点、技术原理、应用场景、使用方法,强调该成果在文本转SQL领域的重要意义和发展潜力。

三.详情

  1. OmniSQL概述:OmniSQL是人大团队开源的文本转SQL模型家族,有7B、14B、32B三种参数规模,适配不同算力需求。其性能超越GPT-4o等闭源模型,背后依托全球首个百万级文本转SQL数据集SynSQL-2.5M,该数据集含250万高质量样本,覆盖众多领域数据库。

  2. 功能特点

    • SynSQL-2.5M数据集优势:样本规模大且多样,涵盖超16,000个领域数据库,支持9种自然语言风格;SQL查询精细分级为简单、中等、复杂、高度复杂四级;每个样本都有CoT解决方案,提高模型可解释性。
    • OmniSQL模型亮点:多尺寸灵活适配不同资源条件,支持本地化部署;在9个权威基准测试中成绩优异;用单一模型实现高精度文本转SQL,无需外部修正模块。
  3. 技术原理(四步数据合成框架)

    • 数据库自动生成:分析网络表格构建数据库结构,并采用增强策略优化。
    • 复杂度可控的SQL生成:定义四个复杂度等级,结合SQLite函数库生成各类SQL查询。
    • 风格化问题反向翻译:将SQL查询反向译为9种语言风格的自然语言问题,确保语义一致。
    • CoT解决方案合成:为样本添加中间推导步骤,提高推理准确性和可解释性。
  4. 应用场景:在企业数据分析中降低BI工具使用门槛;教育领域帮助初学者理解SQL转换过程;能快速生成领域专用数据集推动跨领域数据应用创新;支持本地化部署保障金融、政务等行业数据安全。

  5. 快速使用指南:提供提示模板,使用时替换“db_details”和“question”,目前仅支持SQLite;介绍了与vLLM结合的推理步骤,包括导入相关库、格式化提示、加载模型等操作。

  6. 结语与项目地址:OmniSQL是重大突破,未来有望扩展应用场景。

附件

SynSQL-2.5M数据样例

Task Overview:
You are a data science expert. Below, you are provided with a database schema and a natural language question. Your task is to understand the schema and generate a valid SQL query to answer the question.

Database Engine:
SQLite

Database Schema:
CREATE TABLE cards (
    id integer, -- unique id number identifying the cards, example: [41138, 1349]
    artist text, -- example: ['Pete Venters', 'Volkan Baǵa']
    asciiName text, -- example: ['El-Hajjaj', 'Junun Efreet']
    availability text, -- example: ['mtgo,paper', 'paper']
    borderColor text, -- example: ['black', 'white']
    cardKingdomFoilId text, -- example: ['123094', '123095']
    cardKingdomId text, -- example: ['122719', '122720']
    colorIdentity text, -- example: ['W', 'B']
    colorIndicator text, -- example: ['U', 'G']
    colors text, -- example: ['W', 'B']
    convertedManaCost real, -- example: [7.0, 5.0]
    duelDeck text, -- example: ['a', 'b']
    edhrecRank integer, -- rec Rank in edh, example: [15650, 12702]
    faceConvertedManaCost real, -- example: [4.0, 5.0]
    faceName text, -- example: ['Dusk', 'Dawn']
    flavorName text, -- example: ['Godzilla, King of the Monsters', 'King Caesar, Ancient Guardian']
    flavorText text, -- example: ['Every tear shed is a drop of immortality', 'The perfect antidote for a tightly packe']
    frameEffects text, -- example: ['legendary', 'nyxtouched']
    frameVersion text, -- example: ['2003', '1993']
    hand text, -- example: ['1', '0']
    hasAlternativeDeckLimit integer, -- example: [0, 1]
    hasContentWarning integer, -- example: [0, 1]
    hasFoil integer, -- example: [0, 1]
    hasNonFoil integer, -- example: [1, 0]
    isAlternative integer, -- example: [0, 1]
    isFullArt integer, -- example: [0, 1]
    isOnlineOnly integer, -- example: [0, 1]
    isOversized integer, -- example: [0, 1]
    isPromo integer, -- is Promotion, example: [0, 1]
    isReprint integer, -- example: [1, 0]
    isReserved integer, -- example: [0, 1]
    isStarter integer, -- example: [0, 1]
    isStorySpotlight integer, -- example: [0, 1]
    isTextless integer, -- example: [0, 1]
    isTimeshifted integer, -- example: [0, 1]
    keywords text, -- example: ['First strike', 'Flying']
    layout text, -- example: ['normal', 'aftermath']
    leadershipSkills text, -- example: ["{'brawl': False, 'commander': True, 'oat", "{'brawl': False, 'commander': False, 'oa"]
    life text, -- example: ['-5', '-1']
    loyalty text, -- example: ['6', '3']
    manaCost text, -- example: ['{5}{W}{W}', '{4}{W}']
    mcmId text, -- example: ['16165', '16166']
    mcmMetaId text, -- example: ['156', '176']
    mtgArenaId text, -- example: ['74983', '74986']
    mtgjsonV4Id text, -- example: ['ad41be73-582f-58ed-abd4-a88c1f616ac3', '9eb2e54c-a12b-5e88-a9c0-d8c84c52d59c']
    mtgoFoilId text, -- example: ['27501', '26993']
    mtgoId text, -- example: ['27500', '26992']
    multiverseId text, -- example: ['130550', '129465']
    name text, -- example: ["Ancestor's Chosen", 'Angel of Mercy']
    number text, -- example: ['1', '2']
    originalReleaseDate text, -- example: ['2012/12/1', '2006/12/1']
    originalText text, -- example: ['First strike (This creature deals combat', "Flying (This creature can't be blocked e"]
    originalType text, -- example: ['Creature - Human Cleric', 'Creature - Angel']
    otherFaceIds text, -- example: ['87f0062a-8321-5c16-960e-a12ce1df5839', 'f9f10d34-071c-57a6-b58c-7553abad5c20']
    power text, -- example: ['4', '3']
    printings text, -- example: ['10E,JUD,UMA', '10E,8ED,9ED,DDC,DVD,IMA,INV,JMP,MB1,P02,']
    promoTypes text, -- example: ['boxtopper,boosterfun', 'boosterfun']
    purchaseUrls text, -- example: ["{'cardKingdom': 'https://mtgjson.com/lin"]
    rarity text, -- example: ['uncommon', 'common']
    scryfallId text, -- example: ['7a5cd03c-4227-4551-aa4b-7d119f0468b5', '8f7980d4-da43-4d6d-ad16-14b8a34ae91d']
    scryfallIllustrationId text, -- example: ['be2f7173-c8b7-4172-a388-9b2c6b3c16e5', 'e4d6c53f-e936-4be8-8b70-47c2be863b20']
    scryfallOracleId text, -- example: ['fc2ccab7-cab1-4463-b73d-898070136d74', 'a2daaf32-dbfe-4618-892e-0da24f63a44a']
    setCode text, -- example: ['10E', '2ED']
    side text, -- example: ['a', 'b']
    subtypes text, -- example: ['Human,Cleric', 'Angel']
    supertypes text, -- example: ['Legendary', 'Basic']
    tcgplayerProductId text, -- example: ['15032', '15033']
    text text, -- example: ['First strike (This creature deals combat', 'Flying\nWhen Angel of Mercy enters the ba']
    toughness text, -- example: ['4', '3']
    type text, -- example: ['Creature — Human Cleric', 'Creature — Angel']
    types text, -- example: ['Creature', 'Instant']
    uuid text, -- example: ['00010d56-fe38-5e35-8aed-518019aa36a5', '0001e0d0-2dcd-5640-aadc-a84765cf5fc9']
    variations text, -- example: ['b7c19924-b4bf-56fc-aa73-f586e940bd42', '8fd4e2eb-3eb4-50ea-856b-ef638fa47f8a']
    watermark text, -- example: ['set', 'set (HOU)', 'set (LGN)']
    PRIMARY KEY (id)
);

CREATE TABLE foreign_data (
    id integer, -- example: [1, 2]
    flavorText text, -- example: ['„Es ist der Wille aller, und meine Hand,', '"La voluntad de todos, realizada por mi ']
    `language` text, -- example: ['Italian', 'German', 'Spanish']
    multiverseid integer, -- example: [148411, 150317]
    name text, -- example: ['Ausgewählter der Ahnfrau', 'Elegido de la Antepasada']
    text text, -- example: ['Erstschlag (Diese Kreatur fügt Kampfscha', 'Daña primero. (Esta criatura hace daño d']
    type text, -- example: ['Kreatur — Mensch, Kleriker', 'Criatura — Clérigo humano']
    uuid text, -- example: ['5f8287b1-5bb6-5f4c-ad17-316a40d5bb0c', '57aaebc1-850c-503d-9f6e-bb8d00d8bf7c']
    PRIMARY KEY (id),
    CONSTRAINT fk_foreign_data_uuid FOREIGN KEY (uuid) REFERENCES cards (uuid)
);

CREATE TABLE legalities (
    id integer, -- example: [1, 2]
    format text, -- example: ['commander', 'duel']
    status text, -- example: ['Legal', 'Banned']
    uuid text, -- example: ['5f8287b1-5bb6-5f4c-ad17-316a40d5bb0c', '57aaebc1-850c-503d-9f6e-bb8d00d8bf7c']
    PRIMARY KEY (id),
    CONSTRAINT fk_legalities_uuid FOREIGN KEY (uuid) REFERENCES cards (uuid)
);

CREATE TABLE sets (
    id integer, -- example: [1, 2]
    baseSetSize integer, -- example: [383, 302]
    block text, -- example: ['Core Set', 'Mirrodin']
    booster text, -- example: ["{'default': {'boosters': [{'contents': {"]
    code text, -- example: ['10E', '2ED']
    isFoilOnly integer, -- example: [0, 1]
    isForeignOnly integer, -- example: [0, 1]
    isNonFoilOnly integer, -- example: [0, 1]
    isOnlineOnly integer, -- example: [0, 1]
    isPartialPreview integer, -- example: [0, 1]
    keyruneCode text, -- example: ['10E', '2ED']
    mcmId integer, -- magic card market id, example: [74, 3204]
    mcmIdExtras integer, -- magic card market ID Extras, example: [3209, 3459]
    mcmName text, -- magic card market name, example: ['Tenth Edition', 'Double Masters']
    mtgoCode text, -- magic the gathering online code, example: ['10E', '2XM']
    name text, -- example: ['Tenth Edition', 'Unlimited Edition']
    parentCode text, -- example: ['JMP', 'MH1']
    releaseDate date, -- example: ['2007-07-13', '1993-12-01']
    tcgplayerGroupId integer, -- example: [1, 115]
    totalSetSize integer, -- example: [508, 302]
    type text, -- example: ['core', 'masters']
    PRIMARY KEY (id)
);

CREATE TABLE set_translations (
    id integer, -- example: [1, 2]
    `language` text, -- example: ['Italian', 'Chinese Simplified', 'Chinese Traditional']
    setCode text, -- example: ['10E', '4ED']
    translation text, -- example: ['核心系列第十版', 'Dixième édition']
    PRIMARY KEY (id),
    CONSTRAINT fk_set_translations_setcode FOREIGN KEY (setCode) REFERENCES sets (code)
);

CREATE TABLE rulings (
    id integer, -- example: [1, 2]
    `date` date, -- example: ['2007-07-15', '2007-02-01']
    text text, -- example: ['You draw the card when Bandage resolves,', 'If you double a negative life total, you']
    uuid text, -- example: ['6d268c95-c176-5766-9a46-c14f739aba1c', '56f4935b-f6c5-59b9-88bf-9bcce20247ce']
    PRIMARY KEY (id),
    CONSTRAINT fk_rulings_uuid FOREIGN KEY (uuid) REFERENCES cards (uuid)
);
This schema describes the database's structure, including tables, columns, primary keys, foreign keys, and any relevant relationships or constraints.

Question:
Italian translation refers to language = 'Italian'; have a translation means translation is not null; base set number of under 100 refers to baseSetSize < 10
Among the sets of cards that have an Italian translation, how many of them have a base set number of under 100?

Instructions:
- Make sure you only output the information that is asked in the question. If the question asks for a specific column, make sure to only include that column in the SELECT clause, nothing more.
- The generated query should return all of the information asked in the question without any missing or extra information.
- Before generating the final SQL query, please think through the steps of how to write the query.

Output Format:
In your answer, please enclose the generated SQL query in a code block:
```sql
-- Your SQL query
```

Take a deep breath and think step by step to find the correct SQL query.
http://www.dtcms.com/a/108215.html

相关文章:

  • ‌Windows 与 Linux网络命令速查表,含常用场景及参数说明
  • 使用高德api实现天气查询
  • 多电机显示并排序
  • WHAT - 如何理解中间件
  • WPF学习路线
  • 关于Gstreamer+MPP硬件加速推流问题:视频输入video0被占用
  • MYSQL实现获取某个经纬度区域内的数据
  • Cesium系列:从入门到实践,打造属于你的3D地球应用
  • 为 Jenkins Agent 添加污点(Taint)容忍度(Toleration)
  • Dubbo分布式框架学习(1)
  • vue省市区懒加载,用el-cascader 新增和回显
  • 多模态大模型笔记
  • Compressed串行端口终端应用程序(MAC 、WIN、LINUX)打包下载
  • 高级java每日一道面试题-2025年3月19日-Web篇-防止表单重复提交的方法有哪些?
  • MySQL联合查询
  • vector的学习使用(1)
  • Cjson的创建和解析
  • 【Python】KNN:K-NearestNeighbor 学习指南
  • Vue3+Cesium+vite 入门- 项目搭建
  • HAL库 通过USB Boot进行APP程序升级
  • window11 通过cmd命令行安装 oh my zsh 的教程
  • VMware上的windows虚拟机安装使用Docker方法
  • MySQL篇(二): 核心知识深度聚簇解析:索引、非聚簇索引、回表查询、覆盖索引、超大分页处理、索引创建原则与索引失效场景
  • TDengine 权限管理与安全配置实战(二)
  • Redhat8.10 离线安装Snipe-IT v8.0.4 版本
  • 计算机网络中科大 - 第1章 结构化笔记(详细解析)
  • PostgreSQL pg_repack 重新组织表并释放表空间
  • NumPy的应用
  • 【数据结构】图的基本概念
  • 基于Django框架的基金数据可视化平台(源码+lw+部署文档+讲解),源码可白嫖!