
Testing fireducks, duckdb, and polars on TPC-H at sf=0.1

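First, download the source package from https://github.1git.de/fireducks-dev/polars-tpch and extract it into the /par/fire directory.
Then change into that directory and run
SCALE_FACTOR=0.1 ./run-fireducks.sh
The script first installs the required packages, compiles the TPC-H data generator, generates the .tbl files at sf=0.1, converts them to Parquet, and finally runs the queries.
The output looks like this: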

root@DESKTOP-59T6U68:/par/fire# SCALE_FACTOR=0.1 ./run-fireducks.sh
Looking in indexes: https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple
Requirement already satisfied: pyarrow in ./.venv/lib/python3.13/site-packages (20.0.0)
Requirement already satisfied: pydantic in ./.venv/lib/python3.13/site-packages (2.11.7)
Requirement already satisfied: pydantic_settings in ./.venv/lib/python3.13/site-packages (2.10.1)
Requirement already satisfied: linetimer in ./.venv/lib/python3.13/site-packages (0.1.5)
Requirement already satisfied: annotated-types>=0.6.0 in ./.venv/lib/python3.13/site-packages (from pydantic) (0.7.0)
Requirement already satisfied: pydantic-core==2.33.2 in ./.venv/lib/python3.13/site-packages (from pydantic) (2.33.2)
Requirement already satisfied: typing-extensions>=4.12.2 in ./.venv/lib/python3.13/site-packages (from pydantic) (4.14.0)
Requirement already satisfied: typing-inspection>=0.4.0 in ./.venv/lib/python3.13/site-packages (from pydantic) (0.4.1)
Requirement already satisfied: python-dotenv>=0.21.0 in ./.venv/lib/python3.13/site-packages (from pydantic_settings) (1.1.1)
make -C tpch-dbgen dbgen
make[1]: Entering directory '/par/fire/tpch-dbgen'
make[1]: 'dbgen' is up to date.
make[1]: Leaving directory '/par/fire/tpch-dbgen'
cd tpch-dbgen && ./dbgen -vf -s 0.1 && cd ..
TPC-H Population Generator (Version 2.17.2)
Copyright Transaction Processing Performance Council 1994 - 2010
Generating data for suppliers table/
Preloading text ... 100%
done.
Generating data for customers tabledone.
Generating data for orders/lineitem tablesdone.
Generating data for part/partsupplier tablesdone.
Generating data for nation tabledone.
Generating data for region tabledone.
mkdir -p "data/tables_pyarrow/scale-0.1"
mv tpch-dbgen/*.tbl data/tables_pyarrow/scale-0.1/
.venv/bin/python -m scripts.prepare_data_pyarrow
Processing table: customer
Processing table: lineitem
Processing table: nation
Processing table: orders
Processing table: part
Processing table: partsupp
Processing table: region
Processing table: supplier
rm -rf data/tables_pyarrow/scale-0.1/*.tbl
PATH_TABLES=data/tables_pyarrow .venv-fireducks/bin/python -m queries.fireducks
{"scale_factor":0.1,"large_string_comment":false,"paths":{"answers":"data/answers","tables":"data/tables_pyarrow","timings":"output/run","timings_filename":"timings.csv","plots":"output/plot"},"plot":{"show":false,"n_queries":7,"y_limit":null},"run":{"io_type":"skip","log_timings":true,"show_results":false,"check_results":false,"polars_show_plan":false,"polars_eager":false,"polars_streaming":false,"polars_new_streaming":false,"polars_gpu":false,"polars_gpu_device":0,"use_rmm_mr":"cuda-async","modin_memory":8000000000,"spark_driver_memory":"2g","spark_executor_memory":"1g","spark_log_level":"ERROR","include_io":false},"dataset_base_dir":"data/tables_pyarrow/scale-0.1"}
Code block 'Run fireducks query 1' took: 0.20121 s
Code block 'Run fireducks query 2' took: 0.52730 s
Code block 'Run fireducks query 3' took: 0.15594 s
Code block 'Run fireducks query 4' took: 0.15536 s
Code block 'Run fireducks query 5' took: 0.23419 s
Code block 'Run fireducks query 6' took: 0.11777 s
Code block 'Run fireducks query 7' took: 0.27936 s
Code block 'Run fireducks query 8' took: 0.22832 s
Code block 'Run fireducks query 9' took: 0.18384 s
Code block 'Run fireducks query 10' took: 0.33037 s
Code block 'Run fireducks query 11' took: 0.16605 s
Code block 'Run fireducks query 12' took: 0.16841 s
Code block 'Run fireducks query 13' took: 0.14314 s
Code block 'Run fireducks query 14' took: 0.13404 s
Code block 'Run fireducks query 15' took: 0.14402 s
Code block 'Run fireducks query 16' took: 0.20629 s
Code block 'Run fireducks query 17' took: 0.15346 s
Code block 'Run fireducks query 18' took: 0.19930 s
Code block 'Run fireducks query 19' took: 0.20121 s
Code block 'Run fireducks query 20' took: 0.27538 s
Code block 'Run fireducks query 21' took: 0.30119 s
Code block 'Run fireducks query 22' took: 0.22134 s
Code block 'Overall execution of ALL fireducks queries' took: 130.80006 s
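
To compare runs, the per-query timings can be pulled out of this log output. The following is a minimal sketch that assumes the exact line format shown above; `parse_timings` and `LINE_RE` are hypothetical names introduced for illustration and are not part of the benchmark repository.

```python
import re

# Matches lines such as: Code block 'Run fireducks query 1' took: 0.20121 s
# The "Overall execution" line does not match and is skipped on purpose.
LINE_RE = re.compile(r"Code block 'Run (\w+) query (\d+)' took: ([\d.]+) s")


def parse_timings(log_text):
    """Return {engine: {query_number: seconds}} for every per-query line."""
    timings = {}
    for match in LINE_RE.finditer(log_text):
        engine = match.group(1)
        query = int(match.group(2))
        seconds = float(match.group(3))
        timings.setdefault(engine, {})[query] = seconds
    return timings


# Three lines copied from the log above, used as sample input.
sample = """\
Code block 'Run fireducks query 1' took: 0.20121 s
Code block 'Run fireducks query 2' took: 0.52730 s
Code block 'Overall execution of ALL fireducks queries' took: 130.80006 s
"""

result = parse_timings(sample)
for engine, per_query in result.items():
    print(engine, "sum of per-query times:", round(sum(per_query.values()), 5))
```

Feeding the complete log of each engine through the same function makes it possible to compare the summed per-query times with the reported overall time.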

To compare against other tools, the queries directory also contains scripts for duckdb, polars, and others. They are invoked as follows:

PATH_TABLES=data/tables_pyarrow SCALE_FACTOR=0.1 .venv/bin/python -m queries.duckdb
Code block 'Run duckdb query 1' took: 2.36939 s
...
Code block 'Overall execution of ALL duckdb queries' took: 88.98257 s

PATH_TABLES=data/tables_pyarrow SCALE_FACTOR=0.1 .venv/bin/python -m queries.polars
Code block 'Run polars query 1' took: 0.34880 s
...
Code block 'Overall execution of ALL polars queries' took: 61.85478 s
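
The invocations above could be wrapped in a small loop so that all engines run with the same settings. This is only a sketch under the assumptions visible in the commands above (.venv for duckdb and polars, .venv-fireducks for fireducks, working directory /par/fire); `build_command` and `run_all` are hypothetical helpers, not something the repository provides.

```python
import os
import subprocess

# Interpreter used for each engine, taken from the commands shown above.
INTERPRETERS = {
    "fireducks": ".venv-fireducks/bin/python",
    "duckdb": ".venv/bin/python",
    "polars": ".venv/bin/python",
}


def build_command(engine, scale_factor="0.1", tables="data/tables_pyarrow"):
    """Return (extra environment variables, argv) for one engine."""
    extra_env = {"PATH_TABLES": tables, "SCALE_FACTOR": scale_factor}
    argv = [INTERPRETERS[engine], "-m", f"queries.{engine}"]
    return extra_env, argv


def run_all(engines=("fireducks", "duckdb", "polars"), cwd="/par/fire"):
    """Run each engine's query module in turn; stops at the first failure."""
    for engine in engines:
        extra_env, argv = build_command(engine)
        # Merge the benchmark variables into the current environment.
        subprocess.run(argv, cwd=cwd, env={**os.environ, **extra_env}, check=True)


if __name__ == "__main__":
    # Print the commands instead of executing them; call run_all() to execute.
    for name in INTERPRETERS:
        env, argv = build_command(name)
        print(" ".join(f"{k}={v}" for k, v in env.items()), " ".join(argv))
```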

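The fireducks script is forked from the polars one, and I do not know what modifications were made to it. On individual queries duckdb comes out far slower than polars and fireducks, by about an order of magnitude (query 1: 2.37 s for duckdb versus 0.35 s for polars and 0.20 s for fireducks), which is hard to believe. Running the following statement directly as a standalone script (/par/duckdbq1.py) clearly takes less than 1 second: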

import time
import duckdb
# TPC-H Q1, reading lineitem directly from the Parquet file generated above
q1 = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity) AS sum_qty, SUM(l_extendedprice) AS sum_base_price,
       SUM(l_extendedprice * (1 - l_discount)) AS sum_disc_price,
       SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       AVG(l_quantity) AS avg_qty, AVG(l_extendedprice) AS avg_price,
       AVG(l_discount) AS avg_disc, COUNT(*) AS count_order
FROM 'data/tables_pyarrow/scale-0.1/lineitem.parquet' l
WHERE l_shipdate <= CAST('1998-09-02' AS date)
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus;
"""
t = time.time()
df = duckdb.sql(q1)
df.show()
print(time.time() - t)
 .venv/bin/python /par/duckdbq1.py
┌──────────────┬──────────────┬─────────┬────────────────────┬───────────────────┬────────────────────┬────────────────────┬───────────────────┬─────────────────────┬─────────────┐
│ l_returnflag │ l_linestatus │ sum_qty │   sum_base_price   │  sum_disc_price   │     sum_charge     │      avg_qty       │     avg_price     │      avg_disc       │ count_order │
│   varchar    │   varchar    │ int128  │       double       │      double       │       double       │       double       │      double       │       double        │    int64    │
├──────────────┼──────────────┼─────────┼────────────────────┼───────────────────┼────────────────────┼────────────────────┼───────────────────┼─────────────────────┼─────────────┤
│ A            │ F            │ 3774200 │   5320753880.68998 │ 5054096266.682835 │  5256751331.449267 │ 25.537587116854997 │   36002.123829014 │ 0.05014459706345448 │      147790 │
│ N            │ F            │   95257 │ 133737795.83999994 │    127132372.6512 │ 132286291.22944473 │  25.30066401062417 │ 35521.32691633465 │  0.0493944223107573 │        3765 │
│ N            │ O            │ 7459297 │  10512270008.89992 │ 9986238338.384766 │ 10385578376.585476 │ 25.545537671232875 │ 36000.92468801342 │ 0.05009595890418491 │      292000 │
│ R            │ F            │ 3785523 │ 5337950526.4698715 │ 5071818532.942101 │  5274405503.049366 │   25.5259438574251 │ 35994.02921403006 │ 0.04998927856189752 │      148301 │
└──────────────┴──────────────┴─────────┴────────────────────┴───────────────────┴────────────────────┴────────────────────┴───────────────────┴─────────────────────┴─────────────┘
0.6631364822387695
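
The single measurement above covers the first call in a fresh process, including printing the result. To see how much of that is one-off overhead, the query could be timed several times in a row. The sketch below is an illustration only: `time_runs` and `summarize` are hypothetical helpers, and it says nothing about how the benchmark harness itself measures time.

```python
import statistics
import time


def time_runs(fn, repeats=5):
    """Call fn() `repeats` times and return the wall-clock seconds of each call."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return timings


def summarize(timings):
    """Separate the first (cold) call from the median of the remaining calls."""
    rest = timings[1:] or timings  # fall back to the only run if repeats == 1
    return {"first": timings[0], "median_rest": statistics.median(rest)}


if __name__ == "__main__":
    # Stand-in workload so the sketch runs without any third-party package.
    print(summarize(time_runs(lambda: sum(range(100_000)), repeats=5)))
```

With duckdb installed and the `q1` string from the script above, the workload could be replaced by `lambda: duckdb.sql(q1).fetchall()`, which runs the query and fetches the rows without printing them.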