Modifying the PostgreSQL test script to run on CedarDB, and analyzing the logs
First, download the binary from the official CedarDB website.
root@66d4e20ec1d7:/par/cedar10# tar xf "../cedar-legacy-arm64.tar.xz"
root@66d4e20ec1d7:/par/cedar10# ls
cedar
root@66d4e20ec1d7:/par/cedar10# cd cedar
root@66d4e20ec1d7:/par/cedar10/cedar# ls
README.md catalogupgrade.py cedardb
root@66d4e20ec1d7:/par/cedar10/cedar# cat README.md
Start the interactive shell as described in the README.
Like DuckDB, CedarDB does not support stored procedures, so I adapted the DuckDB version of the script:
2025-11-05 04:00:31.017900630 UTC ERROR: DO not implemented yet
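For context, this is the construct that triggers the error; a minimal, hypothetical DO block (not taken from the original script), of the kind PostgreSQL accepts but CedarDB rejects:

-- Hypothetical PL/pgSQL anonymous block: PostgreSQL runs it, CedarDB answers
-- "DO not implemented yet", so procedural logic must become plain SQL statements.
DO $$
BEGIN
    RAISE NOTICE 'loops and conditionals would go here';
END
$$;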
CedarDB has neither a printf function nor a make_interval function:
2025-11-05 04:01:25.329726210 UTC ERROR: unknown function or overload printf(text, double precision)
at: now() - printf('%.0f days', random() * 365) *error* ::interval
2025-11-05 04:08:58.400727980 UTC ERROR: unknown function or overload make_interval(integer)
at: now() - make_interval(days => floor(random() * 365)::int) *error*
However, it does support multiplying an interval literal by a random number:
> select now() - interval '365'*random();
?column?
2024-11-15 07:11:12.946666+08
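Side by side, the three ways of generating a timestamp within the past year (the first two fail on CedarDB with the errors logged above; the third is the form used in the script below):

-- DuckDB style: rejected (unknown function or overload printf(text, double precision))
-- SELECT now() - printf('%.0f days', random() * 365)::interval;
-- PostgreSQL style: rejected (unknown function or overload make_interval(integer))
-- SELECT now() - make_interval(days => floor(random() * 365)::int);
-- CedarDB-compatible: scale an interval literal by a random factor
SELECT now() - interval '365' * random();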
The modified script is as follows:
-- ========================================
-- Step 1. Create the base tables
-- ========================================
DROP TABLE IF EXISTS orders CASCADE;
DROP TABLE IF EXISTS customers CASCADE;
DROP TABLE IF EXISTS products CASCADE;

CREATE SEQUENCE id_sequence;
CREATE SEQUENCE id_sequence2;
CREATE SEQUENCE id_sequence3;

CREATE TABLE customers (
    customer_id int PRIMARY KEY DEFAULT nextval('id_sequence'),
    name TEXT,
    region TEXT,
    join_date TIMESTAMP
);

CREATE TABLE products (
    product_id int PRIMARY KEY DEFAULT nextval('id_sequence2'),
    name TEXT,
    category TEXT,
    price NUMERIC(10,2)
);

CREATE TABLE orders (
    order_id int PRIMARY KEY DEFAULT nextval('id_sequence3'),
    customer_id INT REFERENCES customers(customer_id),
    product_id INT REFERENCES products(product_id),
    quantity INT,
    amount NUMERIC(12,2),
    order_date TIMESTAMP
);

-- ========================================
-- Step 2. Bulk-insert test data
-- ========================================
-- 2.1 Insert customers (1,000,000 rows)
INSERT INTO customers (name, region, join_date)
SELECT
    'Customer_' || (gs + (b-1)*100000),
    CASE WHEN random() < 0.33 THEN 'APAC'
         WHEN random() < 0.66 THEN 'EMEA'
         ELSE 'AMER' END,
    now() - interval '365'*random()
FROM generate_series(1,100000) AS gs(gs), generate_series(1,10) b(b);

-- 2.2 Insert products (1,000,000 rows)
INSERT INTO products (name, category, price)
SELECT
    'Product_' || (gs + (b-1)*100000),
    CASE WHEN random() < 0.5 THEN 'Electronics'
         WHEN random() < 0.8 THEN 'Clothing'
         ELSE 'Food' END,
    round((random() * 500 + 1)::numeric, 2)
FROM generate_series(1,100000) AS gs(gs), generate_series(1,10) b(b);

-- 2.3 Insert orders (1,000,000 rows)
INSERT INTO orders (customer_id, product_id, quantity, amount, order_date)
SELECT
    floor(random() * 1000000 + 1)::int,
    floor(random() * 1000000 + 1)::int,
    floor(random() * 5 + 1)::int,
    round(((floor(random()*5)+1) * (random()*500 + 1))::numeric, 2),
    now() - interval '365'*random()
FROM generate_series(1,100000) AS gs(gs), generate_series(1,10) b(b);

-- ========================================
-- Step 3. Create indexes
-- ========================================
CREATE INDEX idx_orders_cust ON orders(customer_id);
CREATE INDEX idx_orders_prod ON orders(product_id);
CREATE INDEX idx_orders_date ON orders(order_date);
CREATE INDEX idx_customers_region ON customers(region);
CREATE INDEX idx_products_cat ON products(category);

VACUUM ANALYZE;

-- ========================================
-- Step 4. Build a complex query (CTE + subquery + aggregation + JOIN)
-- ========================================
EXPLAIN ANALYZE
WITH recent_orders AS (
    SELECT o.order_id, o.customer_id, o.product_id, o.amount, o.order_date
    FROM orders o
    WHERE o.order_date > now() - interval '180 days'
),
customer_region_sales AS (
    SELECT
        c.region,
        SUM(r.amount) AS total_sales,
        COUNT(DISTINCT r.customer_id) AS unique_customers
    FROM recent_orders r
    JOIN customers c ON r.customer_id = c.customer_id
    GROUP BY c.region
),
top_products AS (
    SELECT
        p.category,
        SUM(r.amount) AS total_sales
    FROM recent_orders r
    JOIN products p ON r.product_id = p.product_id
    GROUP BY p.category
)
SELECT
    cr.region,
    tp.category,
    cr.total_sales AS region_sales,
    tp.total_sales AS category_sales,
    (cr.total_sales / tp.total_sales)::numeric(10,2) AS ratio
FROM customer_region_sales cr
JOIN top_products tp ON tp.total_sales > 0
WHERE cr.total_sales > (
    SELECT AVG(total_sales) FROM customer_region_sales
)
ORDER BY ratio DESC
LIMIT 20;
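One thing worth noting: each INSERT builds its million rows from a deliberate 100000 × 10 comma join of two generate_series calls, and the final query joins on tp.total_sales > 0, a condition that references only one side; both look like implicit cross products to CedarDB's planner. As the error message in the run below suggests, spelling the product out with an explicit CROSS JOIN is an alternative to flipping the setting; a sketch for the customers INSERT:

-- Explicit form of the 100000 x 10 row generator: semantically identical to
-- the comma join above, but exempt from the implicit-cross-product check.
INSERT INTO customers (name, region, join_date)
SELECT
    'Customer_' || (gs + (b-1)*100000),
    CASE WHEN random() < 0.33 THEN 'APAC'
         WHEN random() < 0.66 THEN 'EMEA'
         ELSE 'AMER' END,
    now() - interval '365'*random()
FROM generate_series(1,100000) AS gs(gs)
CROSS JOIN generate_series(1,10) AS b(b);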
The execution went as follows:
root@66d4e20ec1d7:/par/cedar10/cedar# ./cedardb -i --createdb mydb
root@66d4e20ec1d7:/par/cedar10/cedar# ./cedardb -i mydb
2025-11-05 04:06:23.002092990 UTC INFO: Using 3467 MB buffers, 3467 MB work memory
2025-11-05 04:06:23.859818070 UTC WARNING: Your storage device uses write back caching. Write durability might be delayed. See: https://cedardb.com/docs/references/writecache
2025-11-05 04:06:23.859866960 UTC INFO: You're running CEDARDB COMMUNITY EDITION - using 0 GB out of 64 GB. Our General Terms and Conditions apply to the use of the CedarDB Community Edition. Run "cedardb --license" for more information.
> \timing on
2025-11-05 04:21:50.116719930 UTC ERROR: The query contains a large implicit cross product. This is most likely due to unintended missing join conditions and requires a fix to the query. If you really want that cross product please use an explicit CROSS JOIN. Or disable this check with SET implicit_cross_products = ON.
The run failed: by default, CedarDB refuses to execute large Cartesian products. After setting the parameter, I re-ran the script.
> SET implicit_cross_products = ON;
> \i /par/test_pgcd.txt
2025-11-05 04:22:39.993870960 UTC INFO: [s] execution: (0.000013 min, 0.000013 max, 0.000013 median, 0.0% relMAD, 0.000013 avg, 0.000000 sdev, 1 scale, nan IPC, nan CPUs, nan GHz) compilation: (0.000031 min, 0.000031 max, 0.000031 median, 0.0% relMAD, 0.000031 avg, nan sdev)
...
🖩 OUTPUT ()
▼ SORT (In-Memory) (Card: 0, Estimate: 3, Time: 0 ms (1 %))
𝚾 MAP (Card: 0, Estimate: 3)
⨝ JOIN (inner, bnljoin) (Card: 0, Estimate: 3)
├───σ SELECT (Card: 0, Estimate: 3)
│ 𝚪 GROUP BY (In-Memory) (Card: 0, Estimate: 3, Time: 0 ms (0 %))
│ ⨝ JOIN (inner, hashjoin) (Materialized: 11 MB, Utilization: 163 %, Card: 0, Estimate: 626'633)
│ ├───🗐 TABLESCAN on orders (num IOs: 0, Fetched: 0 B, Card: 0, Estimate: 500'000, Time: 0 ms (1 %))
│ └───🗐 TABLESCAN on products (num IOs: 0, Fetched: 0 B, Card: 0, Estimate: 1'000'000, Time: 0 ms (1 %))
└───⨝ JOIN (inner, singletonjoin) (Card: 0, Estimate: 1)
    ├───σ SELECT (Card: 0, Estimate: 1)
    │       𝚪 GROUP BY (In-Memory) (Card: 0, Estimate: 1, Time: 0 ms (0 %))
    │       σ SELECT (Card: 0, Estimate: 2)
    │       🗏 CTE Scan on CTE Alder (Card: 0, Estimate: 2, Time: 0 ms (0 %))
    └───σ SELECT (Card: 0, Estimate: 2)
            🗏 CTE Scan on CTE Alder (Card: 0, Estimate: 2, Time: 1 ms (93 % ***))
This plan uses:
CTE Alder:
t TEMP (Card: 0, Estimate: 2)
σ SELECT (Card: 0, Estimate: 2)
𝚪 GROUP BY (In-Memory) (Card: 0, Estimate: 2, Time: 0 ms (0 %))
⨝ JOIN (inner, hashjoin) (Materialized: 11 MB, Utilization: 163 %, Card: 0, Estimate: 608'021)
├───🗐 TABLESCAN on orders (num IOs: 0, Fetched: 0 B, Card: 0, Estimate: 500'000, Time: 0 ms (2 %))
└───🗐 TABLESCAN on customers (num IOs: 0, Fetched: 0 B, Card: 0, Estimate: 1'000'000, Time: 0 ms (2 %))
2025-11-05 04:22:56.983110170 UTC INFO: [s] execution: (0.004108 min, 0.004108 max, 0.004108 median, 0.0% relMAD, 0.004108 avg, 0.000000 sdev, 4000001 scale, nan IPC, nan CPUs, nan GHz) compilation: (0.017746 min, 0.017746 max, 0.017746 median, 0.0% relMAD, 0.017746 avg, 0.000000 sdev)
>
CedarDB's tree rendering of execution plans is pleasantly compact, but its timing log lines are verbose and hard to scan, so I saved them to a file. I used the following shell command to compute the total execution plus compilation time for each statement:
awk '{printf "%f %s %s\n", $9+$32, $9,$32}' /shujv/par/tim.txt
0.000044 0.000013 0.000031
0.000012 0.000004 0.000008
0.000009 0.000003 0.000006
0.000035 0.000024 0.000011
0.000016 0.000009 0.000007
0.000014 0.000008 0.000006
0.019229 0.019162 0.000067
0.019442 0.019352 0.000090
0.071536 0.071431 0.000105
1.799576 1.619629 0.179947
1.706715 1.650359 0.056356
4.701688 4.673385 0.028303
0.295118 0.295042 0.000076
0.185758 0.185697 0.000061
0.177300 0.177245 0.000055
0.427089 0.427038 0.000051
0.311687 0.311618 0.000069
0.000052 0.000026 0.000026
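The same log can also be reduced to one overall figure by accumulating across lines (a sketch, assuming the same field layout as above, with $9 holding the execution time and $32 the compilation time):

# Sum execution and compilation time over all statements in the saved log
awk '{e += $9; c += $32} END {printf "execution %.6fs + compilation %.6fs = %.6fs\n", e, c, e+c}' /shujv/par/tim.txt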
As the figures show, the INSERT statements are still the most time-consuming part, but every statement ran faster than in DuckDB.
