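GooseDB: DuckDB with a server/client mode
I came across GooseDB online, a product developed by a Korean company.
The official website describes it as a feature-extended fork of DuckDB™ with server/client operation, multiple sessions, and concurrent write support, speaking the PostgreSQL wire protocol (DuckDB™ is a trademark of the DuckDB Foundation).
It is simple to use: download the binary for your platform from the download page and start the server: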
C:\d\goosedb>goosedb start
{"level":"info","msg":"===== GooseDB Freeware(For Non-Commercial Use Only) [read_write] ===== valid until: 2027-06-24 08:00:00 +0800 CST, owner: AnyBody =====","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"MaxConnections = 10","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"DuckDB !!read_write!! Mode)","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"SET ieee_floating_point_ops = false","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"SET autoload_known_extensions=1","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"SET preserve_insertion_order = false","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"ATTACH ':memory:' as memdb","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"LOAD parquet","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"LOAD json","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"GooseDB v1.3.1.1 (DuckDB v1.3.1) build_date:2025-06-22","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"goosedb listening on 0.0.0.0:1234","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"restapi listening on 0.0.0.0:5678","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"db-file: data\\goosedb.ddb, cfg-file: conf\\goosedb.cfg","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"SET ieee_floating_point_ops = false","time":"2025-09-13T18:40:47+08:00"}
{"level":"info","msg":"SET autoload_known_extensions=1","time":"2025-09-13T18:40:47+08:00"}
{"level":"info","msg":"SET preserve_insertion_order = false","time":"2025-09-13T18:40:47+08:00"}
The startup log shows the default data file and configuration file; both can be changed as described in the documentation.
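Since the startup log is one JSON object per line, the ports and file paths can also be pulled out with a few lines of script. The following is only a sketch: the function name `summarize_startup_log` is my own, and it assumes the message wording stays exactly as in the lines shown above, which may not hold for other GooseDB versions.

```python
import json

# A few lines copied verbatim from the startup log shown above.
STARTUP_LOG = r'''
{"level":"info","msg":"MaxConnections = 10","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"goosedb listening on 0.0.0.0:1234","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"restapi listening on 0.0.0.0:5678","time":"2025-09-13T18:30:10+08:00"}
{"level":"info","msg":"db-file: data\\goosedb.ddb, cfg-file: conf\\goosedb.cfg","time":"2025-09-13T18:30:10+08:00"}
'''


def summarize_startup_log(text):
    """Collect listening ports and file paths from JSON log lines.

    Hypothetical helper: it relies on the message wording seen in this log.
    """
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        msg = json.loads(line)["msg"]
        if " listening on " in msg:
            # e.g. "goosedb listening on 0.0.0.0:1234"
            service, address = msg.split(" listening on ", 1)
            info[service + "_port"] = int(address.rsplit(":", 1)[1])
        elif msg.startswith("db-file:"):
            # e.g. "db-file: data\goosedb.ddb, cfg-file: conf\goosedb.cfg"
            for part in msg.split(", "):
                key, value = part.split(": ", 1)
                info[key] = value
    return info


summary = summarize_startup_log(STARTUP_LOG)
for key, value in summary.items():
    print(key, "=", value)
```

The port reported for `goosedb` is the one passed to psql with `-p`.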
Then connect with the psql client:
\d\pg18\bin\psql -h 127.0.0.1 -U gooseadmin -d goosedb -p 1234
Password for user gooseadmin:
psql (18beta1, server 16.9)
Type "help" for help.

goosedb=> select version();
version
-----------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 16.9 - (GooseDB 1.3.1.1) on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.5.0 20210514 (Red Hat 8.5.0-22), 64-bit
(1 row)

goosedb=> create table t as select * from generate_series(1,10000000)t(i);
CREATE
goosedb=> \timing on
Timing is on.
goosedb=> select count(*) from t group by round(log(i));
 count_star()
--------------
          285
         2846
      6837723
       284605
            3
      2846050
        28460
           28
(8 rows)

Time: 32.029 ms
Judging by this GROUP BY over 10 million rows, performance is no different from plain DuckDB.
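The eight counts in the query output can be cross-checked without a database. The sketch below assumes that `log()` is the base-10 logarithm, which is what the counts are consistent with; the function name `expected_bucket_counts` is my own. It uses the fact that round(log10(i)) <= k exactly when i < 10^(k+0.5), and that boundary is never an integer, so no rounding ties can occur.

```python
import math


def expected_bucket_counts(n):
    """Count the integers i in 1..n for each value of round(log10(i))."""
    counts = {}
    covered = 0  # number of integers already assigned to lower buckets
    k = 0
    while covered < n:
        # Largest integer i with log10(i) < k + 0.5 is floor(sqrt(10**(2k+1))).
        # math.isqrt keeps this exact, with no floating-point error.
        upper = min(n, math.isqrt(10 ** (2 * k + 1)))
        counts[k] = upper - covered
        covered = upper
        k += 1
    return counts


counts = expected_bucket_counts(10_000_000)
for bucket, count in counts.items():
    print(bucket, count)
```

The result has eight buckets that add up to 10,000,000, matching the eight rows psql returned. The rows come back in no particular order because the query has no ORDER BY.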
Open another client:
\d\pg18\bin\psql -h 127.0.0.1 -U gooseadmin -d goosedb -p 1234
Password for user gooseadmin:
psql (18beta1, server 16.9)
Type "help" for help.

goosedb=> select count(*) from t;
 count_star()
--------------
     10000000
(1 row)

goosedb=> select count(*) from '\d\yellow_tripdata_2021-01.parquet';
 count_star()
--------------
      1369769
(1 row)

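The second session can indeed see the table created in the first one, and it can also query a file on the client's local disk (client and server run on the same machine in this test, so the path is local to both).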
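Transactions
Create a table with a default-timestamp column in the first client, then modify it from a second client, first without and then with an explicit transaction.
Client 1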
C:\d>\d\pg18\bin\psql -h 127.0.0.1 -U gooseadmin -d goosedb -p 1234
Password for user gooseadmin:
psql (18beta1, server 16.9)
Type "help" for help.

goosedb=> create table t2 (ts timestamp default current_localtimestamp(), a int);
CREATE
goosedb=> insert into t2(a) values(1);
INSERT 0 1
goosedb=> from t2;
           ts            | a
-------------------------+---
 2025-09-14 08:18:39.386 | 1
(1 row)

goosedb=> select *,current_localtimestamp() c from t2;
           ts            | a |            c
-------------------------+---+-------------------------
 2025-09-14 08:18:39.386 | 1 | 2025-09-14 08:20:42.704
 2025-09-14 08:20:17.458 | 2 | 2025-09-14 08:20:42.704
(2 rows)

goosedb=> select *,current_localtimestamp() c from t2;
           ts            | a |            c
-------------------------+---+-------------------------
 2025-09-14 08:18:39.386 | 1 | 2025-09-14 08:22:00.049
 2025-09-14 08:20:17.458 | 2 | 2025-09-14 08:22:00.049
(2 rows)

goosedb=> select *,current_localtimestamp() c from t2;
           ts            | a |            c
-------------------------+---+-------------------------
 2025-09-14 08:18:39.386 | 1 | 2025-09-14 08:25:37.786
 2025-09-14 08:20:17.458 | 2 | 2025-09-14 08:25:37.786
 2025-09-14 08:21:37.85  | 3 | 2025-09-14 08:25:37.786
 2025-09-14 08:21:37.85  | 4 | 2025-09-14 08:25:37.786
(4 rows)

Client 2
C:\Users\lt>\d\pg18\bin\psql -h 127.0.0.1 -U gooseadmin -d goosedb -p 1234
Password for user gooseadmin:
psql (18beta1, server 16.9)
Type "help" for help.

goosedb=> from t2;
           ts            | a
-------------------------+---
 2025-09-14 08:18:39.386 | 1
(1 row)

goosedb=> insert into t2(a) select 2;
INSERT 0 1
goosedb=> BEGIN TRANSACTION;
BEGIN
goosedb=*> insert into t2(a) select 3;
INSERT 0 1
goosedb=*> from t2;
           ts            | a
-------------------------+---
 2025-09-14 08:18:39.386 | 1
 2025-09-14 08:20:17.458 | 2
 2025-09-14 08:21:37.85  | 3
(3 rows)

goosedb=*> select current_localtimestamp() c;
           c
------------------------
 2025-09-14 08:21:37.85
(1 row)

goosedb=*> insert into t2(a) select 4;
INSERT 0 1
goosedb=*> from t2;
           ts            | a
-------------------------+---
 2025-09-14 08:18:39.386 | 1
 2025-09-14 08:20:17.458 | 2
 2025-09-14 08:21:37.85  | 3
 2025-09-14 08:21:37.85  | 4
(4 rows)

goosedb=*> commit;
COMMIT
goosedb=>
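This shows that no transaction is open by default: each INSERT is committed as soon as it runs, so a change made by one client is immediately visible to the other. Once a transaction has been started explicitly, other clients do not see its changes until it is committed.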
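The same visibility rules can be reproduced in a self-contained script. Note that this is a stand-in and not GooseDB: it uses SQLite from the Python standard library with two connections to one database file, and the helper `visible_rows` is my own. The two engines implement isolation differently, so treat it only as an illustration of autocommit versus an explicit transaction.

```python
import os
import sqlite3
import tempfile

# Stand-in only: SQLite from the standard library, NOT GooseDB.
db_path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None puts the sqlite3 module in autocommit mode, so a
# transaction exists only after an explicit BEGIN.
client1 = sqlite3.connect(db_path, isolation_level=None)
client2 = sqlite3.connect(db_path, isolation_level=None)


def visible_rows(conn):
    """Return the values of column a that this connection can currently see."""
    return [row[0] for row in conn.execute("select a from t2 order by a").fetchall()]


client1.execute("create table t2 (a int)")
client1.execute("insert into t2(a) values (1)")

# Without a transaction, client 2's insert is committed immediately.
client2.execute("insert into t2(a) values (2)")
after_autocommit = visible_rows(client1)

# Inside an explicit transaction, client 2 sees its own uncommitted rows...
client2.execute("begin")
client2.execute("insert into t2(a) values (3)")
client2.execute("insert into t2(a) values (4)")
inside_txn_client2 = visible_rows(client2)
# ...while client 1 still sees only committed data.
inside_txn_client1 = visible_rows(client1)

client2.execute("commit")
after_commit = visible_rows(client1)

print("client 1 after autocommit insert:", after_autocommit)
print("client 2 inside transaction:     ", inside_txn_client2)
print("client 1 during transaction:     ", inside_txn_client1)
print("client 1 after commit:           ", after_commit)

client1.close()
client2.close()
```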
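One thing puzzled me at first: inside the transaction, time seems to stand still. Some time after inserting the first row of the transaction, select current_localtimestamp() c; still returned exactly the timestamp stored with that row. The DuckDB documentation explains why:
current_localtimestamp()
Description: Returns the current timestamp with time zone (at the start of the transaction).
So the function returns the time at which the transaction started, which is also why rows 3 and 4 carry the same ts value.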
In psql, the prompt shows a * while a transaction is open (goosedb=*> instead of goosedb=>).