DuckDB Client APIs: ADBC (Translation of the Official Documentation)
Original official page: https://duckdb.org/docs/stable/clients/adbc
The latest stable version of the DuckDB ADBC client is 1.4.0 (as of September 2025).
Arrow Database Connectivity (ADBC), like ODBC and JDBC, is a C-style API that enables code portability between different database systems. It lets developers build applications that communicate with database systems without writing code specific to each system. The main difference between ADBC and ODBC/JDBC is that ADBC uses Arrow to transfer data between the database system and the application. DuckDB has an ADBC driver that takes advantage of the zero-copy integration between DuckDB and Arrow to transfer data efficiently.
Please refer to the ADBC documentation page for a more extensive discussion of ADBC and a detailed explanation of the API.
Implemented Functionality
The DuckDB-ADBC driver implements the full ADBC specification, with the exception of the ConnectionReadPartition and StatementExecutePartitions functions. Both functions exist to support systems that internally partition query results, which does not apply to DuckDB. This section describes the main ADBC functions, the arguments they take, and an example for each.
Database
A set of functions that operate on a database.
1. DatabaseNew
Allocate a new (but uninitialized) database.
Arguments
(
    AdbcDatabase *database, // database
    AdbcError *error        // error
)
Example
AdbcDatabaseNew(&adbc_database, &adbc_error);
2. DatabaseSetOption
Set a char* option on the database.
Arguments
(
    AdbcDatabase *database, // database
    const char *key,        // option name
    const char *value,      // option value
    AdbcError *error        // error
)
Example
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error);
3. DatabaseInit
Finish setting options and initialize the database.
Arguments
(
    AdbcDatabase *database, // database
    AdbcError *error        // error
)
Example
AdbcDatabaseInit(&adbc_database, &adbc_error);
4. DatabaseRelease
Destroy the database.
Arguments
(
    AdbcDatabase *database, // database
    AdbcError *error        // error
)
Example
AdbcDatabaseRelease(&adbc_database, &adbc_error);
Connection
A set of functions that create and destroy a connection used to interact with a database.
1. ConnectionNew
Allocate a new (but uninitialized) connection.
Arguments
(
    AdbcConnection*, // connection
    AdbcError*       // error
)
Example
AdbcConnectionNew(&adbc_connection, &adbc_error);
2. ConnectionSetOption
Set an option on the connection. Options may be set before ConnectionInit.
Arguments
(
    AdbcConnection*, // connection
    const char*,     // option name
    const char*,     // option value
    AdbcError*       // error
)
Example
AdbcConnectionSetOption(&adbc_connection, ADBC_CONNECTION_OPTION_AUTOCOMMIT, ADBC_OPTION_VALUE_DISABLED, &adbc_error);
3. ConnectionInit
Finish setting options and initialize the connection.
Arguments
(
    AdbcConnection*, // connection
    AdbcDatabase*,   // database
    AdbcError*       // error
)
Example
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error);
4. ConnectionRelease
Destroy this connection.
Arguments
(
    AdbcConnection*, // connection
    AdbcError*       // error
)
Example
AdbcConnectionRelease(&adbc_connection, &adbc_error);
The following functions retrieve metadata about the database. In general, they return Arrow objects, specifically an ArrowArrayStream.
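5. ConnectionGetObjects
Get a hierarchical view of all catalogs, database schemas, tables, and columns.
Arguments
(
    AdbcConnection*,   // connection
    int,               // depth of the hierarchy to return
    const char*,       // catalog filter
    const char*,       // database schema filter
    const char*,       // table name filter
    const char**,      // table type filter (array of strings)
    const char*,       // column name filter
    ArrowArrayStream*, // returned Arrow stream
    AdbcError*         // error
)
Example
AdbcConnectionGetObjects(&adbc_connection, ADBC_OBJECT_DEPTH_ALL, nullptr, nullptr, nullptr, nullptr, nullptr, &arrow_stream, &adbc_error);
**Note: the original documentation shows AdbcDatabaseInit(&adbc_database, &adbc_error) here, which is an error. The call above is a sketch derived from the argument list; the depth value and the null filters are chosen only for illustration.**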
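6. ConnectionGetTableSchema
Get the Arrow schema of a table.
Arguments
(
    AdbcConnection*, // connection
    const char*,     // catalog
    const char*,     // database schema
    const char*,     // table name
    ArrowSchema*,    // returned Arrow schema
    AdbcError*       // error
)
Example
AdbcConnectionGetTableSchema(&adbc_connection, nullptr, nullptr, "TABLE_NAME", &arrow_schema, &adbc_error);
**Note: the original documentation shows AdbcDatabaseRelease(&adbc_database, &adbc_error) here, which is an error. The call above is a sketch derived from the argument list; arrow_schema is a hypothetical ArrowSchema variable and "TABLE_NAME" is a placeholder.**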
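7. ConnectionGetTableTypes
Get a list of table types in the database.
Arguments
(
    AdbcConnection*,   // connection
    ArrowArrayStream*, // returned Arrow stream
    AdbcError*         // error
)
Example
AdbcConnectionGetTableTypes(&adbc_connection, &arrow_stream, &adbc_error);
**Note: the original documentation shows AdbcDatabaseNew(&adbc_database, &adbc_error) here, which is an error. The call above is a sketch derived from the argument list.**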
The following functions provide transaction semantics for the connection. By default, every connection starts with auto-commit mode on; it can be turned off with the ConnectionSetOption function.
8. ConnectionCommit
Commit any pending transactions.
Arguments
(
    AdbcConnection*, // connection
    AdbcError*       // error
)
Example
AdbcConnectionCommit(&adbc_connection, &adbc_error);
9. ConnectionRollback
Roll back any pending transactions.
Arguments
(
    AdbcConnection*, // connection
    AdbcError*       // error
)
Example
AdbcConnectionRollback(&adbc_connection, &adbc_error);
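The commit and rollback behavior described in this section can be tried out without an ADBC driver installed. The following is a minimal sketch that uses Python's built-in sqlite3 module purely as a stand-in: it is not DuckDB or ADBC code, and the table name t is made up for illustration. It mirrors the case where auto-commit is off, so changes stay pending until they are committed and a rollback discards them.

```python
import sqlite3

# Stand-in database: sqlite3 opens an implicit transaction before INSERT,
# which resembles an ADBC connection with auto-commit disabled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (x INTEGER)")
conn.commit()


def row_count():
    """Count the rows currently visible in table t."""
    return conn.execute("SELECT count(*) FROM t").fetchone()[0]


# A pending insert is discarded by rollback (compare ConnectionRollback).
conn.execute("INSERT INTO t VALUES (1)")
conn.rollback()
count_after_rollback = row_count()

# A committed insert survives a later rollback (compare ConnectionCommit).
conn.execute("INSERT INTO t VALUES (2)")
conn.commit()
conn.rollback()  # nothing is pending any more, so this changes nothing
count_after_commit = row_count()

print(count_after_rollback, count_after_commit)
```

With an actual ADBC connection, the equivalent steps would be disabling auto-commit through ConnectionSetOption and then calling ConnectionCommit or ConnectionRollback.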
Statement
Statements hold state related to query execution. They represent both one-off queries and prepared statements. A statement can be reused, but doing so invalidates prior result sets from that statement.
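The reuse rule can be pictured with a small toy model. This is only a sketch in plain Python: ToyStatement and ToyResult are hypothetical names that do not exist in ADBC, and the real driver manages Arrow streams rather than Python lists. The point it illustrates is that executing a statement again makes the earlier result set unusable.

```python
class ToyResult:
    """Hypothetical stand-in for a result set produced by a statement."""

    def __init__(self, rows):
        self._rows = rows
        self.valid = True

    def fetch(self):
        if not self.valid:
            raise RuntimeError("result set was invalidated by a later execution")
        return list(self._rows)


class ToyStatement:
    """Hypothetical stand-in for a reusable statement."""

    def __init__(self):
        self._current = None

    def execute(self, rows):
        # Reusing the statement invalidates the result of the previous run.
        if self._current is not None:
            self._current.valid = False
        self._current = ToyResult(rows)
        return self._current


stmt = ToyStatement()
first = stmt.execute([1, 2, 3])
second = stmt.execute([4, 5])  # "first" can no longer be read after this
print(second.fetch())
```

In practice this means a result should be fully consumed, or copied, before the same statement is executed again.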
The functions used to create, destroy, and set options for a statement:
1. StatementNew
Create a new statement for a given connection.
Arguments
(
    AdbcConnection*, // connection
    AdbcStatement*,  // statement to initialize
    AdbcError*       // error
)
Example
AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error);
2. StatementRelease
Destroy a statement.
Arguments
(
    AdbcStatement*, // statement to destroy
    AdbcError*      // error
)
Example
AdbcStatementRelease(&adbc_statement, &adbc_error);
3. StatementSetOption
Set a string option on a statement.
Arguments
(
    AdbcStatement*, // statement
    const char*,    // option name
    const char*,    // option value
    AdbcError*      // error
)
Example
AdbcStatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "TABLE_NAME", &adbc_error);
Functions related to query execution:
4. StatementSetSqlQuery
Set the SQL query to execute. The query can then be executed with StatementExecuteQuery.
Arguments
(
    AdbcStatement*, // statement
    const char*,    // SQL query text
    AdbcError*      // error
)
Example
AdbcStatementSetSqlQuery(&adbc_statement, "SELECT * FROM TABLE", &adbc_error);
5. StatementSetSubstraitPlan
Set a Substrait plan to execute. The plan can then be executed with StatementExecuteQuery.
Arguments
(
    AdbcStatement*, // statement
    const uint8_t*, // serialized Substrait plan
    size_t,         // length of the plan in bytes
    AdbcError*      // error
)
Example
AdbcStatementSetSubstraitPlan(&adbc_statement, substrait_plan, length, &adbc_error);
6. StatementExecuteQuery
Execute a statement and get the results.
Arguments
(
    AdbcStatement*,    // statement
    ArrowArrayStream*, // returned Arrow stream
    int64_t*,          // number of rows affected
    AdbcError*         // error
)
Example
AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error);
7. StatementPrepare
Turn this statement into a prepared statement that can be executed multiple times.
Arguments
(
    AdbcStatement*, // statement
    AdbcError*      // error
)
Example
AdbcStatementPrepare(&adbc_statement, &adbc_error);
Functions related to binding, used for bulk insertion or in prepared statements:
8. StatementBindStream
Bind an Arrow stream. This can be used for bulk inserts or prepared statements.
Arguments
(
    AdbcStatement*,    // statement
    ArrowArrayStream*, // Arrow stream to bind
    AdbcError*         // error
)
Example
AdbcStatementBindStream(&adbc_statement, &input_data, &adbc_error);
Setting Up the DuckDB ADBC Driver
Before using DuckDB as an ADBC driver, you must install the libduckdb shared library on your system and make it available to your application. This library contains the core DuckDB engine that the ADBC driver interfaces with.
Downloading libduckdb
Download the appropriate libduckdb library for your platform from the DuckDB releases page:
• Linux: libduckdb-linux-amd64.zip (contains libduckdb.so)
• macOS: libduckdb-osx-universal.zip (contains libduckdb.dylib)
• Windows: libduckdb-windows-amd64.zip (contains duckdb.dll)
Extract the archive to obtain the shared library file.
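When a setup script has to pick the right file automatically, the list above can be expressed as a small lookup. The following is a sketch under stated assumptions: the function name libduckdb_files is made up for illustration, and the table covers only the three archives listed above, not other architectures that may be published on the releases page.

```python
import platform

# Archive and library names exactly as listed above,
# keyed by the values that platform.system() returns.
LIBDUCKDB_FILES = {
    "Linux": ("libduckdb-linux-amd64.zip", "libduckdb.so"),
    "Darwin": ("libduckdb-osx-universal.zip", "libduckdb.dylib"),
    "Windows": ("libduckdb-windows-amd64.zip", "duckdb.dll"),
}


def libduckdb_files(system=None):
    """Return (archive_name, library_name) for the given or current OS."""
    system = system or platform.system()
    try:
        return LIBDUCKDB_FILES[system]
    except KeyError:
        raise ValueError(f"no libduckdb archive listed for platform {system!r}")


archive, library = libduckdb_files("Linux")
print(archive, library)
```

Calling libduckdb_files() without an argument uses the operating system that the script is running on.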
Installing the Library
Linux
1. Extract the libduckdb.so file from the downloaded archive.
2. Make sure your code can use the library. You can:
• Either copy it to a system library directory (requires root access):
sudo cp libduckdb.so /usr/local/lib/
sudo ldconfig
• Or place it in a custom directory and add that directory to your LD_LIBRARY_PATH:
mkdir -p ~/lib
cp libduckdb.so ~/lib/
export LD_LIBRARY_PATH=~/lib:$LD_LIBRARY_PATH
macOS
1. Extract the libduckdb.dylib file from the downloaded archive.
2. Make sure your code can use the library. You can:
• Either copy it to a system library directory:
sudo cp libduckdb.dylib /usr/local/lib/
• Or place it in a custom directory and add that directory to your DYLD_LIBRARY_PATH:
mkdir -p ~/lib
cp libduckdb.dylib ~/lib/
export DYLD_LIBRARY_PATH=~/lib:$DYLD_LIBRARY_PATH
Windows
1. Extract the duckdb.dll file from the downloaded archive.
2. Place it in one of the following locations:
• The same directory as your application executable
• A directory listed in your PATH environment variable
• The Windows system directory (e.g., C:\Windows\System32)
Understanding Library Paths
LD_LIBRARY_PATH (Linux) and DYLD_LIBRARY_PATH (macOS) are environment variables that tell the system where to look for shared libraries at runtime. When your application tries to load libduckdb, the system searches these paths to locate the library file.
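The search through a path variable can be imitated in a few lines. This is a simplified sketch, not the dynamic loader's actual algorithm: the real loader also consults default system directories and other sources, and the helper name find_in_library_path is made up for illustration. It walks the directories in order and returns the first one that contains the library file.

```python
import os
import tempfile


def find_in_library_path(library_name, path_value):
    """Return the first match for library_name in a path-list string, else None."""
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue  # skip empty entries such as a trailing separator
        candidate = os.path.join(directory, library_name)
        if os.path.isfile(candidate):
            return candidate
    return None


# Demonstration with two temporary directories; only the second holds the file.
with tempfile.TemporaryDirectory() as first, tempfile.TemporaryDirectory() as second:
    open(os.path.join(second, "libduckdb.so"), "w").close()
    search_path = os.pathsep.join([first, second])
    found = find_in_library_path("libduckdb.so", search_path)
    found_in_second = found == os.path.join(second, "libduckdb.so")
    missing = find_in_library_path("libduckdb.so", first)

print(found_in_second, missing)
```

To inspect a real setup, the same helper can be called with os.environ.get("LD_LIBRARY_PATH", "") on Linux or the DYLD_LIBRARY_PATH equivalent on macOS.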
Verifying Installation
You can verify that the library is properly installed and accessible:
Linux/macOS:
ldd path/to/your/application # Linux
otool -L path/to/your/application # macOS
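The commands above report the libraries an application is linked against. When the library is instead loaded at runtime through the driver option, a direct load attempt is a useful complementary check. The following is a minimal sketch using Python's ctypes module; the helper name check_libduckdb is made up for illustration, and the symbol it looks up is the duckdb_adbc_init entrypoint used in the examples of this document.

```python
import ctypes


def check_libduckdb(library_path, entrypoint="duckdb_adbc_init"):
    """Try to load the shared library and look up the ADBC entrypoint symbol.

    Returns a pair (loaded, has_entrypoint).
    """
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        # The file is missing, unreadable, or not a loadable shared library.
        return (False, False)
    # ctypes resolves symbols lazily; hasattr is False if the symbol is absent.
    return (True, hasattr(lib, entrypoint))


# A path that does not exist reports that nothing could be loaded.
missing_result = check_libduckdb("/nonexistent/dir/libduckdb.so")
print(missing_result)
```

Passing the real location of libduckdb.so, libduckdb.dylib, or duckdb.dll should yield (True, True) if the library is usable as an ADBC driver.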
Examples
Regardless of the programming language, two database options are required to use ADBC with DuckDB. The first is driver, which takes the path to the DuckDB library (see Setting Up the DuckDB ADBC Driver above for installation instructions). The second is entrypoint, the function exported by the DuckDB-ADBC driver that initializes all the ADBC functions. With these two options configured, we can optionally set the path option, which gives a location on disk for the DuckDB database; if it is not set, an in-memory database is created. After configuring the options, we can initialize the database. The following sections show how to do this in several languages.
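The three options can be summarized as an ordered list of key/value pairs. The sketch below is illustrative only: the helper name duckdb_database_options is made up, and the library path passed to it is a placeholder. It builds the pairs in the order described above and prints the corresponding AdbcDatabaseSetOption calls.

```python
def duckdb_database_options(driver, path=None, entrypoint="duckdb_adbc_init"):
    """Return the (key, value) option pairs to set before initializing the database."""
    if not driver:
        raise ValueError("the driver option (path to the DuckDB library) is required")
    options = [("driver", driver), ("entrypoint", entrypoint)]
    if path is not None:
        # Optional: without a path, DuckDB creates an in-memory database.
        options.append(("path", path))
    return options


# Placeholder library path; replace it with the location of your libduckdb.
for key, value in duckdb_database_options("path/to/libduckdb.so", path="test.db"):
    print(f'AdbcDatabaseSetOption(&adbc_database, "{key}", "{value}", &adbc_error);')
```

Leaving out the path argument yields only the two required options, which corresponds to an in-memory database.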
C++ Example
We begin the C++ example by declaring the essential variables for querying data through ADBC: the error, database, connection, and statement handles, plus an Arrow stream that transfers data between DuckDB and the application.
AdbcError adbc_error;           // error
AdbcDatabase adbc_database;     // database
AdbcConnection adbc_connection; // connection
AdbcStatement adbc_statement;   // statement
ArrowArrayStream arrow_stream;  // Arrow stream
We can then initialize the database variable. Before initializing the database, we need to set the driver and entrypoint options as mentioned above. Then we set the path option and initialize the database. The driver option should point to your installed libduckdb library (see Setting Up the DuckDB ADBC Driver for installation instructions).
AdbcDatabaseNew(&adbc_database, &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "driver", "path/to/libduckdb.dylib", &adbc_error);
AdbcDatabaseSetOption(&adbc_database, "entrypoint", "duckdb_adbc_init", &adbc_error);
// By default, we start an in-memory database, but you can optionally define a path to store it on disk.
AdbcDatabaseSetOption(&adbc_database, "path", "test.db", &adbc_error);
AdbcDatabaseInit(&adbc_database, &adbc_error);
// After initializing the database, we must create and initialize a connection to it.
AdbcConnectionNew(&adbc_connection, &adbc_error);
AdbcConnectionInit(&adbc_connection, &adbc_database, &adbc_error);
We can now initialize the statement and run queries through the connection. After the call to AdbcStatementExecuteQuery, arrow_stream is populated with the result.
AdbcStatementNew(&adbc_connection, &adbc_statement, &adbc_error);
AdbcStatementSetSqlQuery(&adbc_statement, "SELECT 42", &adbc_error);
int64_t rows_affected;
AdbcStatementExecuteQuery(&adbc_statement, &arrow_stream, &rows_affected, &adbc_error);
// The release callback takes a pointer to the stream.
arrow_stream.release(&arrow_stream);
Besides running queries, we can also ingest data via Arrow streams. To do so, we set an option with the name of the table to insert into, bind the stream, and then execute the query.
// arrow_stream must be a live stream holding the rows to ingest, not one that has already been released.
AdbcStatementSetOption(&adbc_statement, ADBC_INGEST_OPTION_TARGET_TABLE, "AnswerToEverything", &adbc_error);
AdbcStatementBindStream(&adbc_statement, &arrow_stream, &adbc_error);
AdbcStatementExecuteQuery(&adbc_statement, nullptr, nullptr, &adbc_error);
Python Example
The first step is to install the ADBC driver manager with pip. You will also need pyarrow to access Apache Arrow formatted result sets directly (for example with fetch_arrow_table).
pip install adbc_driver_manager pyarrow
For details on the adbc_driver_manager package, see the adbc_driver_manager package documentation.
As with C++, the connection needs the location of the libduckdb shared object and the entrypoint function. The adbc_driver_duckdb.dbapi module used below appears to supply both itself, so only the database path is passed to connect. We assume here that adbc_driver_duckdb is available through the duckdb Python package, which is not installed by the pip command above. When connecting through adbc_driver_manager directly, the path argument for DuckDB is passed in through the db_kwargs dictionary.
import adbc_driver_duckdb.dbapi

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.execute("SELECT 42")
    # fetch a pyarrow table
    tbl = cur.fetch_arrow_table()
    print(tbl)
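For comparison, here is a sketch of how the path can be passed through db_kwargs when the generic driver manager is used. It assumes that adbc_driver_manager and pyarrow are installed and that library_path points to a real libduckdb file. The helper names connect_kwargs and query_answer are made up for illustration, and the driver manager is imported only inside the function that needs it.

```python
def connect_kwargs(library_path, database_path=None):
    """Build keyword arguments for adbc_driver_manager.dbapi.connect."""
    kwargs = {"driver": library_path, "entrypoint": "duckdb_adbc_init"}
    if database_path is not None:
        # DuckDB's path option travels in the db_kwargs dictionary.
        kwargs["db_kwargs"] = {"path": database_path}
    return kwargs


def query_answer(library_path, database_path=None):
    """Run SELECT 42 through the driver manager and return a pyarrow table."""
    import adbc_driver_manager.dbapi  # imported lazily; requires the package

    with adbc_driver_manager.dbapi.connect(
        **connect_kwargs(library_path, database_path)
    ) as conn, conn.cursor() as cur:
        cur.execute("SELECT 42")
        return cur.fetch_arrow_table()


# Only the argument construction runs here; no driver is loaded.
print(connect_kwargs("path/to/libduckdb.so", "test.db"))
```

Calling query_answer with the path to an installed libduckdb library performs the actual query.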
Alongside fetch_arrow_table, other DBAPI methods are also implemented on the cursor, such as fetchone and fetchall. Data can also be ingested via Arrow streams: the cursor's adbc_ingest method takes the target table name and the data, then binds the data and executes the insert.
import adbc_driver_duckdb.dbapi
import pyarrow

data = pyarrow.record_batch(
    [[1, 2, 3, 4], ["a", "b", "c", "d"]],
    names = ["ints", "strs"],
)

with adbc_driver_duckdb.dbapi.connect("test.db") as conn, conn.cursor() as cur:
    cur.adbc_ingest("AnswerToEverything", data)
Go Example
Make sure to install the libduckdb library first (see Setting Up the DuckDB ADBC Driver for detailed installation instructions).
The following example uses an in-memory DuckDB database to modify in-memory Arrow RecordBatches via SQL queries:
package main

import (
	"bytes"
	"context"
	"fmt"
	"io"

	"github.com/apache/arrow-adbc/go/adbc"
	"github.com/apache/arrow-adbc/go/adbc/drivermgr"
	"github.com/apache/arrow-go/v18/arrow"
	"github.com/apache/arrow-go/v18/arrow/array"
	"github.com/apache/arrow-go/v18/arrow/ipc"
	"github.com/apache/arrow-go/v18/arrow/memory"
)

func _makeSampleArrowRecord() arrow.Record {
	b := array.NewFloat64Builder(memory.DefaultAllocator)
	b.AppendValues([]float64{1, 2, 3}, nil)
	col := b.NewArray()

	defer col.Release()
	defer b.Release()

	schema := arrow.NewSchema([]arrow.Field{{Name: "column1", Type: arrow.PrimitiveTypes.Float64}}, nil)
	return array.NewRecord(schema, []arrow.Array{col}, int64(col.Len()))
}

type DuckDBSQLRunner struct {
	ctx  context.Context
	conn adbc.Connection
	db   adbc.Database
}

func NewDuckDBSQLRunner(ctx context.Context) (*DuckDBSQLRunner, error) {
	var drv drivermgr.Driver
	db, err := drv.NewDatabase(map[string]string{
		"driver":     "duckdb",
		"entrypoint": "duckdb_adbc_init",
		"path":       ":memory:",
	})
	if err != nil {
		return nil, fmt.Errorf("failed to create new in-memory DuckDB database: %w", err)
	}
	conn, err := db.Open(ctx)
	if err != nil {
		return nil, fmt.Errorf("failed to open connection to new in-memory DuckDB database: %w", err)
	}
	return &DuckDBSQLRunner{ctx: ctx, conn: conn, db: db}, nil
}

func serializeRecord(record arrow.Record) (io.Reader, error) {
	buf := new(bytes.Buffer)
	wr := ipc.NewWriter(buf, ipc.WithSchema(record.Schema()))
	if err := wr.Write(record); err != nil {
		return nil, fmt.Errorf("failed to write record: %w", err)
	}
	if err := wr.Close(); err != nil {
		return nil, fmt.Errorf("failed to close writer: %w", err)
	}
	return buf, nil
}

func (r *DuckDBSQLRunner) importRecord(sr io.Reader) error {
	rdr, err := ipc.NewReader(sr)
	if err != nil {
		return fmt.Errorf("failed to create IPC reader: %w", err)
	}
	defer rdr.Release()
	_, err = adbc.IngestStream(r.ctx, r.conn, rdr, "temp_table", adbc.OptionValueIngestModeCreate, adbc.IngestStreamOptions{})
	return err
}

func (r *DuckDBSQLRunner) runSQL(sql string) ([]arrow.Record, error) {
	stmt, err := r.conn.NewStatement()
	if err != nil {
		return nil, fmt.Errorf("failed to create new statement: %w", err)
	}
	defer stmt.Close()

	if err := stmt.SetSqlQuery(sql); err != nil {
		return nil, fmt.Errorf("failed to set SQL query: %w", err)
	}
	out, n, err := stmt.ExecuteQuery(r.ctx)
	if err != nil {
		return nil, fmt.Errorf("failed to execute query: %w", err)
	}
	defer out.Release()

	result := make([]arrow.Record, 0, n)
	for out.Next() {
		rec := out.Record()
		rec.Retain() // .Next() will release the record, so we need to retain it
		result = append(result, rec)
	}
	if out.Err() != nil {
		return nil, out.Err()
	}
	return result, nil
}

func (r *DuckDBSQLRunner) RunSQLOnRecord(record arrow.Record, sql string) ([]arrow.Record, error) {
	serializedRecord, err := serializeRecord(record)
	if err != nil {
		return nil, fmt.Errorf("failed to serialize record: %w", err)
	}
	if err := r.importRecord(serializedRecord); err != nil {
		return nil, fmt.Errorf("failed to import record: %w", err)
	}
	result, err := r.runSQL(sql)
	if err != nil {
		return nil, fmt.Errorf("failed to run SQL: %w", err)
	}
	if _, err := r.runSQL("DROP TABLE temp_table"); err != nil {
		return nil, fmt.Errorf("failed to drop temp table after running query: %w", err)
	}
	return result, nil
}

func (r *DuckDBSQLRunner) Close() {
	r.conn.Close()
	r.db.Close()
}

func main() {
	rec := _makeSampleArrowRecord()
	fmt.Println(rec)

	runner, err := NewDuckDBSQLRunner(context.Background())
	if err != nil {
		panic(err)
	}
	defer runner.Close()

	resultRecords, err := runner.RunSQLOnRecord(rec, "SELECT column1+1 FROM temp_table")
	if err != nil {
		panic(err)
	}

	for _, resultRecord := range resultRecords {
		fmt.Println(resultRecord)
		resultRecord.Release()
	}
}
Running it produces the following output:
record:
  schema:
  fields: 1
    - column1: type=float64
  rows: 3
  col[0][column1]: [1 2 3]

record:
  schema:
  fields: 1
    - (column1 + 1): type=float64, nullable
  rows: 3
  col[0][(column1 + 1)]: [2 3 4]