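Producing iterable output with the instructor library
Contents
- Code
- Code explanation
- 1. Imports and initialization
- 2. Data model definition
- 3. Streaming extraction function
- 4. Message setup
- 5. Test code
- Output
- A similar example
Code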
import time
from collections.abc import Iterable

from openai import OpenAI
from pydantic import BaseModel
import instructor

# Wrap an OpenAI-compatible client (here the DashScope endpoint) with instructor.
client = instructor.from_openai(
    OpenAI(
        api_key="your api key",
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
)


class User(BaseModel):
    name: str
    job: str
    age: int


def stream_extract(input: str) -> Iterable[User]:
    # Each User is yielded as soon as it has been parsed from the stream.
    return client.chat.completions.create_iterable(
        model="qwen-turbo",
        temperature=0.1,
        stream=True,
        response_model=User,
        messages=[
            {
                "role": "system",
                "content": "You are a perfect entity extraction system",
            },
            {
                "role": "user",
                # Adjacent string literals are joined without a separator,
                # so every fragment ends with an explicit newline.
                "content": (
                    f"Consider the data below:\n{input}\n"
                    "Correctly segment it into entities\n"
                    "Make sure the JSON is correct"
                ),
            },
        ],
        max_tokens=1000,
    )


start = time.time()
for user in stream_extract(
    input="Create 5 characters from the book Three Body Problem"
):
    delay = round(time.time() - start, 1)
    print(f"{delay} s: User({user})")
0.9 s: User(name='Ye Wenjie' job='Astronomer' age=30)
1.3 s: User(name='Wang Miao' job='Nanomaterials Researcher' age=45)
1.5 s: User(name='Chang Jie' job='Historian' age=50)
1.9 s: User(name='Da Shi' job='Military Officer' age=40)
2.3 s: User(name='Roberto da Silva' job='Brazilian Astronomer' age=60)
Code explanation
1. Imports and initialization
import time
from collections.abc import Iterable
from openai import OpenAI
from pydantic import BaseModel
import instructor
client = instructor.from_openai(OpenAI(...))
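- Import the required libraries, including Iterable, which annotates the return type of the streaming function
- Create an OpenAI client enhanced by instructor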
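The listing hard-codes the API key, which is fine for a quick test. One possible alternative is to read it from the environment. The sketch below is only an illustration: the helper name `client_settings` and the variable name `DASHSCOPE_API_KEY` are assumptions, not part of the original code.

```python
import os
from collections.abc import Mapping

# Base URL copied from the listing above.
BASE_URL = "https://dashscope.aliyuncs.com/compatible-mode/v1"


def client_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Collect keyword arguments for OpenAI(...) from environment variables.

    DASHSCOPE_API_KEY is an assumed variable name; use whatever your
    deployment actually defines.
    """
    api_key = env.get("DASHSCOPE_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("DASHSCOPE_API_KEY is not set")
    return {"api_key": api_key, "base_url": BASE_URL}


# Demonstration with a fake environment, so no real key is needed.
settings = client_settings({"DASHSCOPE_API_KEY": "sk-demo"})
print(settings["base_url"])
```

The resulting dictionary could then be passed on as `instructor.from_openai(OpenAI(**settings))`.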
2. Data model definition
class User(BaseModel):
    name: str
    job: str
    age: int
This defines the user data model, with three fields:
- Name
- Job
- Age
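To see what the model contributes on its own, independent of any LLM call, the following sketch validates two hand-written JSON strings against the same `User` class. It assumes Pydantic v2 (`model_validate_json` and `error_count` are v2 names), and the sample payloads are made up for illustration.

```python
from pydantic import BaseModel, ValidationError


class User(BaseModel):
    name: str
    job: str
    age: int


# Well-formed JSON whose fields match the declared types is accepted.
user = User.model_validate_json(
    '{"name": "Ye Wenjie", "job": "Astronomer", "age": 30}'
)
print(user.age + 1)  # age is a real int, so arithmetic works

# A value that cannot be interpreted as an integer is rejected.
try:
    User.model_validate_json(
        '{"name": "Wang Miao", "job": "Researcher", "age": "unknown"}'
    )
    rejected = False
except ValidationError as exc:
    rejected = True
    print(f"rejected with {exc.error_count()} error(s)")
```

This is the same kind of check that `response_model=User` asks instructor to apply to the model's output.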
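3. Streaming extraction function
def stream_extract(input: str) -> Iterable[User]:
This is the core function. Key points:
- It returns an iterable stream of User objects
- It uses the create_iterable method to stream the response
- Parameter settings:
  - temperature=0.1: keeps the output stable
  - stream=True: enables streaming output
  - response_model=User: specifies the response format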
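The idea of yielding each object as soon as it is complete can be sketched with the standard library alone. This is a conceptual illustration under simplifying assumptions (the stream is a JSON array of objects delivered as text chunks); it is not instructor's actual implementation, and `iter_objects` is a hypothetical helper name.

```python
import json
from collections.abc import Iterable, Iterator


def iter_objects(chunks: Iterable[str]) -> Iterator[dict]:
    """Yield each complete JSON object from a streamed JSON array of objects."""
    decoder = json.JSONDecoder()
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            # Skip array punctuation and whitespace between objects.
            buffer = buffer.lstrip(" \t\r\n[,]")
            if not buffer:
                break
            try:
                obj, end = decoder.raw_decode(buffer)
            except json.JSONDecodeError:
                break  # the object is still incomplete; wait for more text
            yield obj
            buffer = buffer[end:]


# Simulated stream: chunk boundaries fall in the middle of objects.
chunks = [
    '[{"name": "Ye Wen',
    'jie", "age": 30}, {"na',
    'me": "Da Shi", "age": 40}]',
]
for obj in iter_objects(chunks):
    print(obj)
```

In this sketch the first object is yielded while the second is still arriving, which is the behaviour the timestamps in the output illustrate.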
4. Message setup
messages=[
    {"role": "system", "content": "You are a perfect entity extraction system"},
    {"role": "user", "content": ...}
]
- System prompt: defines the AI's role
- User prompt: contains the input data and the task instructions
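One detail worth checking when a prompt is split across several string literals: Python joins adjacent literals without inserting any separator, so each fragment needs its own trailing newline or space. A minimal sketch, where `build_messages` is a hypothetical helper introduced only for illustration:

```python
def build_messages(data: str) -> list[dict[str, str]]:
    """Build the system/user message pair used for extraction."""
    user_prompt = (
        f"Consider the data below:\n{data}\n"
        "Correctly segment it into entities.\n"
        "Make sure the JSON is correct."
    )
    return [
        {"role": "system", "content": "You are a perfect entity extraction system"},
        {"role": "user", "content": user_prompt},
    ]


# Adjacent literals are concatenated with nothing in between.
joined = ("first sentence" "second sentence")
print(joined)

messages = build_messages("Create 5 characters")
print(messages[1]["content"])
```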
5. Test code
start = time.time()
for user in stream_extract(input="Create 5 characters from the book Three Body Problem"):
    delay = round(time.time() - start, 1)
    print(f"{delay} s: User({user})")
- Record the start time
- Fetch the results as a stream and print each one
- Show how long after the start each result arrived
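The timing logic can be factored into a small reusable wrapper. The sketch below is an assumption-laden illustration: `with_elapsed` and `fake_stream` are hypothetical names, the stream is simulated with `time.sleep` rather than a real model call, and `time.perf_counter` is used instead of `time.time` because it is a monotonic clock intended for measuring intervals.

```python
import time
from collections.abc import Iterable, Iterator
from typing import TypeVar

T = TypeVar("T")


def with_elapsed(items: Iterable[T]) -> Iterator[tuple[float, T]]:
    """Pair each item with the seconds elapsed since iteration began."""
    start = time.perf_counter()
    for item in items:
        yield time.perf_counter() - start, item


def fake_stream() -> Iterator[str]:
    """Stand-in for a model stream: one item roughly every 20 ms."""
    for name in ("first", "second", "third"):
        time.sleep(0.02)
        yield name


results = list(with_elapsed(fake_stream()))
for elapsed, item in results:
    print(f"{elapsed:.2f} s: {item}")
```

With the real client, the same wrapper could presumably be applied as `with_elapsed(stream_extract(...))`.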
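Output
The output printed after the code listing shows that:
- Each character record is streamed back on its own
- The whole run took about 2.3 seconds
- Five character records were returned for The Three-Body Problem
- Each record contains a name, a job, and an age
Advantages of this streaming approach:
- Real-time response: there is no need to wait for all results
- Resource efficiency: each object can be handled as it arrives, so less needs to be held in memory
- User experience: results appear progressively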
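The "no need to wait for all results" point follows from iterators being lazy: a consumer that stops early never pulls the remaining items. A minimal sketch with a simulated stream (`fake_stream` is a hypothetical stand-in, and whether a remote model also stops generating is not modelled here):

```python
from collections.abc import Iterator
from itertools import islice

produced: list[int] = []


def fake_stream(total: int) -> Iterator[int]:
    """Stand-in for a streamed response; records every item it generates."""
    for i in range(total):
        produced.append(i)
        yield i


# Take only the first two results, then stop consuming the stream.
first_two = list(islice(fake_stream(5), 2))
print(first_two, produced)
```

Only the items that were actually requested are ever produced, so nothing beyond them is built or stored on the consumer side.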
A similar example
import time
from collections.abc import Iterable

from openai import OpenAI
from pydantic import BaseModel
import instructor

client = instructor.from_openai(
    OpenAI(
        api_key="your api key",
        base_url="https://dashscope.aliyuncs.com/compatible-mode/v1",
    )
)


class Movie(BaseModel):
    title: str
    director: str
    year: int
    genre: str


def stream_movies(input: str) -> Iterable[Movie]:
    # Each Movie is yielded as soon as it has been parsed from the stream.
    return client.chat.completions.create_iterable(
        model="qwen-turbo",
        temperature=0.1,
        stream=True,
        response_model=Movie,
        messages=[
            {
                "role": "system",
                "content": "You are a professional movie information extraction system",
            },
            {
                "role": "user",
                # Explicit newlines keep the prompt fragments from running together.
                "content": (
                    f"Extract movie information from the following content:\n{input}\n"
                    "Make sure the extracted information is accurate and complete\n"
                    "Return correctly formatted JSON"
                ),
            },
        ],
        max_tokens=1000,
    )


# Test code
start = time.time()
for movie in stream_movies(
    input="List 5 classic science fiction films"
):
    delay = round(time.time() - start, 1)
    print(f"{delay} s: Movie({movie})")
1.0 s: Movie(title='Blade Runner' director='Ridley Scott' year=1982 genre='Science Fiction')
1.6 s: Movie(title='The Matrix' director='Lana Wachowski and Lilly Wachowski' year=1999 genre='Science Fiction')
2.2 s: Movie(title='2001: A Space Odyssey' director='Stanley Kubrick' year=1968 genre='Science Fiction')
2.7 s: Movie(title='Solaris' director='Andrei Tarkovsky' year=1972 genre='Science Fiction')
3.3 s: Movie(title='Gattaca' director='Andrew Niccol' year=1997 genre='Science Fiction')
Reference: https://github.com/instructor-ai/instructor/tree/main