Pydantic Output Parser in LangChain

https://python.langchain.com.cn/docs/modules/model_io/output_parsers/pydantic

This article follows LangChain’s documentation (langchain.com.cn) and explains PydanticOutputParser, a tool that parses LLM output into structured Pydantic model instances described by a JSON schema. The code and examples come from that documentation.

Key Note: Large language models are imperfect abstractions! Use an LLM with sufficient capacity (e.g., OpenAI’s DaVinci) to generate well-formed JSON. Smaller models such as Curie may fail to produce correctly formatted output.

1. What is PydanticOutputParser?

PydanticOutputParser converts unstructured LLM responses into structured Pydantic model instances.

  • Pydantic’s BaseModel acts as a data schema: it defines the expected fields, their types, and validation rules (similar to Python dataclasses, but with runtime type validation and coercion).
  • The parser injects auto-generated format_instructions into the prompt, guiding the LLM to output JSON that matches the Pydantic model.
  • Supports custom validation logic (e.g., “a joke’s setup must end with a question mark”) and complex types (e.g., lists of strings).
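
To make the parsing step concrete, here is a minimal standard-library sketch of the general idea: locate the JSON object in the model's reply, decode it, and check it against the expected fields. The helper name parse_reply and the EXPECTED_FIELDS table are hypothetical and are not part of LangChain; the real parser hands the checking over to Pydantic.

```python
import json
import re

# Hypothetical schema for illustration: field name -> expected Python type.
EXPECTED_FIELDS = {"setup": str, "punchline": str}


def parse_reply(text: str) -> dict:
    """Extract the outermost {...} span from an LLM reply and check its fields."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in reply")
    data = json.loads(match.group())
    for name, expected_type in EXPECTED_FIELDS.items():
        if name not in data:
            raise ValueError(f"missing field: {name}")
        if not isinstance(data[name], expected_type):
            raise ValueError(f"field {name} is not {expected_type.__name__}")
    return data


# Made-up reply: some chatter followed by the JSON payload.
reply = 'Sure! {"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}'
joke = parse_reply(reply)
print(joke["setup"])
```

A reply with no JSON object, or one that lacks a required field, raises ValueError in this sketch, which mirrors how a failed parse surfaces as an exception rather than as a partially filled object.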

2. Step 1: Import Required Modules

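The code below imports the required classes, as in the original documentation. These are the legacy import paths from early LangChain releases, and validator is the Pydantic 1 style decorator: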

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI  # Imported in the original; not used in this walkthrough
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

3. Step 2: Configure the LLM

Use a capable LLM (e.g., text-davinci-003) so that the output is valid JSON. OpenAI has since retired text-davinci-003, so running this code today means substituting a currently available model name:

model_name = "text-davinci-003"
temperature = 0.0  # Fixed temperature for consistent results
model = OpenAI(model_name=model_name, temperature=temperature)

4. Example 1: Parse a Joke into a Pydantic Model

Define a Pydantic model for a joke (with custom validation) and use the parser to extract structured data.

Step 4.1: Define the Pydantic Model

# Define the desired data structure (schema)
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")  # Joke's setup (question)
    punchline: str = Field(description="answer to resolve the joke")  # Joke's punchline

    # Custom validation: ensure the setup ends with a question mark
    @validator("setup")
    def question_ends_with_question_mark(cls, value):
        # endswith() also handles an empty string, where value[-1] would raise IndexError
        if not value.endswith("?"):
            raise ValueError("Badly formed question!")
        return value
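
The effect of the validator can be checked without calling an LLM by constructing the model directly. This is a small sketch that assumes the installed Pydantic still exposes the v1-style validator decorator imported earlier; the sample strings are made up for illustration.

```python
from pydantic import BaseModel, Field, ValidationError, validator


class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    @validator("setup")
    def question_ends_with_question_mark(cls, value):
        # Reject setups that are not phrased as a question.
        if not value.endswith("?"):
            raise ValueError("Badly formed question!")
        return value


# A well-formed joke passes validation.
ok = Joke(setup="Why did the chicken cross the road?", punchline="To get to the other side!")
print(ok.setup)

# A setup without a question mark is rejected by the custom validator.
try:
    Joke(setup="The chicken crossed the road.", punchline="To get to the other side!")
    rejected = False
except ValidationError as err:
    rejected = True
    print(err)
```

The same check runs when the parser builds a Joke from LLM output, so a reply whose setup is not a question fails at parse time.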

Step 4.2: Initialize Parser and Prompt Template

joke_query = "Tell me a joke."  # User query

# Initialize parser with the Pydantic model
parser = PydanticOutputParser(pydantic_object=Joke)

# Create prompt template with auto-generated format instructions
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)
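
The role of partial_variables can be illustrated with plain string formatting: the format instructions are bound once when the template is built, and only the query is supplied on each call. The sketch below is a standard-library stand-in, not LangChain's PromptTemplate; render, make_prompt, and the placeholder instruction text are hypothetical.

```python
from functools import partial

TEMPLATE = "Answer the user query.\n{format_instructions}\n{query}\n"

# Placeholder text; the real string comes from parser.get_format_instructions().
FORMAT_INSTRUCTIONS = "Reply with a JSON object containing 'setup' and 'punchline'."


def render(template: str, format_instructions: str, query: str) -> str:
    """Fill both placeholders of the template."""
    return template.format(format_instructions=format_instructions, query=query)


# Bind the template and instructions up front, leaving only `query` open.
make_prompt = partial(render, TEMPLATE, FORMAT_INSTRUCTIONS)

prompt_text = make_prompt(query="Tell me a joke.")
print(prompt_text)
```

Binding the instructions early keeps the per-request call down to the one value that actually changes, which is the user query.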

Step 4.3: Generate and Parse LLM Output

# Format the prompt (inject query and format instructions)
_input = prompt.format_prompt(query=joke_query)

# Get LLM response (convert prompt to string for compatibility)
output = model(_input.to_string())

# Parse LLM output into the Joke model
parsed_joke = parser.parse(output)

Parsed output, as shown in the original documentation:

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')

5. Example 2: Parse Compound Types (List) into a Pydantic Model

Define a model with a List field (for an actor’s filmography) to demonstrate support for complex types.

Step 5.1: Define the Pydantic Model

class Actor(BaseModel):
    name: str = Field(description="name of an actor")  # Actor's name
    film_names: List[str] = Field(description="list of names of films they starred in")  # List of films

Step 5.2: Initialize Parser and Prompt Template

actor_query = "Generate the filmography for a random actor."  # User query

# Initialize parser with the Actor model
parser = PydanticOutputParser(pydantic_object=Actor)

# Build the same prompt template again, now with the Actor parser's format instructions
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

Step 5.3: Generate and Parse LLM Output

# Format the prompt
_input = prompt.format_prompt(query=actor_query)

# Get LLM response
output = model(_input.to_string())

# Parse into the Actor model
parsed_actor = parser.parse(output)

Parsed output, as shown in the original documentation:

Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Saving Private Ryan', 'The Green Mile', 'Cast Away', 'Toy Story'])
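
The list handling can be tried offline by feeding JSON text straight into the model. In this sketch both reply strings are made up, and the model is built directly with Pydantic rather than through the LangChain parser.

```python
import json
from typing import List

from pydantic import BaseModel, Field, ValidationError


class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


# Made-up reply in the expected shape: film_names is a JSON array.
good_reply = '{"name": "Tom Hanks", "film_names": ["Forrest Gump", "Cast Away"]}'
actor = Actor(**json.loads(good_reply))
print(type(actor.film_names).__name__, len(actor.film_names))

# Made-up reply where film_names is a single string instead of an array.
bad_reply = '{"name": "Tom Hanks", "film_names": "Forrest Gump"}'
try:
    Actor(**json.loads(bad_reply))
    list_error = None
except ValidationError as err:
    list_error = err
print(list_error is not None)
```

The second case shows why the type annotation matters: a reply that is valid JSON can still be rejected when a field has the wrong shape.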

6. Key Details Explained

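  • Format Instructions: parser.get_format_instructions() returns a block of text that asks the LLM to reply with a JSON instance conforming to a JSON schema, followed by the schema generated from the Pydantic model (field names, types, descriptions, and required fields).
    Custom validator rules such as the question-mark check are Python code, so they are not part of the schema or the instructions; they are enforced only when the output is parsed.
    The instructions steer the LLM toward JSON compatible with the Pydantic model, but they do not guarantee it.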

  • Custom Validation: The @validator decorator in the Joke model enforces business rules (e.g., the question-mark check). If the LLM’s output violates a rule, Pydantic raises a ValidationError during parsing, which LangChain’s parser reports as an OutputParserException.

  • Compound Types: The List[str] type in the Actor model appears in the schema as an array of strings, which prompts the LLM to return a list of film names; the parser then converts the JSON array into a Python list.
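
Because the format instructions are built from the model's JSON schema, it can help to inspect that schema directly. The sketch below uses Pydantic's schema() method (deprecated in Pydantic 2 in favor of model_json_schema(), but assumed to be available here). The model definitions repeat the ones used earlier so the block runs on its own.

```python
from typing import List

from pydantic import BaseModel, Field, validator


class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")
    punchline: str = Field(description="answer to resolve the joke")

    @validator("setup")
    def question_ends_with_question_mark(cls, value):
        if not value.endswith("?"):
            raise ValueError("Badly formed question!")
        return value


class Actor(BaseModel):
    name: str = Field(description="name of an actor")
    film_names: List[str] = Field(description="list of names of films they starred in")


joke_schema = Joke.schema()
actor_schema = Actor.schema()

# Field descriptions are carried into the schema...
print(joke_schema["properties"]["setup"]["description"])
# ...but the question-mark rule is Python code and does not appear in it.
print("question mark" in str(joke_schema))

# List[str] is rendered as an array whose items are strings.
print(actor_schema["properties"]["film_names"]["type"])
print(actor_schema["properties"]["film_names"]["items"])
```

If a rule must be visible to the LLM, one option is to state it in the Field description, since descriptions are part of the schema.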

Key Takeaways

  • PydanticOutputParser links LLM outputs to structured Pydantic models using auto-generated format instructions.
  • Define data schemas with BaseModel, add context with Field, and enforce rules with @validator.
  • Use capable LLMs (e.g., DaVinci) to ensure valid JSON output—smaller models may fail.
  • Supports complex types (lists, nested models) for versatile structured data extraction.