Pydantic Output Parser in LangChain

news 2025/11/15 9:45:51

https://python.langchain.com.cn/docs/modules/model_io/output_parsers/pydantic

This content is based on LangChain’s official documentation (langchain.com.cn) and explains, in simplified terms, the PydanticOutputParser: a tool that parses LLM outputs into structured Pydantic models (JSON-schema-compliant objects). It preserves the original source code, examples, and knowledge points without arbitrary additions or modifications.

Key Note: Large language models are imperfect abstractions! Use an LLM with sufficient capacity (e.g., OpenAI’s DaVinci) to generate valid JSON—smaller models like Curie may fail to produce correctly formatted outputs.

1. What is PydanticOutputParser?

PydanticOutputParser converts unstructured LLM responses into structured Pydantic model instances.

Pydantic’s BaseModel acts as a “data schema”: it defines expected fields, types, and validation rules (like Python dataclasses, but with runtime type validation and coercion).
The parser injects auto-generated format_instructions into the prompt, guiding the LLM to output JSON that matches the Pydantic model.
Supports custom validation logic (e.g., “a joke’s setup must end with a question mark”) and complex types (e.g., lists of strings).
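
The conversion described above can be approximated without LangChain or Pydantic. The sketch below is a stdlib-only stand-in, not the library's implementation: JokeRecord and parse_llm_text are hypothetical names, and the greedy brace search is an assumption about how a JSON object might be pulled out of surrounding chatter.

```python
import json
import re
from dataclasses import dataclass


@dataclass
class JokeRecord:
    """Hypothetical stand-in for a Pydantic model with two string fields."""
    setup: str
    punchline: str


def parse_llm_text(text: str) -> JokeRecord:
    """Pull the first {...} span out of raw model text and build a JokeRecord."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model output")
    data = json.loads(match.group())
    # Check that each expected field is present and is a string.
    for key in ("setup", "punchline"):
        if not isinstance(data.get(key), str):
            raise ValueError(f"field {key!r} is missing or not a string")
    return JokeRecord(setup=data["setup"], punchline=data["punchline"])


# Simulated LLM reply: free text followed by the JSON payload.
raw = 'Sure! Here you go:\n{"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}'
joke = parse_llm_text(raw)
print(joke)
```

The real parser delegates the field checks to the Pydantic model, which also handles coercion and nested types; this sketch only covers the two string fields.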

2. Step 1: Import Required Modules

The code below imports all necessary classes—exactly as in the original documentation:

from langchain.prompts import PromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI  # Included as in original import
from langchain.output_parsers import PydanticOutputParser
from pydantic import BaseModel, Field, validator
from typing import List

3. Step 2: Configure the LLM

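Use a capable LLM (e.g., text-davinci-003) to ensure valid JSON output. OpenAI has since retired text-davinci-003, so substitute a currently available model when running this today. The code is identical to the original: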

model_name = "text-davinci-003"
temperature = 0.0  # Fixed temperature for consistent results
model = OpenAI(model_name=model_name, temperature=temperature)

4. Example 1: Parse a Joke into a Pydantic Model

Define a Pydantic model for a joke (with custom validation) and use the parser to extract structured data.

Step 4.1: Define the Pydantic Model

# Define the desired data structure (schema)
class Joke(BaseModel):
    setup: str = Field(description="question to set up a joke")  # Joke's setup (question)
    punchline: str = Field(description="answer to resolve the joke")  # Joke's punchline

    # Custom validation: Ensure the setup ends with a question mark
    @validator("setup")
    def question_ends_with_question_mark(cls, field):
        if field[-1] != "?":
            raise ValueError("Badly formed question!")
        return field

Step 4.2: Initialize Parser and Prompt Template

joke_query = "Tell me a joke."  # User query

# Initialize parser with the Pydantic model
parser = PydanticOutputParser(pydantic_object=Joke)

# Create prompt template with auto-generated format instructions
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

Step 4.3: Generate and Parse LLM Output

# Format the prompt (inject query and format instructions)
_input = prompt.format_prompt(query=joke_query)

# Get LLM response (convert prompt to string for compatibility)
output = model(_input.to_string())

# Parse LLM output into the Joke model
parsed_joke = parser.parse(output)

Parsed Output (exact as original):

Joke(setup='Why did the chicken cross the road?', punchline='To get to the other side!')
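
To make the validator's effect concrete, here is a dependency-free sketch of the same question-mark rule applied to two candidate outputs. check_setup is a hypothetical helper that mirrors the logic of question_ends_with_question_mark; it is not part of LangChain or Pydantic, and the two JSON payloads are made up for illustration.

```python
import json


def check_setup(setup: str) -> str:
    """Mirror of the validator logic: reject setups that do not end with '?'."""
    if not setup.endswith("?"):
        raise ValueError("Badly formed question!")
    return setup


# Two hypothetical model outputs: one valid, one that breaks the rule.
good = json.loads('{"setup": "Why did the chicken cross the road?", "punchline": "To get to the other side!"}')
bad = json.loads('{"setup": "A chicken crossed the road.", "punchline": "To get to the other side!"}')

results = {}
for label, payload in (("good", good), ("bad", bad)):
    try:
        check_setup(payload["setup"])
        results[label] = "accepted"
    except ValueError as exc:
        results[label] = f"rejected: {exc}"

print(results)
```

The sketch uses endswith rather than indexing the last character, so an empty setup is rejected with a ValueError instead of raising an IndexError.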

5. Example 2: Parse Compound Types (List) into a Pydantic Model

Define a model with a List field (for an actor’s filmography) to demonstrate support for complex types.

Step 5.1: Define the Pydantic Model

class Actor(BaseModel):
    name: str = Field(description="name of an actor")  # Actor's name
    film_names: List[str] = Field(description="list of names of films they starred in")  # List of films

Step 5.2: Initialize Parser and Prompt Template

actor_query = "Generate the filmography for a random actor."  # User query

# Initialize parser with the Actor model
parser = PydanticOutputParser(pydantic_object=Actor)

# Reuse the same prompt template (inject new format instructions)
prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": parser.get_format_instructions()},
)

Step 5.3: Generate and Parse LLM Output

# Format the prompt
_input = prompt.format_prompt(query=actor_query)

# Get LLM response
output = model(_input.to_string())

# Parse into the Actor model
parsed_actor = parser.parse(output)

Parsed Output (exact as original):

Actor(name='Tom Hanks', film_names=['Forrest Gump', 'Saving Private Ryan', 'The Green Mile', 'Cast Away', 'Toy Story'])
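
The list handling can be illustrated with a small stdlib-only sketch: a JSON array decodes to a Python list, and each element is then checked to be a string. ActorRecord and actor_from_json are hypothetical names introduced here; Pydantic performs a richer version of this check, including coercion.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass
class ActorRecord:
    """Hypothetical stand-in for the Actor model."""
    name: str
    film_names: List[str]


def actor_from_json(text: str) -> ActorRecord:
    """Decode JSON text and verify that film_names is a list of strings."""
    data = json.loads(text)
    films = data["film_names"]
    if not isinstance(films, list) or not all(isinstance(f, str) for f in films):
        raise ValueError("film_names must be a list of strings")
    return ActorRecord(name=data["name"], film_names=films)


actor = actor_from_json('{"name": "Tom Hanks", "film_names": ["Forrest Gump", "Cast Away"]}')
print(actor.film_names)
```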

6. Key Details Explained

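Format Instructions: parser.get_format_instructions() auto-generates a block of text that asks the LLM to return a JSON object and embeds the model's JSON schema, i.e. each field's name, type, and Field description (for Joke: ‘setup’ and ‘punchline’, both strings).
This steers the LLM toward JSON compatible with the Pydantic model. Rules written in a @validator are not part of the JSON schema, so the question-mark check is enforced when the output is parsed rather than stated in the instructions.
Custom Validation: The @validator decorator in the Joke model enforces business rules (e.g., question mark check). If the LLM’s output violates this, Pydantic raises a ValidationError during parsing; depending on the LangChain version, the parser may surface it wrapped in its own parsing exception (OutputParserException).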
Compound Types: The List[str] type in the Actor model tells the LLM to return a list of film names, and the parser converts the JSON array into a Python list.
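
As a rough illustration of how schema-driven instructions end up in a prompt, the sketch below builds an instruction string from a hand-written schema and substitutes it into the same template text used in the examples. It is an assumption-laden stand-in: joke_schema and build_format_instructions are hypothetical, the instruction wording is invented, and the real text produced by get_format_instructions() will differ.

```python
import json

# Hypothetical schema for the Joke model, written by hand for illustration.
joke_schema = {
    "properties": {
        "setup": {"type": "string", "description": "question to set up a joke"},
        "punchline": {"type": "string", "description": "answer to resolve the joke"},
    },
    "required": ["setup", "punchline"],
}


def build_format_instructions(schema: dict) -> str:
    """Turn a schema dict into a plain-text instruction for the model."""
    return (
        "Return a JSON object that conforms to this JSON schema:\n"
        + json.dumps(schema, indent=2)
    )


# Same template text as in the examples; str.format only interprets braces
# in the template itself, so the JSON braces in the substituted value are safe.
template = "Answer the user query.\n{format_instructions}\n{query}\n"
prompt_text = template.format(
    format_instructions=build_format_instructions(joke_schema),
    query="Tell me a joke.",
)
print(prompt_text)
```

This sketch carries only field names, types, and descriptions into the prompt, which is the information a JSON schema holds.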

Key Takeaways

PydanticOutputParser links LLM outputs to structured Pydantic models using auto-generated format instructions.
Define data schemas with BaseModel, add context with Field, and enforce rules with @validator.
Use capable LLMs (e.g., DaVinci) to ensure valid JSON output—smaller models may fail.
Supports complex types (lists, nested models) for versatile structured data extraction.
