Structured Output Parser in LangChain
https://python.langchain.com.cn/docs/modules/model_io/output_parsers/structured
This content is based on LangChain's official documentation (langchain.com.cn) and explains the StructuredOutputParser, a tool for extracting structured data with multiple fields from LLM outputs, in simplified terms. It preserves the original source code, examples, and knowledge points without arbitrary additions or modifications.
Key Note: While Pydantic/JSON parsers are more powerful, the StructuredOutputParser is ideal for simple data structures with only text fields (no complex validation or nested types).
1. What is StructuredOutputParser?
StructuredOutputParser lets you define multiple named fields (e.g., “answer” + “source”) and extract them as a structured dictionary from LLM outputs.
- Use case: When you need the LLM to return both a direct answer and supporting information (e.g., a source URL) in an organized format.
- Key feature: It generates clear format_instructions to guide the LLM to output data matching your desired fields, ensuring easy parsing into a Python dictionary.
- Supports both standard LLMs (e.g., OpenAI) and chat models (e.g., ChatOpenAI).
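The core parsing idea can be sketched in plain Python without calling an LLM: the model is asked to reply with a fenced JSON snippet, and the parser pulls that snippet out and loads it into a dict. This is a simplified mimic under stated assumptions; `parse_reply` is an illustrative helper, not part of LangChain's API.

```python
import json
import re

# Three backticks, built programmatically so this snippet stays tidy.
fence = "`" * 3

def parse_reply(text: str) -> dict:
    """Extract the JSON object from a fenced json block in an LLM reply.

    A simplified mimic of structured output parsing; LangChain's real
    implementation differs in details.
    """
    match = re.search(fence + r"json\s*(\{.*?\})\s*" + fence, text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON code block found in model output")
    return json.loads(match.group(1))

# A typical structured reply from the model:
reply = fence + 'json\n{"answer": "Paris", "source": "https://en.wikipedia.org/wiki/Paris"}\n' + fence
print(parse_reply(reply))  # {'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}
```

The real parser adds the matching instructions to the prompt for you, which is what the steps below walk through.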
2. Step 1: Import Required Modules
The code below imports all necessary classes—exactly as in the original documentation:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
3. Step 2: Define Response Schemas
First, specify the fields you want the LLM to return (each ResponseSchema defines a field’s name and description).
Code (Exact as Original):
# Define the desired fields (name + description)
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the user's question, should be a website.")
]
4. Step 3: Initialize the Structured Output Parser
Create the parser from the response schemas and get its auto-generated format instructions (guidelines for the LLM).
Code (Exact as Original):
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions() # Guides LLM to output structured fields
Note: The format_instructions typically tells the LLM to output data like: {"answer": "Paris", "source": "https://en.wikipedia.org/wiki/Paris"}.
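As a rough sketch (an assumption: the exact wording varies across LangChain versions), the generated instructions ask for a fenced JSON snippet with one line per field:

```python
# Approximate shape of the text returned by get_format_instructions();
# the exact wording differs between LangChain versions (assumption).
fence = "`" * 3  # built programmatically to avoid a literal code fence here
format_instructions_sketch = (
    "The output should be a markdown code snippet formatted in the "
    "following schema:\n\n"
    + fence + "json\n"
    "{\n"
    '\t"answer": string  // answer to the user\'s question\n'
    '\t"source": string  // source used to answer the user\'s question, '
    "should be a website.\n"
    "}\n"
    + fence
)
print(format_instructions_sketch)
```

Because each field's name and description appear in the instructions, a well-behaved LLM returns exactly the keys you asked for.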
5. Example 1: Use with a Standard LLM (OpenAI)
Combine the parser with a standard LLM to extract structured data for a user’s question.
Step 5.1: Create a Prompt Template
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],  # Dynamic user question
    partial_variables={"format_instructions": format_instructions}  # Fixed format guidelines
)
Step 5.2: Initialize the LLM and Generate Output
model = OpenAI(temperature=0) # Fixed temperature for consistent results
_input = prompt.format_prompt(question="what's the capital of france?") # Format prompt with user question
output = model(_input.to_string()) # Get LLM response
Step 5.3: Parse the Output into a Dictionary
parsed_output = output_parser.parse(output)
print(parsed_output)
Parsed Output (Exact as Original):
{'answer': 'Paris', 'source': 'https://www.worldatlas.com/articles/what-is-the-capital-of-france.html'}
6. Example 2: Use with a Chat Model (ChatOpenAI)
The same parser works with chat models—only the prompt formatting and output access change (chat models return Message objects, so we use .content).
Step 6.1: Create a Chat Prompt Template
chat_prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template(
            "answer the users question as best as possible.\n{format_instructions}\n{question}"
        )
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)
Step 6.2: Initialize the Chat Model and Generate Output
chat_model = ChatOpenAI(temperature=0)
_input = chat_prompt.format_prompt(question="what's the capital of france?")
output = chat_model(_input.to_messages()) # Chat model expects a list of Message objects
Step 6.3: Parse the Chat Model Output
parsed_chat_output = output_parser.parse(output.content) # Access content via .content
print(parsed_chat_output)
Parsed Output (Exact as Original):
{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}
Key Takeaways
- StructuredOutputParser simplifies extracting multiple text fields into a dictionary.
- ResponseSchema defines each field's name and purpose, which is critical for guiding the LLM.
- Works with both standard LLMs (use .to_string()) and chat models (use .content).
- Best for simple structured data (no complex validation) compared to Pydantic/JSON parsers.
