Structured Output Parser in LangChain
https://python.langchain.com.cn/docs/modules/model_io/output_parsers/structured
This content is based on LangChain's official documentation (langchain.com.cn) and explains the StructuredOutputParser, a tool for extracting structured data with multiple fields from LLM outputs, in simplified terms. It preserves the original source code, examples, and knowledge points without arbitrary additions or modifications.
Key Note: While Pydantic/JSON parsers are more powerful, the StructuredOutputParser is ideal for simple data structures with only text fields (no complex validation or nested types).
1. What is StructuredOutputParser?
StructuredOutputParser lets you define multiple named fields (e.g., “answer” + “source”) and extract them as a structured dictionary from LLM outputs.
- Use case: When you need the LLM to return both a direct answer and supporting information (e.g., a source URL) in an organized format.
- Key feature: It generates clear format_instructions to guide the LLM to output data matching your desired fields, ensuring easy parsing into a Python dictionary.
- Supports both standard LLMs (e.g., OpenAI) and chat models (e.g., ChatOpenAI).
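The core parsing idea can be sketched in plain Python without calling an LLM: the model is asked to reply with a fenced JSON snippet, and the parser pulls that snippet out and loads it into a dict. This is a simplified mimic under stated assumptions; `parse_reply` is an illustrative helper, not part of LangChain's API.

```python
import json
import re

# Three backticks, built programmatically so this snippet stays tidy.
fence = "`" * 3

def parse_reply(text: str) -> dict:
    """Extract the JSON object from a fenced json block in an LLM reply.

    A simplified mimic of structured output parsing; LangChain's real
    implementation differs in details.
    """
    match = re.search(fence + r"json\s*(\{.*?\})\s*" + fence, text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON code block found in model output")
    return json.loads(match.group(1))

# A typical structured reply from the model:
reply = fence + 'json\n{"answer": "Paris", "source": "https://en.wikipedia.org/wiki/Paris"}\n' + fence
print(parse_reply(reply))  # {'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}
```

The real parser adds the matching instructions to the prompt for you, which is what the steps below walk through.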
2. Step 1: Import Required Modules
The code below imports all necessary classes—exactly as in the original documentation:
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
from langchain.prompts import PromptTemplate, ChatPromptTemplate, HumanMessagePromptTemplate
from langchain.llms import OpenAI
from langchain.chat_models import ChatOpenAI
3. Step 2: Define Response Schemas
First, specify the fields you want the LLM to return (each ResponseSchema defines a field’s name and description).
Code (Exact as Original):
# Define the desired fields (name + description)
response_schemas = [
    ResponseSchema(name="answer", description="answer to the user's question"),
    ResponseSchema(name="source", description="source used to answer the user's question, should be a website.")
]
4. Step 3: Initialize the Structured Output Parser
Create the parser from the response schemas and get its auto-generated format instructions (guidelines for the LLM).
Code (Exact as Original):
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)
format_instructions = output_parser.get_format_instructions() # Guides LLM to output structured fields
Note: The format_instructions typically tells the LLM to output data like: {"answer": "Paris", "source": "https://en.wikipedia.org/wiki/Paris"}.
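As a rough sketch (an assumption: the exact wording varies across LangChain versions), the generated instructions ask for a fenced JSON snippet with one line per field:

```python
# Approximate shape of the text returned by get_format_instructions();
# the exact wording differs between LangChain versions (assumption).
fence = "`" * 3  # built programmatically to avoid a literal code fence here
format_instructions_sketch = (
    "The output should be a markdown code snippet formatted in the "
    "following schema:\n\n"
    + fence + "json\n"
    "{\n"
    '\t"answer": string  // answer to the user\'s question\n'
    '\t"source": string  // source used to answer the user\'s question, '
    "should be a website.\n"
    "}\n"
    + fence
)
print(format_instructions_sketch)
```

Because each field's name and description appear in the instructions, a well-behaved LLM returns exactly the keys you asked for.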
5. Example 1: Use with a Standard LLM (OpenAI)
Combine the parser with a standard LLM to extract structured data for a user’s question.
Step 5.1: Create a Prompt Template
prompt = PromptTemplate(
    template="answer the users question as best as possible.\n{format_instructions}\n{question}",
    input_variables=["question"],  # Dynamic user question
    partial_variables={"format_instructions": format_instructions}  # Fixed format guidelines
)
Step 5.2: Initialize the LLM and Generate Output
model = OpenAI(temperature=0) # Fixed temperature for consistent results
_input = prompt.format_prompt(question="what's the capital of france?") # Format prompt with user question
output = model(_input.to_string()) # Get LLM response
Step 5.3: Parse the Output into a Dictionary
parsed_output = output_parser.parse(output)
print(parsed_output)
Parsed Output (Exact as Original):
{'answer': 'Paris', 'source': 'https://www.worldatlas.com/articles/what-is-the-capital-of-france.html'}
6. Example 2: Use with a Chat Model (ChatOpenAI)
The same parser works with chat models—only the prompt formatting and output access change (chat models return Message objects, so we use .content).
Step 6.1: Create a Chat Prompt Template
chat_prompt = ChatPromptTemplate(
    messages=[
        HumanMessagePromptTemplate.from_template(
            "answer the users question as best as possible.\n{format_instructions}\n{question}"
        )
    ],
    input_variables=["question"],
    partial_variables={"format_instructions": format_instructions}
)
Step 6.2: Initialize the Chat Model and Generate Output
chat_model = ChatOpenAI(temperature=0)
_input = chat_prompt.format_prompt(question="what's the capital of france?")
output = chat_model(_input.to_messages()) # Chat model expects a list of Message objects
Step 6.3: Parse the Chat Model Output
parsed_chat_output = output_parser.parse(output.content) # Access content via .content
print(parsed_chat_output)
Parsed Output (Exact as Original):
{'answer': 'Paris', 'source': 'https://en.wikipedia.org/wiki/Paris'}
Key Takeaways
- StructuredOutputParser simplifies extracting multiple text fields into a dictionary.
- ResponseSchema defines each field's name and purpose, which is critical for guiding the LLM.
- Works with both standard LLMs (use .to_string()) and chat models (use .content).
- Best for simple structured data (no complex validation) compared to Pydantic/JSON parsers.
