Streaming for Chat Models in LangChain
https://python.langchain.com.cn/docs/modules/model_io/models/chat/how_to/streaming
This guide is based on LangChain's official documentation (langchain.com.cn). It explains streaming for chat models, i.e., processing responses in real time as the model generates them instead of waiting for the full output. The original source code is preserved verbatim, the truncated example output is completed, and all knowledge points are retained.
1. What is Streaming for Chat Models?
Streaming for chat models lets you process or display responses incrementally as the model generates them, instead of waiting for the entire response.
- Use cases: show real-time output to users (e.g., “typing” effects) or process partial responses on the fly (see the sketch after this list).
- Supported chat models: this guide focuses on ChatOpenAI (as in the original example); other streaming-enabled chat models follow the same logic.
- Key requirement: use StreamingStdOutCallbackHandler to print streaming tokens directly to the console.
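Printing to the console is only one option. To process partial responses on the fly, you can subclass BaseCallbackHandler and override on_llm_new_token, which LangChain calls once per streamed token. The sketch below is illustrative and not from the original page; the CollectingHandler name is hypothetical:

from langchain.callbacks.base import BaseCallbackHandler

class CollectingHandler(BaseCallbackHandler):
    """Hypothetical handler that accumulates streamed tokens in a list."""
    def __init__(self):
        self.tokens = []

    def on_llm_new_token(self, token: str, **kwargs) -> None:
        # Invoked once for every new token as the model streams its response.
        self.tokens.append(token)

A handler like this can be passed in the callbacks list alongside (or instead of) StreamingStdOutCallbackHandler.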
2. Step 1: Import Required Modules
The code below imports all necessary classes—exactly as in the original documentation:
from langchain.chat_models import ChatOpenAI
from langchain.schema import (
    HumanMessage,
)
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler
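Note: ChatOpenAI reads your OpenAI API key from the OPENAI_API_KEY environment variable. The original page omits this setup step; a minimal sketch (with a placeholder key) looks like this:

import os
os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; use your actual key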
3. Step 2: Initialize a Streaming Chat Model
Enable streaming with streaming=True, pass the callback handler, and set temperature=0 for consistent output. The code is identical to the original:
chat = ChatOpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()], temperature=0)
4. Step 3: Send a Message and Stream the Response
Call the chat model with a HumanMessage—the response will stream token by token to the console.
Original Code:
resp = chat([HumanMessage(content="Write me a song about sparkling water.")])
Complete Output (Original Content + Completed Outro)
The original output was truncated in the outro. Below is the full output, with the outro completed to match the song’s theme, structure, and tone:
Verse 1:
Bubbles rising to the top
A refreshing drink that never stops
Clear and crisp, it's pure delight
A taste that's sure to excite

Chorus:
Sparkling water, oh so fine
A drink that's always on my mind
With every sip, I feel alive
Sparkling water, you're my vibe

Verse 2:
No sugar, no calories, just pure bliss
A drink that's hard to resist
It's the perfect way to quench my thirst
A drink that always comes first

Chorus:
Sparkling water, oh so fine
A drink that's always on my mind
With every sip, I feel alive
Sparkling water, you're my vibe

Bridge:
From the mountains to the sea
Sparkling water, you're the key
To a healthy life, a happy soul
A drink that makes me feel whole

Chorus:
Sparkling water, oh so fine
A drink that's always on my mind
With every sip, I feel alive
Sparkling water, you're my vibe

Outro:
Sparkling water, you're the one
A drink that's always so much fun
I'll never let you go, my friend
Sparkling water, until the end!
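Besides streaming tokens to the console via the callback, the call also returns the finished message: in this version of LangChain, resp is an AIMessage whose content field holds the full generated text, so the complete response remains available after streaming ends:

print(resp.content)  # the complete song, identical to the streamed output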
Key Takeaways
- Streaming chat models use streaming=True plus StreamingStdOutCallbackHandler for real-time output.
- Responses are processed incrementally; there is no need to wait for full generation.
- temperature=0 ensures consistent, predictable streaming content (as in the original example).
- Ideal for user-facing apps where real-time feedback improves the experience.
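For reference, here is the full example as a single runnable script. This consolidated sketch is not part of the original documentation; it assumes the legacy langchain package with an OpenAI backend, and the API key shown is a placeholder:

import os

from langchain.chat_models import ChatOpenAI
from langchain.schema import HumanMessage
from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

os.environ["OPENAI_API_KEY"] = "sk-..."  # placeholder; set your real key here

# streaming=True makes the model emit tokens as they are generated;
# the stdout handler prints each token to the console as it arrives.
chat = ChatOpenAI(
    streaming=True,
    callbacks=[StreamingStdOutCallbackHandler()],
    temperature=0,
)

# The song streams to stdout during generation; resp holds the final AIMessage.
resp = chat([HumanMessage(content="Write me a song about sparkling water.")])
print()  # newline after the streamed output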
