Structured Output | LangChain 1.0 Python Knowledge Docs

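📚 What Is Structured Output?

Structured output lets an Agent return data in a predictable, machine-readable format instead of natural-language text that has to be parsed. You get JSON objects, Pydantic models, or dataclasses that your application can use directly.

💡 Key Benefits

Benefit	Description	Example scenarios
Type safety	Static type checking and IDE autocompletion	Database inserts, API calls
Automatic validation	Pydantic validates field constraints	Age 18-120, email format
No ambiguity	No complex text-parsing logic required	Extracting invoice data
Cross-provider	One interface for OpenAI, Anthropic, and Google	Multi-model compatibility
Error handling	Automatic retry when validation fails	Malformed output corrected automatically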

Python 🟢 Basic

"""
Basic structured output example
"""
from pydantic import BaseModel
from langchain.agents import create_agent

# Define the output schema
class ContactInfo(BaseModel):
    """Contact information"""
    name: str
    email: str
    phone: str

# Create the Agent (response_format specifies the output format)
agent = create_agent(
    model="gpt-4o",
    response_format=ContactInfo  # The best strategy is selected automatically
)

# Invoke the Agent
result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Extract the contact info: Zhang San, email [email protected], phone 13812345678"
    }]
})

# Read the structured data
contact = result["structured_response"]
print(contact.name)   # Zhang San
print(contact.email)  # [email protected]
print(contact.phone)  # 13812345678

🎯 Three Response Format Strategies

LangChain provides three ways to produce structured output, adapting automatically to each model's capabilities:

graph TB
    User["User request"] --> Agent["create_agent()"]
    Agent --> Strategy{"response_format<br/>strategy selection"}
    Strategy -->|Native support| Provider["ProviderStrategy<br/>Uses the model's native API"]
    Strategy -->|Tool calling| Tool["ToolStrategy<br/>Based on tool calling"]
    Strategy -->|Automatic| Auto["Auto-Selection<br/>Picks the best strategy"]
    Provider --> OpenAI["OpenAI<br/>Anthropic<br/>Grok"]
    Tool --> Universal["General-purpose<br/>Works with all models"]
    Auto --> Smart["Smart selection<br/>Recommended"]
    OpenAI --> Output["Structured output<br/>Pydantic model"]
    Universal --> Output
    Smart --> Output
    style Provider fill:#10b981,color:#fff
    style Tool fill:#f59e0b,color:#fff
    style Auto fill:#3b82f6,color:#fff
    style Output fill:#8b5cf6,color:#fff

1. ProviderStrategy (native strategy)

Uses the model provider's native structured output API (OpenAI, Anthropic, Grok), which is faster and more reliable.

Python 🟡 Intermediate

"""
ProviderStrategy - use the native API
"""
from langchain.agents import create_agent
from langchain.agents.structured_output import ProviderStrategy
from pydantic import BaseModel

class ContactInfo(BaseModel):
    name: str
    email: str

# Use ProviderStrategy explicitly
response_format = ProviderStrategy(schema=ContactInfo)

agent = create_agent(
    model="gpt-4o",  # Supports native structured output
    response_format=response_format
)

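2. ToolStrategy (tool strategy)

Implements structured output through the tool-calling mechanism; suitable for models without native support.

Python 🟡 Intermediate

"""
ToolStrategy - based on tool calling
"""
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy

# Use ToolStrategy (ContactInfo is the model defined in the previous example)
response_format = ToolStrategy(schema=ContactInfo)

agent = create_agent(
    model="any-model",  # Placeholder: any model that supports tool calling
    response_format=response_format
)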

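3. Auto-Selection (recommended)

Pass the schema directly and LangChain selects the best strategy automatically.

Python 🟢 Basic - Recommended

"""
Auto-Selection - the recommended approach
"""
# ✅ Simplest option: pass the Pydantic model directly
agent = create_agent(
    model="gpt-4o",
    response_format=ContactInfo  # The best strategy is selected automatically
)

# LangChain decides automatically:
# - GPT-4o supports native structured output → ProviderStrategy
# - Other models → ToolStrategy

Strategy	When to use	Performance	Compatibility
`ProviderStrategy`	OpenAI, Anthropic, Grok	⚡ Fast	🟡 Limited
`ToolStrategy`	Any model that supports tool calling	🟡 Moderate	✅ Broad
Auto-Selection	Recommended default	⚡ Adaptive	✅ Best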
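
The selection rule described above can be pictured as a simple capability lookup. The sketch below is only an illustration of that idea, not LangChain's actual implementation: the `choose_strategy` function, the `Choice` type, and the hard-coded prefix list are all hypothetical names introduced here.

```python
from dataclasses import dataclass

# Hypothetical capability list, for illustration only. A real library would
# consult provider metadata rather than a hard-coded tuple of prefixes.
NATIVE_STRUCTURED_OUTPUT_PREFIXES = ("gpt-4o", "claude", "grok")


@dataclass(frozen=True)
class Choice:
    strategy: str  # "provider" or "tool"
    reason: str


def choose_strategy(model_name: str) -> Choice:
    """Prefer the native strategy when the model is listed as supporting it."""
    if model_name.startswith(NATIVE_STRUCTURED_OUTPUT_PREFIXES):
        return Choice("provider", f"{model_name} is listed as natively supported")
    # Everything else falls back to the tool-calling approach.
    return Choice("tool", f"{model_name} falls back to tool calling")


print(choose_strategy("gpt-4o").strategy)
print(choose_strategy("my-local-model").strategy)
```

The point of the fallback branch is that tool calling is the wider net: anything not known to have native support still gets a structured result.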

📝 Defining Schemas

LangChain supports three ways to define a schema; Pydantic is recommended:

1. Pydantic Models (recommended)

Pydantic provides strong validation, documentation, and IDE support.

Python 🟡 Intermediate

"""
Pydantic Models - full example
"""
from pydantic import BaseModel, Field
from typing import Literal
from langchain.agents import create_agent

class ProductReview(BaseModel):
    """Product review analysis"""

    # Field with constraints
    rating: int | None = Field(
        description="Rating, 1-5 stars",
        ge=1,  # Greater than or equal to 1
        le=5   # Less than or equal to 5
    )

    # Literal restricts the allowed values
    sentiment: Literal["positive", "negative", "neutral"] = Field(
        description="Sentiment of the review"
    )

    # List type
    key_points: list[str] = Field(
        description="List of key points"
    )

    # Optional field
    would_recommend: bool | None = Field(
        default=None,
        description="Whether the reviewer recommends buying"
    )

# Usage
agent = create_agent(
    model="gpt-4o",
    response_format=ProductReview
)

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Analyze this review: This product is great, 5 stars, highly recommended! Good quality and a fair price."
    }]
})

review = result["structured_response"]
print(f"Rating: {review.rating}")
print(f"Sentiment: {review.sentiment}")
print(f"Key points: {review.key_points}")
2. Dataclasses

Python 🟢 Basic

"""
Dataclasses - simple scenarios
"""
from dataclasses import dataclass
from langchain.agents import create_agent

@dataclass
class EventDetails:
    event_name: str
    date: str
    location: str

agent = create_agent(
    model="gpt-4o",
    response_format=EventDetails
)

3. TypedDict

Python 🟢 Basic

"""
TypedDict - dictionary types
"""
from typing import TypedDict, Literal
from langchain.agents import create_agent

class MeetingAction(TypedDict):
    task: str
    assignee: str
    priority: Literal["low", "medium", "high"]

agent = create_agent(
    model="gpt-4o",
    response_format=MeetingAction
)

🔀 Union Types - Multiple Output Formats

Use a Union type to let the Agent choose the appropriate output format based on the input:

Python 🔴 Advanced

"""
Union types - smart routing
"""
from typing import Union
from pydantic import BaseModel
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy

class SendEmailRequest(BaseModel):
    """Request to send an email"""
    recipient: str
    subject: str
    body: str

class ScheduleMeetingRequest(BaseModel):
    """Request to schedule a meeting"""
    attendees: list[str]
    title: str
    duration_minutes: int

# Use a Union type
response_format = ToolStrategy(
    schema=Union[SendEmailRequest, ScheduleMeetingRequest]
)

agent = create_agent(
    model="gpt-4o",
    response_format=response_format
)

# Example 1: send an email
result1 = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Send an email to [email protected] with the subject 'Project update'"
    }]
})

# The Agent is expected to choose SendEmailRequest
email_request = result1["structured_response"]
print(type(email_request))  # SendEmailRequest

# Example 2: schedule a meeting
result2 = agent.invoke({
    "messages": [{
        "role": "user",
        "content": "Schedule a 30-minute meeting and invite Alice and Bob"
    }]
})

# The Agent is expected to choose ScheduleMeetingRequest
meeting_request = result2["structured_response"]
print(type(meeting_request))  # ScheduleMeetingRequest

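⚠️ Automatic Correction of Multiple Outputs

When the model mistakenly returns more than one structured output, the Agent automatically:

1. Detects a MultipleStructuredOutputsError
2. Sends the error back to the model as feedback
3. Retries, asking for a single output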
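
The detect, feed back, retry sequence can be sketched as a small loop. This is a simplified stand-in written for illustration, assuming a fake model function and a local `MultipleOutputsError` class; it does not reproduce the Agent's internal code.

```python
from typing import Callable


class MultipleOutputsError(Exception):
    """Local stand-in for MultipleStructuredOutputsError in this sketch."""


def run_with_retry(
    model: Callable[[list[str]], list[dict]],
    prompt: str,
    max_attempts: int = 3,
) -> dict:
    """Call the model until it returns exactly one structured output."""
    messages = [prompt]
    for _ in range(max_attempts):
        outputs = model(messages)
        if len(outputs) == 1:
            return outputs[0]
        # Step 1: detect the problem. Step 2: feed the error back to the model.
        error = MultipleOutputsError(
            f"expected 1 structured output, got {len(outputs)}"
        )
        messages.append(f"Error: {error}. Return exactly one output.")
        # Step 3: the next loop iteration is the retry.
    raise RuntimeError("model never returned a single structured output")


def fake_model(messages: list[str]) -> list[dict]:
    """Hypothetical model: wrong on the first call, correct after feedback."""
    if len(messages) == 1:
        return [{"kind": "email"}, {"kind": "meeting"}]
    return [{"kind": "email"}]


result = run_with_retry(fake_model, "Email Alice about the project update")
print(result)
```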

🛡️ Error Handling and Validation

LangChain provides flexible error handling to keep structured output reliable:

1. Default error handling (recommended)

Python 🟢 Basic

"""
Default error handling - catch all validation errors
"""
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy
from pydantic import BaseModel, Field

class ProductRating(BaseModel):
    rating: int = Field(ge=1, le=5)  # Must be between 1 and 5
    comment: str

response_format = ToolStrategy(
    schema=ProductRating,
    handle_errors=True  # Default: catch all errors and retry automatically
)

agent = create_agent(
    model="gpt-4o",
    response_format=response_format
)

# If the model returns rating=6, the Agent retries automatically
result = agent.invoke({
    "messages": [{"role": "user", "content": "Rate this product"}]
})

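2. Custom error message

Python 🟡 Intermediate

"""
Custom error message
"""
response_format = ToolStrategy(
    schema=ProductRating,
    handle_errors="The rating must be between 1 and 5, and a comment is required. Please regenerate."
)

# When validation fails, the model receives this custom message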

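3. Handling specific exceptions

Python 🟡 Intermediate

"""
Handle only specific exceptions
"""
response_format = ToolStrategy(
    schema=ProductRating,
    handle_errors=(ValueError, TypeError)  # Retry only on these exceptions
)

# Any other exception propagates to the caller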

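4. Custom error handler function

Python 🔴 Advanced

"""
Custom error handler function
"""
from typing import Union
from langchain.agents.structured_output import (
    StructuredOutputValidationError,
    MultipleStructuredOutputsError
)

def custom_error_handler(error: Exception) -> str:
    """Custom error handling logic"""

    if isinstance(error, StructuredOutputValidationError):
        # Field validation failed
        return f"Format error: {error}. Please regenerate using the correct format."

    elif isinstance(error, MultipleStructuredOutputsError):
        # More than one output was returned
        return "You returned multiple outputs, but only one is needed. Please choose the most relevant one."

    else:
        # Any other error
        return f"An error occurred: {str(error)}. Please try again."

# SendEmailRequest and ScheduleMeetingRequest are the models from the Union section
response_format = ToolStrategy(
    schema=Union[SendEmailRequest, ScheduleMeetingRequest],
    handle_errors=custom_error_handler
)

agent = create_agent(
    model="gpt-4o",
    response_format=response_format
)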

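5. Disabling error handling (fail fast)

Python

"""
Disable error handling - every exception is raised directly
"""
response_format = ToolStrategy(
    schema=ProductRating,
    handle_errors=False  # No automatic retry
)

# An exception is raised immediately when validation fails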
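
To see how the five `handle_errors` modes differ, it can help to model them as a single dispatch function. The sketch below is an assumption-laden illustration of the behavior described in this section, not LangChain's source: `feedback_for`, `HandleErrors`, and `DEFAULT_TEMPLATE` are hypothetical names, and the default message wording is invented.

```python
from typing import Callable, Union

# Assumed shape of the accepted values, mirroring the five modes above.
HandleErrors = Union[bool, str, type, tuple, Callable[[Exception], str]]

# Invented wording; the real default message may differ.
DEFAULT_TEMPLATE = "Structured output failed validation: {error}. Please try again."


def feedback_for(handle_errors: HandleErrors, error: Exception) -> str:
    """Return the retry message for the model, or re-raise the error."""
    if handle_errors is False:
        raise error  # Mode 5: fail fast
    if handle_errors is True:
        return DEFAULT_TEMPLATE.format(error=error)  # Mode 1: default
    if isinstance(handle_errors, str):
        return handle_errors  # Mode 2: fixed custom message
    if isinstance(handle_errors, type):
        handle_errors = (handle_errors,)  # Single exception class
    if isinstance(handle_errors, tuple):
        # Mode 3: retry only for the listed exception types
        if isinstance(error, handle_errors):
            return DEFAULT_TEMPLATE.format(error=error)
        raise error
    return handle_errors(error)  # Mode 4: custom handler function


print(feedback_for("Rating must be 1-5.", ValueError("rating=6")))
print(feedback_for(lambda e: f"Custom: {e}", ValueError("rating=6")))
```

Exception classes are checked before the generic callable case because a class is itself callable; testing in the other order would treat `ValueError` as a handler function.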

🚀 Practical Use Cases

Use case 1: invoice data extraction

Python 🔴 Advanced - Full example

"""
Use case 1: invoice data extraction
"""
from pydantic import BaseModel, Field
from langchain.agents import create_agent

class InvoiceData(BaseModel):
    """Invoice data structure"""
    invoice_id: str = Field(description="Invoice number")
    amount: float = Field(description="Total amount", gt=0)
    vendor: str = Field(description="Vendor name")
    date: str = Field(description="Date in YYYY-MM-DD format")
    items: list[str] = Field(description="List of items")

agent = create_agent(
    model="gpt-4o",
    response_format=InvoiceData
)

# Extract invoice information from text
invoice_text = """
Invoice number: INV-2024-001
Date: 2024-01-15
Vendor: ABC Company
Items: laptop x1, mouse x2, keyboard x1
Total: 5,800 yuan
"""

result = agent.invoke({
    "messages": [{
        "role": "user",
        "content": f"Extract the invoice information:\n{invoice_text}"
    }]
})

invoice = result["structured_response"]
print(f"Invoice number: {invoice.invoice_id}")
print(f"Amount: {invoice.amount}")
print(f"Vendor: {invoice.vendor}")
print(f"Items: {invoice.items}")

# Insert directly into a database
# db.invoices.insert(invoice.model_dump())
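
The commented-out insert above leaves the database unspecified. As one possible shape, the sketch below stores an already-extracted record in an in-memory SQLite database using only the standard library. The `InvoiceRecord` dataclass is a hypothetical stand-in for the dict produced by `invoice.model_dump()`, and the table layout is an assumption.

```python
import json
import sqlite3
from dataclasses import asdict, dataclass


@dataclass
class InvoiceRecord:
    """Hypothetical stand-in for the extracted InvoiceData fields."""
    invoice_id: str
    amount: float
    vendor: str
    date: str
    items: list[str]


def save_invoice(conn: sqlite3.Connection, record: InvoiceRecord) -> None:
    """Insert one invoice; the item list is stored as a JSON string."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS invoices ("
        "invoice_id TEXT PRIMARY KEY, amount REAL, vendor TEXT, "
        "date TEXT, items TEXT)"
    )
    params = asdict(record)
    params["items"] = json.dumps(params["items"])  # SQLite has no list type
    conn.execute(
        "INSERT INTO invoices VALUES "
        "(:invoice_id, :amount, :vendor, :date, :items)",
        params,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
save_invoice(
    conn,
    InvoiceRecord("INV-2024-001", 5800.0, "ABC Company", "2024-01-15",
                  ["laptop x1", "mouse x2", "keyboard x1"]),
)
stored = conn.execute("SELECT invoice_id, amount FROM invoices").fetchall()
print(stored)
```

Because the fields are already typed and validated, the record maps onto named SQL parameters without any text parsing in between.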

Use case 2: automatic form filling

Python 🟡 Intermediate

"""
Use case 2: auto-filling a user registration form
"""
from pydantic import BaseModel, Field, EmailStr  # EmailStr requires the email-validator package
from langchain.agents import create_agent

class UserProfile(BaseModel):
    """User profile"""
    first_name: str = Field(description="Given name")
    last_name: str = Field(description="Family name")
    email: EmailStr = Field(description="A valid email address")
    age: int = Field(description="Age", ge=18, le=120)
    preferences: list[str] = Field(description="Hobbies and interests")

agent = create_agent(
    model="gpt-4o",
    response_format=UserProfile
)

# Extract user information from a conversation
result = agent.invoke({
    "messages": [
        {"role": "user", "content": "My name is Li Hua"},
        {"role": "assistant", "content": "Hello, Li Hua!"},
        {"role": "user", "content": "I'm 25 and my email is [email protected]"},
        {"role": "user", "content": "I like programming, reading, and traveling"}
    ]
})

profile = result["structured_response"]
# Fill in the form automatically
form_data = profile.model_dump()
print(form_data)
# {
#     "first_name": "Hua",
#     "last_name": "Li",
#     "email": "[email protected]",
#     "age": 25,
#     "preferences": ["programming", "reading", "traveling"]
# }

Use case 3: smart API routing

Python 🔴 Advanced

"""
Use case 3: intent-based API routing
"""
from typing import Union
from pydantic import BaseModel
from langchain.agents import create_agent
from langchain.agents.structured_output import ToolStrategy

class SearchQuery(BaseModel):
    """Search request"""
    query: str
    filters: dict = {}

class CreateOrder(BaseModel):
    """Create an order"""
    product_id: str
    quantity: int

class CustomerSupport(BaseModel):
    """Customer support inquiry"""
    category: str
    message: str

# Union type for automatic routing (wrapped in ToolStrategy, as in the Union section)
agent = create_agent(
    model="gpt-4o",
    response_format=ToolStrategy(
        schema=Union[SearchQuery, CreateOrder, CustomerSupport]
    )
)

# Pick the right API based on the user's intent
# (search_api, order_api, and support_api are placeholders for your own functions)
def handle_user_request(user_input: str):
    result = agent.invoke({
        "messages": [{"role": "user", "content": user_input}]
    })

    request = result["structured_response"]

    # Call a different API depending on the type
    if isinstance(request, SearchQuery):
        return search_api(request.query, request.filters)
    elif isinstance(request, CreateOrder):
        return order_api(request.product_id, request.quantity)
    elif isinstance(request, CustomerSupport):
        return support_api(request.category, request.message)

# Examples
handle_user_request("Search for blue shirts")  # → SearchQuery
handle_user_request("Buy 2 of product ID-123")  # → CreateOrder
handle_user_request("My order hasn't arrived yet")  # → CustomerSupport
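
The routing step itself does not depend on the model, so it can be tried in isolation. The sketch below replaces the `isinstance` chain with a handler table and uses stub API functions; the dataclasses stand in for the Pydantic models, and every function body here is a hypothetical placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class SearchQuery:
    query: str
    filters: dict = field(default_factory=dict)


@dataclass
class CreateOrder:
    product_id: str
    quantity: int


@dataclass
class CustomerSupport:
    category: str
    message: str


# Stub APIs that only echo their input, for illustration.
def search_api(req: SearchQuery) -> str:
    return f"search:{req.query}"


def order_api(req: CreateOrder) -> str:
    return f"order:{req.product_id}x{req.quantity}"


def support_api(req: CustomerSupport) -> str:
    return f"support:{req.category}"


# One handler per request type; adding a route means adding one entry.
HANDLERS = {
    SearchQuery: search_api,
    CreateOrder: order_api,
    CustomerSupport: support_api,
}


def dispatch(request: object) -> str:
    """Route a structured request to the handler registered for its type."""
    handler = HANDLERS.get(type(request))
    if handler is None:
        raise TypeError(f"no handler registered for {type(request).__name__}")
    return handler(request)


print(dispatch(CreateOrder(product_id="ID-123", quantity=2)))
```

Unlike the `if`/`elif` chain, the table raises on an unregistered type instead of silently returning `None`.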

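✨ Best Practices

1. Schema design

Document fields descriptively: use Field(description="...") to help the model understand what each field means
Add constraints: use ge, le, min_length, and similar constraints
Use Literal types: restrict enumerated values, for example Literal["low", "medium", "high"]
Mark optional fields: use | None with default=None for flexibility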

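2. Error handling

The default handle_errors=True suits most scenarios
Use custom error messages for domain-specific hints
Disable retries only when you need to fail fast

3. Performance

Prefer the native strategy: ProviderStrategy is faster and more reliable
The tool strategy works for general scenarios but relies on retry logic
Use Union types for complex classification tasks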

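4. OpenAI Strict Mode (langchain>=1.2)

Python

"""
OpenAI Strict Mode - stricter schema enforcement
"""
from langchain.agents import create_agent
from langchain.agents.structured_output import ProviderStrategy

# ProductReview is the model defined in the Pydantic Models section
response_format = ProviderStrategy(
    schema=ProductReview,
    strict=True  # Enforce strict adherence to the schema
)

agent = create_agent(
    model="gpt-4o",
    response_format=response_format
)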

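❓ FAQ

Q1: Which models support structured output?

Provider	Native support (ProviderStrategy)	Tool support (ToolStrategy)
OpenAI	✅ GPT-4o and newer models	✅ All tool-calling models
Anthropic	✅ Claude 3+	✅ All tool-calling models
Google	❌	✅ Gemini
Grok	✅ Grok	✅ All tool-calling models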

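Q2: How do I debug structured output?

Python

# 1. Inspect the full message history
result = agent.invoke({...})
for msg in result["messages"]:
    print(msg)

# 2. Enable LangSmith tracing (a LangSmith API key must also be set in the environment)
import os
os.environ["LANGCHAIN_TRACING_V2"] = "true"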

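Q3: What are the best practices for Union types?

Union types suit intent classification, but do not overuse them:

✅ Good: 2-3 clearly distinct types (for example, Email vs. Meeting)
❌ Bad: 5 or more types (easy to confuse; consider splitting them up)

Q4: Does structured output increase cost?

ProviderStrategy: almost no extra cost
ToolStrategy: needs an extra tool-calling round, which may add roughly 10-20% to the cost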

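Q5: How do I handle nested structured output?

Python

from pydantic import BaseModel
from langchain.agents import create_agent

class Address(BaseModel):
    street: str
    city: str
    country: str

class Person(BaseModel):
    name: str
    addresses: list[Address]  # Nested model

agent = create_agent(
    model="gpt-4o",
    response_format=Person
)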
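
Nested models can be checked locally before involving an Agent. The sketch below validates a plain dict against the same two models and inspects the generated JSON Schema; the sample data is made up for illustration.

```python
from pydantic import BaseModel


class Address(BaseModel):
    street: str
    city: str
    country: str


class Person(BaseModel):
    name: str
    addresses: list[Address]  # Nested model


# Made-up sample data shaped like a model response.
raw = {
    "name": "Li Hua",
    "addresses": [
        {"street": "1 Main St", "city": "Shanghai", "country": "China"}
    ],
}

person = Person.model_validate(raw)

# Each nested dict is converted into an Address instance.
print(type(person.addresses[0]).__name__)

# Dumping the model yields the original nested dict again.
print(person.model_dump() == raw)

# The nested model appears as a named definition in the JSON Schema.
print("Address" in Person.model_json_schema()["$defs"])
```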
