Gradio chatbots can natively display intermediate thoughts and tool usage in a collapsible accordion next to the chat message. This makes them a great fit for building UIs for LLM agents and chain-of-thought (CoT) or reasoning demos. This guide will show you how to display thoughts and tool usage with gr.Chatbot and gr.ChatInterface.

The ChatMessage dataclass

Each element of a chatbot's value is a dictionary with role and content keys. You can add new values to the chatbot using plain Python dictionaries, but Gradio also provides the ChatMessage dataclass to give you IDE autocompletion. The schema of ChatMessage is shown below:
MessageContent = Union[str, FileDataDict, FileData, Component]
@dataclass
class ChatMessage:
    content: MessageContent | list[MessageContent]
role: Literal["user", "assistant"]
metadata: MetadataDict = None
options: list[OptionDict] = None
class MetadataDict(TypedDict):
title: NotRequired[str]
id: NotRequired[int | str]
parent_id: NotRequired[int | str]
log: NotRequired[str]
duration: NotRequired[float]
status: NotRequired[Literal["pending", "done"]]
class OptionDict(TypedDict):
label: NotRequired[str]
    value: str

For our purposes, the most important key is the metadata key, which accepts a dictionary. If this dictionary includes a title for the message, it will be displayed in a collapsible accordion representing a thought. It's that simple! Take a look at this example:
import gradio as gr
with gr.Blocks() as demo:
chatbot = gr.Chatbot(
value=[
gr.ChatMessage(
role="user",
content="What is the weather in San Francisco?"
),
gr.ChatMessage(
role="assistant",
content="I need to use the weather API tool?",
metadata={"title": "🧠 Thinking"}
)
]
)
demo.launch()

In addition to title, the dictionary provided to metadata can contain several optional keys:
- log: an optional string value to be displayed in a subdued font next to the thought title.
- duration: an optional numeric value representing the duration of the thought/tool usage, in seconds. Displayed in a subdued font next to the thought title, in parentheses.
- status: if set to "pending", a spinner appears next to the thought title and the accordion is initialized open. If status is "done", the thought accordion is initialized closed. If status is not provided, the thought accordion is initialized open and no spinner is displayed.
- id and parent_id: if these are provided, they can be used to nest thoughts inside other thoughts.

Below, we'll show several complete examples of using gr.Chatbot and gr.ChatInterface to display tool usage or thinking UIs.
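But first, here's a minimal sketch of how several of these metadata keys combine, including nesting one thought under another via id and parent_id. (This is not one of the demos below; the tool name, log text, and durations are purely illustrative.)

```python
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(
        value=[
            gr.ChatMessage(
                role="assistant",
                content="Let me check the weather for you.",
                # Parent thought: "done" status, so the accordion starts closed
                metadata={
                    "title": "🧠 Planning",
                    "id": 1,
                    "log": "agent step 1",
                    "duration": 2.1,
                    "status": "done",
                },
            ),
            gr.ChatMessage(
                role="assistant",
                content="Calling weather_api(city='San Francisco')...",
                # Child thought: nested under the thought above via parent_id;
                # "pending" status shows a spinner and starts the accordion open
                metadata={
                    "title": "🛠️ Using tool weather_api",
                    "id": 2,
                    "parent_id": 1,
                    "status": "pending",
                },
            ),
        ]
    )

demo.launch()
```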
We'll create a Gradio app that uses a simple agent with access to a text-to-image tool.

Make sure you read the smolagents documentation first.

We'll start by importing the necessary classes from transformers and gradio.
import gradio as gr
from gradio import ChatMessage
from transformers import Tool, ReactCodeAgent # type: ignore
from transformers.agents import stream_to_gradio, HfApiEngine # type: ignore
from dataclasses import asdict  # converts the streamed ChatMessage dataclasses to dicts
# Import tool from Hub
image_generation_tool = Tool.from_space(
space_id="black-forest-labs/FLUX.1-schnell",
name="image_generator",
description="Generates an image following your prompt. Returns a PIL Image.",
api_name="/infer",
)
llm_engine = HfApiEngine("Qwen/Qwen2.5-Coder-32B-Instruct")
# Initialize the agent with both tools and engine
agent = ReactCodeAgent(tools=[image_generation_tool], llm_engine=llm_engine)

Then we'll build the UI:
def interact_with_agent(prompt, history):
messages = []
yield messages
for msg in stream_to_gradio(agent, prompt):
messages.append(asdict(msg))
yield messages
yield messages
demo = gr.ChatInterface(
interact_with_agent,
    chatbot=gr.Chatbot(
label="Agent",
avatar_images=(
None,
"https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
),
),
examples=[
["Generate an image of an astronaut riding an alligator"],
["I am writing a children's book for my daughter. Can you help me with some illustrations?"],
],
)

You can see the full demo code here.
We'll create a UI for a langchain agent that has access to a search engine.

We'll start with imports and setting up the langchain agent. Note that you'll need a .env file with the following environment variables set -
SERPAPI_API_KEY=
HF_TOKEN=
OPENAI_API_KEY=

from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent, load_tools
from langchain_openai import ChatOpenAI
from gradio import ChatMessage
import gradio as gr
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI(temperature=0, streaming=True)
tools = load_tools(["serpapi"])
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-tools-agent")
agent = create_openai_tools_agent(
model.with_config({"tags": ["agent_llm"]}), tools, prompt
)
agent_executor = AgentExecutor(agent=agent, tools=tools).with_config(
{"run_name": "Agent"}
)

Then we'll create the Gradio UI:
async def interact_with_langchain_agent(prompt, messages):
messages.append(ChatMessage(role="user", content=prompt))
yield messages
async for chunk in agent_executor.astream(
{"input": prompt}
):
if "steps" in chunk:
for step in chunk["steps"]:
messages.append(ChatMessage(role="assistant", content=step.action.log,
metadata={"title": f"🛠️ Used tool {step.action.tool}"}))
yield messages
if "output" in chunk:
messages.append(ChatMessage(role="assistant", content=chunk["output"]))
yield messages
with gr.Blocks() as demo:
gr.Markdown("# Chat with a LangChain Agent 🦜⛓️ and see its thoughts 💭")
chatbot = gr.Chatbot(
label="Agent",
avatar_images=(
None,
"https://em-content.zobj.net/source/twitter/141/parrot_1f99c.png",
),
)
input = gr.Textbox(lines=1, label="Chat Message")
    input.submit(interact_with_langchain_agent, [input, chatbot], [chatbot])
demo.launch()

That's it! See our finished langchain demo here.
Gradio chatbots can natively display the intermediate thoughts of a "thinking" LLM, which makes them a great fit for creating UIs that show how an AI model "thinks" while generating a response. The guide below will show you how to build a chatbot that displays the thoughts of Gemini AI in real time.

Let's create a complete chatbot that shows its thoughts and responses in real time. We'll use Google's Gemini API to access the Gemini 2.0 Flash Thinking LLM, and Gradio for the UI.

We'll start with imports and setting up the Gemini client. Note that you'll need to acquire a Google Gemini API key first -
import gradio as gr
from gradio import ChatMessage
from typing import Iterator
import google.generativeai as genai
genai.configure(api_key="your-gemini-api-key")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")

First, let's set up the streaming function that handles the model's output:
def stream_gemini_response(user_message: str, messages: list) -> Iterator[list]:
"""
Streams both thoughts and responses from the Gemini model.
"""
# Initialize response from Gemini
response = model.generate_content(user_message, stream=True)
# Initialize buffers
thought_buffer = ""
response_buffer = ""
thinking_complete = False
# Add initial thinking message
messages.append(
ChatMessage(
role="assistant",
content="",
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)
)
for chunk in response:
parts = chunk.candidates[0].content.parts
current_chunk = parts[0].text
if len(parts) == 2 and not thinking_complete:
# Complete thought and start response
thought_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=thought_buffer,
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)
# Add response message
messages.append(
ChatMessage(
role="assistant",
content=parts[1].text
)
)
thinking_complete = True
elif thinking_complete:
# Continue streaming response
response_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=response_buffer
)
else:
# Continue streaming thoughts
thought_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=thought_buffer,
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)
        yield messages

Then, let's create the Gradio interface:
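The event chain below also calls a small user_message helper that adds the user's message to the chat history and clears the textbox. It isn't defined in this excerpt, so here's a minimal sketch matching the inputs and outputs it's wired to:

```python
def user_message(msg: str, history: list) -> tuple[str, list]:
    """Append the user's message to the chat history and clear the input box."""
    history.append(ChatMessage(role="user", content=msg))
    return "", history
```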
with gr.Blocks() as demo:
gr.Markdown("# Chat with Gemini 2.0 Flash and See its Thoughts 💭")
chatbot = gr.Chatbot(
label="Gemini2.0 'Thinking' Chatbot",
render_markdown=True,
)
input_box = gr.Textbox(
lines=1,
label="Chat Message",
placeholder="Type your message here and press Enter..."
)
# Set up event handlers
msg_store = gr.State("") # Store for preserving user message
input_box.submit(
lambda msg: (msg, msg, ""), # Store message and clear input
inputs=[input_box],
outputs=[msg_store, input_box, input_box],
queue=False
).then(
user_message, # Add user message to chat
inputs=[msg_store, chatbot],
outputs=[input_box, chatbot],
queue=False
).then(
stream_gemini_response, # Generate and stream response
inputs=[msg_store, chatbot],
outputs=chatbot
)
demo.launch()

This creates a chatbot that:
- Displays the model's thought process in a collapsible section
- Streams the thoughts and final response in real time
- Maintains a clean chat history
That's it! You now have a chatbot that not only responds to users but also shows its thinking process, creating more transparent and engaging interactions. See our finished Gemini 2.0 Flash Thinking demo here.
Gradio chatbots can display citations from LLM responses, which makes them a great fit for creating UIs that show source documents and references. This guide will show you how to build a chatbot that displays Claude's citations in real time.

Let's create a complete chatbot that shows responses along with their supporting citations. We'll use the Anthropic Claude API with citations enabled, and Gradio for the UI.

We'll start with imports and setting up the Anthropic client. Note that you'll need the ANTHROPIC_API_KEY environment variable set.
import gradio as gr
import anthropic
import base64
from typing import List, Dict, Any
client = anthropic.Anthropic()

First, let's set up the message formatting functions that handle document preparation:
def encode_pdf_to_base64(file_obj) -> str:
"""Convert uploaded PDF file to base64 string."""
if file_obj is None:
return None
with open(file_obj.name, 'rb') as f:
return base64.b64encode(f.read()).decode('utf-8')
def format_message_history(
history: list,
enable_citations: bool,
doc_type: str,
text_input: str,
pdf_file: str
) -> List[Dict]:
"""Convert Gradio chat history to Anthropic message format."""
formatted_messages = []
# Add previous messages
for msg in history[:-1]:
if msg["role"] == "user":
formatted_messages.append({"role": "user", "content": msg["content"]})
# Prepare the latest message with document
latest_message = {"role": "user", "content": []}
if enable_citations:
if doc_type == "plain_text":
latest_message["content"].append({
"type": "document",
"source": {
"type": "text",
"media_type": "text/plain",
"data": text_input.strip()
},
"title": "Text Document",
"citations": {"enabled": True}
})
elif doc_type == "pdf" and pdf_file:
pdf_data = encode_pdf_to_base64(pdf_file)
if pdf_data:
latest_message["content"].append({
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_data
},
"title": pdf_file.name,
"citations": {"enabled": True}
})
# Add the user's question
latest_message["content"].append({"type": "text", "text": history[-1]["content"]})
formatted_messages.append(latest_message)
    return formatted_messages

Then, let's create the bot response handler that processes citations:
def bot_response(
history: list,
enable_citations: bool,
doc_type: str,
text_input: str,
pdf_file: str
) -> List[Dict[str, Any]]:
try:
messages = format_message_history(history, enable_citations, doc_type, text_input, pdf_file)
response = client.messages.create(model="claude-3-5-sonnet-20241022", max_tokens=1024, messages=messages)
# Initialize main response and citations
main_response = ""
citations = []
# Process each content block
for block in response.content:
if block.type == "text":
main_response += block.text
if enable_citations and hasattr(block, 'citations') and block.citations:
for citation in block.citations:
if citation.cited_text not in citations:
citations.append(citation.cited_text)
# Add main response
history.append({"role": "assistant", "content": main_response})
# Add citations in a collapsible section
if enable_citations and citations:
history.append({
"role": "assistant",
"content": "\n".join([f"• {cite}" for cite in citations]),
"metadata": {"title": "📚 Citations"}
})
return history
except Exception as e:
history.append({
"role": "assistant",
"content": "I apologize, but I encountered an error while processing your request."
})
        return history

Finally, let's create the Gradio interface:
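As in the Gemini example, the submit chain below references a user_message helper that isn't defined in this excerpt. Here's a minimal sketch matching the inputs and outputs it's wired to (the document settings are accepted but not needed just to record the user's message):

```python
def user_message(msg, history, enable_citations, doc_type, text_input, pdf_file):
    """Append the user's message to the history and clear the input box."""
    history.append({"role": "user", "content": msg})
    return "", history
```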
with gr.Blocks() as demo:
gr.Markdown("# Chat with Citations")
with gr.Row(scale=1):
with gr.Column(scale=4):
chatbot = gr.Chatbot(bubble_full_width=False, show_label=False, scale=1)
msg = gr.Textbox(placeholder="Enter your message here...", show_label=False, container=False)
with gr.Column(scale=1):
            enable_citations = gr.Checkbox(label="Enable Citations", value=True, info="Toggle citation functionality")
            doc_type_radio = gr.Radio(choices=["plain_text", "pdf"], value="plain_text", label="Document Type", info="Choose the type of document to use")
text_input = gr.Textbox(label="Document Content", lines=10, info="Enter the text you want to reference")
pdf_input = gr.File(label="Upload PDF", file_types=[".pdf"], file_count="single", visible=False)
# Handle message submission
msg.submit(
user_message,
[msg, chatbot, enable_citations, doc_type_radio, text_input, pdf_input],
[msg, chatbot]
).then(
bot_response,
[chatbot, enable_citations, doc_type_radio, text_input, pdf_input],
chatbot
)
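    # Assumed wiring, not shown in this excerpt: pdf_input starts hidden
    # (visible=False), so reveal it only when the "pdf" document type is chosen.
    doc_type_radio.change(
        lambda t: gr.update(visible=(t == "pdf")),
        inputs=doc_type_radio,
        outputs=pdf_input,
    )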
demo.launch()

This creates a chatbot that:
- Displays citations in a collapsible section using the metadata feature

The citations feature works particularly well with the Gradio Chatbot's metadata support, allowing us to create collapsible sections that keep the chat interface clean while still making the source documents easily accessible.
That's it! You now have a chatbot that not only responds to users but also shows its sources, creating more transparent and trustworthy interactions. See our finished Citations demo here.