Gradio chatbots can natively display intermediate thoughts and tool usage in a collapsible accordion next to the chat message. This makes them a great fit for building UIs for LLM agents and chain-of-thought (CoT) or reasoning demos. This guide will show you how to display thoughts and tool usage with gr.Chatbot and gr.ChatInterface.

The ChatMessage dataclass

Each element of a chatbot's value is a dictionary with role and content keys. You can add new values to the chatbot using plain Python dictionaries, but Gradio also provides the ChatMessage dataclass to give you IDE autocompletion. The schema of ChatMessage is shown below:
MessageContent = Union[str, FileDataDict, FileData, Component]
@dataclass
class ChatMessage:
    content: MessageContent | list[MessageContent]
role: Literal["user", "assistant"]
metadata: MetadataDict = None
options: list[OptionDict] = None
class MetadataDict(TypedDict):
title: NotRequired[str]
id: NotRequired[int | str]
parent_id: NotRequired[int | str]
log: NotRequired[str]
duration: NotRequired[float]
status: NotRequired[Literal["pending", "done"]]
class OptionDict(TypedDict):
label: NotRequired[str]
    value: str

For our purposes, the most important key is the metadata key, which accepts a dictionary. If this dictionary includes a title for the message, it will be displayed in a collapsible accordion representing a thought. It's that simple! Take a look at this example:
import gradio as gr
with gr.Blocks() as demo:
chatbot = gr.Chatbot(
value=[
gr.ChatMessage(
role="user",
content="What is the weather in San Francisco?"
),
gr.ChatMessage(
role="assistant",
content="I need to use the weather API tool?",
metadata={"title": "🧠 Thinking"}
)
]
)
demo.launch()

In addition to title, the dictionary provided to metadata can contain several optional keys:
- log: an optional string value to be displayed in a subdued font next to the thought title.
- duration: an optional numeric value representing the duration of the thought/tool usage, in seconds. Displayed in a subdued font next to the thought title, in parentheses.
- status: if set to "pending", a spinner appears next to the thought title and the accordion is initialized open. If status is "done", the thought accordion is initialized closed. If status is not provided, the thought accordion is initialized open and no spinner is displayed.
- id and parent_id: if these are provided, they can be used to nest thoughts inside other thoughts.

Below, we'll show several complete examples of using gr.Chatbot and gr.ChatInterface to display tool usage or thinking UIs.
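But first, here's a minimal sketch of how several of these metadata keys combine, including nesting one thought under another via id and parent_id. (This is not one of the demos below; the tool name, log text, and durations are purely illustrative.)

```python
import gradio as gr

with gr.Blocks() as demo:
    chatbot = gr.Chatbot(
        value=[
            gr.ChatMessage(
                role="assistant",
                content="Let me check the weather for you.",
                # Parent thought: "done" status, so the accordion starts closed
                metadata={
                    "title": "🧠 Planning",
                    "id": 1,
                    "log": "agent step 1",
                    "duration": 2.1,
                    "status": "done",
                },
            ),
            gr.ChatMessage(
                role="assistant",
                content="Calling weather_api(city='San Francisco')...",
                # Child thought: nested under the thought above via parent_id;
                # "pending" status shows a spinner and starts the accordion open
                metadata={
                    "title": "🛠️ Using tool weather_api",
                    "id": 2,
                    "parent_id": 1,
                    "status": "pending",
                },
            ),
        ]
    )

demo.launch()
```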
We'll create a Gradio app that uses a simple agent with access to a text-to-image tool.

Make sure you read the smolagents documentation first.

We'll start by importing the necessary classes from transformers and gradio.
import gradio as gr
from gradio import ChatMessage
from transformers import Tool, ReactCodeAgent # type: ignore
from transformers.agents import stream_to_gradio, HfApiEngine # type: ignore
from dataclasses import asdict  # converts the streamed ChatMessage dataclasses to dicts
# Import tool from Hub
image_generation_tool = Tool.from_space(
space_id="black-forest-labs/FLUX.1-schnell",
name="image_generator",
description="Generates an image following your prompt. Returns a PIL Image.",
api_name="/infer",
)
llm_engine = HfApiEngine("Qwen/Qwen2.5-Coder-32B-Instruct")
# Initialize the agent with both tools and engine
agent = ReactCodeAgent(tools=[image_generation_tool], llm_engine=llm_engine)

Then we'll build the UI:
def interact_with_agent(prompt, history):
messages = []
yield messages
for msg in stream_to_gradio(agent, prompt):
messages.append(asdict(msg))
yield messages
yield messages
demo = gr.ChatInterface(
interact_with_agent,
    chatbot=gr.Chatbot(
label="Agent",
avatar_images=(
None,
"https://em-content.zobj.net/source/twitter/53/robot-face_1f916.png",
),
),
examples=[
["Generate an image of an astronaut riding an alligator"],
["I am writing a children's book for my daughter. Can you help me with some illustrations?"],
],
)

You can see the full demo code here.
We'll create a UI for a langchain agent that has access to a search engine.

We'll start with imports and setting up the langchain agent. Note that you'll need a .env file with the following environment variables set -
SERPAPI_API_KEY=
HF_TOKEN=
OPENAI_API_KEY=

from langchain import hub
from langchain.agents import AgentExecutor, create_openai_tools_agent, load_tools
from langchain_openai import ChatOpenAI
from gradio import ChatMessage
import gradio as gr
from dotenv import load_dotenv
load_dotenv()
model = ChatOpenAI(temperature=0, streaming=True)
tools = load_tools(["serpapi"])
# Get the prompt to use - you can modify this!
prompt = hub.pull("hwchase17/openai-tools-agent")
agent = create_openai_tools_agent(
model.with_config({"tags": ["agent_llm"]}), tools, prompt
)
agent_executor = AgentExecutor(agent=agent, tools=tools).with_config(
{"run_name": "Agent"}
)

Then we'll create the Gradio UI:
async def interact_with_langchain_agent(prompt, messages):
messages.append(ChatMessage(role="user", content=prompt))
yield messages
async for chunk in agent_executor.astream(
{"input": prompt}
):
if "steps" in chunk:
for step in chunk["steps"]:
messages.append(ChatMessage(role="assistant", content=step.action.log,
metadata={"title": f"🛠️ Used tool {step.action.tool}"}))
yield messages
if "output" in chunk:
messages.append(ChatMessage(role="assistant", content=chunk["output"]))
yield messages
with gr.Blocks() as demo:
gr.Markdown("# Chat with a LangChain Agent 🦜⛓️ and see its thoughts 💭")
chatbot = gr.Chatbot(
label="Agent",
avatar_images=(
None,
"https://em-content.zobj.net/source/twitter/141/parrot_1f99c.png",
),
)
input = gr.Textbox(lines=1, label="Chat Message")
    input.submit(interact_with_langchain_agent, [input, chatbot], [chatbot])
demo.launch()

That's it! See our finished langchain demo here.
Gradio chatbots can natively display the intermediate thoughts of a "thinking" LLM, which makes them a great fit for creating UIs that show how an AI model "thinks" while generating a response. The guide below will show you how to build a chatbot that displays the thoughts of Gemini AI in real time.

Let's create a complete chatbot that shows its thoughts and responses in real time. We'll use Google's Gemini API to access the Gemini 2.0 Flash Thinking LLM, and Gradio for the UI.

We'll start with imports and setting up the Gemini client. Note that you'll need to acquire a Google Gemini API key first -
import gradio as gr
from gradio import ChatMessage
from typing import Iterator
import google.generativeai as genai
genai.configure(api_key="your-gemini-api-key")
model = genai.GenerativeModel("gemini-2.0-flash-thinking-exp-1219")

First, let's set up the streaming function that handles the model's output:
def stream_gemini_response(user_message: str, messages: list) -> Iterator[list]:
"""
Streams both thoughts and responses from the Gemini model.
"""
# Initialize response from Gemini
response = model.generate_content(user_message, stream=True)
# Initialize buffers
thought_buffer = ""
response_buffer = ""
thinking_complete = False
# Add initial thinking message
messages.append(
ChatMessage(
role="assistant",
content="",
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)
)
for chunk in response:
parts = chunk.candidates[0].content.parts
current_chunk = parts[0].text
if len(parts) == 2 and not thinking_complete:
# Complete thought and start response
thought_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=thought_buffer,
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)
# Add response message
messages.append(
ChatMessage(
role="assistant",
content=parts[1].text
)
)
thinking_complete = True
elif thinking_complete:
# Continue streaming response
response_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=response_buffer
)
else:
# Continue streaming thoughts
thought_buffer += current_chunk
messages[-1] = ChatMessage(
role="assistant",
content=thought_buffer,
metadata={"title": "⏳Thinking: *The thoughts produced by the Gemini2.0 Flash model are experimental"}
)
        yield messages

Then, let's create the Gradio interface:
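The event chain below also calls a small user_message helper that adds the user's message to the chat history and clears the textbox. It isn't defined in this excerpt, so here's a minimal sketch matching the inputs and outputs it's wired to:

```python
def user_message(msg: str, history: list) -> tuple[str, list]:
    """Append the user's message to the chat history and clear the input box."""
    history.append(ChatMessage(role="user", content=msg))
    return "", history
```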
with gr.Blocks() as demo:
gr.Markdown("# Chat with Gemini 2.0 Flash and See its Thoughts 💭")
chatbot = gr.Chatbot(
label="Gemini2.0 'Thinking' Chatbot",
render_markdown=True,
)
input_box = gr.Textbox(
lines=1,
label="Chat Message",
placeholder="Type your message here and press Enter..."
)
# Set up event handlers
msg_store = gr.State("") # Store for preserving user message
input_box.submit(
lambda msg: (msg, msg, ""), # Store message and clear input
inputs=[input_box],
outputs=[msg_store, input_box, input_box],
queue=False
).then(
user_message, # Add user message to chat
inputs=[msg_store, chatbot],
outputs=[input_box, chatbot],
queue=False
).then(
stream_gemini_response, # Generate and stream response
inputs=[msg_store, chatbot],
outputs=chatbot
)
demo.launch()

This creates a chatbot that:
- Displays the model's thought process in a collapsible section
- Streams the thoughts and final response in real time
- Maintains a clean chat history
That's it! You now have a chatbot that not only responds to users but also shows its thinking process, creating more transparent and engaging interactions. See our finished Gemini 2.0 Flash Thinking demo here.
Gradio chatbots can display citations from LLM responses, which makes them a great fit for creating UIs that show source documents and references. This guide will show you how to build a chatbot that displays Claude's citations in real time.

Let's create a complete chatbot that shows responses along with their supporting citations. We'll use the Anthropic Claude API with citations enabled, and Gradio for the UI.

We'll start with imports and setting up the Anthropic client. Note that you'll need the ANTHROPIC_API_KEY environment variable set.
import gradio as gr
import anthropic
import base64
from typing import List, Dict, Any
client = anthropic.Anthropic()

First, let's set up the message formatting functions that handle document preparation:
def encode_pdf_to_base64(file_obj) -> str:
"""Convert uploaded PDF file to base64 string."""
if file_obj is None:
return None
with open(file_obj.name, 'rb') as f:
return base64.b64encode(f.read()).decode('utf-8')
def format_message_history(
history: list,
enable_citations: bool,
doc_type: str,
text_input: str,
pdf_file: str
) -> List[Dict]:
"""Convert Gradio chat history to Anthropic message format."""
formatted_messages = []
# Add previous messages
for msg in history[:-1]:
if msg["role"] == "user":
formatted_messages.append({"role": "user", "content": msg["content"]})
# Prepare the latest message with document
latest_message = {"role": "user", "content": []}
if enable_citations:
if doc_type == "plain_text":
latest_message["content"].append({
"type": "document",
"source": {
"type": "text",
"media_type": "text/plain",
"data": text_input.strip()
},
"title": "Text Document",
"citations": {"enabled": True}
})
elif doc_type == "pdf" and pdf_file:
pdf_data = encode_pdf_to_base64(pdf_file)
if pdf_data:
latest_message["content"].append({
"type": "document",
"source": {
"type": "base64",
"media_type": "application/pdf",
"data": pdf_data
},
"title": pdf_file.name,
"citations": {"enabled": True}
})
# Add the user's question
latest_message["content"].append({"type": "text", "text": history[-1]["content"]})
formatted_messages.append(latest_message)
    return formatted_messages

Then, let's create the bot response handler that processes citations:
def bot_response(
history: list,
enable_citations: bool,
doc_type: str,
text_input: str,
pdf_file: str
) -> List[Dict[str, Any]]:
try:
messages = format_message_history(history, enable_citations, doc_type, text_input, pdf_file)
response = client.messages.create(model="claude-3-5-sonnet-20241022", max_tokens=1024, messages=messages)
# Initialize main response and citations
main_response = ""
citations = []
# Process each content block
for block in response.content:
if block.type == "text":
main_response += block.text
if enable_citations and hasattr(block, 'citations') and block.citations:
for citation in block.citations:
if citation.cited_text not in citations:
citations.append(citation.cited_text)
# Add main response
history.append({"role": "assistant", "content": main_response})
# Add citations in a collapsible section
if enable_citations and citations:
history.append({
"role": "assistant",
"content": "\n".join([f"• {cite}" for cite in citations]),
"metadata": {"title": "📚 Citations"}
})
return history
except Exception as e:
history.append({
"role": "assistant",
"content": "I apologize, but I encountered an error while processing your request."
})
        return history

Finally, let's create the Gradio interface:
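As in the Gemini example, the submit chain below references a user_message helper that isn't defined in this excerpt. Here's a minimal sketch matching the inputs and outputs it's wired to (the document settings are accepted but not needed just to record the user's message):

```python
def user_message(msg, history, enable_citations, doc_type, text_input, pdf_file):
    """Append the user's message to the history and clear the input box."""
    history.append({"role": "user", "content": msg})
    return "", history
```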
with gr.Blocks() as demo:
gr.Markdown("# Chat with Citations")
with gr.Row(scale=1):
with gr.Column(scale=4):
chatbot = gr.Chatbot(bubble_full_width=False, show_label=False, scale=1)
msg = gr.Textbox(placeholder="Enter your message here...", show_label=False, container=False)
with gr.Column(scale=1):
            enable_citations = gr.Checkbox(label="Enable Citations", value=True, info="Toggle citation functionality")
            doc_type_radio = gr.Radio(choices=["plain_text", "pdf"], value="plain_text", label="Document Type", info="Choose the type of document to use")
text_input = gr.Textbox(label="Document Content", lines=10, info="Enter the text you want to reference")
pdf_input = gr.File(label="Upload PDF", file_types=[".pdf"], file_count="single", visible=False)
# Handle message submission
msg.submit(
user_message,
[msg, chatbot, enable_citations, doc_type_radio, text_input, pdf_input],
[msg, chatbot]
).then(
bot_response,
[chatbot, enable_citations, doc_type_radio, text_input, pdf_input],
chatbot
)
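    # Assumed wiring, not shown in this excerpt: pdf_input starts hidden
    # (visible=False), so reveal it only when the "pdf" document type is chosen.
    doc_type_radio.change(
        lambda t: gr.update(visible=(t == "pdf")),
        inputs=doc_type_radio,
        outputs=pdf_input,
    )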
demo.launch()

This creates a chatbot that:
- Displays citations in a collapsible section using the metadata feature

The citations feature works particularly well with the Gradio Chatbot's metadata support, allowing us to create collapsible sections that keep the chat interface clean while still making the source documents easily accessible.
That's it! You now have a chatbot that not only responds to users but also shows its sources, creating more transparent and trustworthy interactions. See our finished Citations demo here.