Build a Research Agent with Search + URL Extract (Claude tool use, end to end)

Sarah Choy · Published May 3, 2026 · About 9 min read

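Most "AI research agent" tutorials stop at "here is the tool definition." This one ships a working agent: a question goes in, a cited answer comes out. Search, extraction, reasoning, and citation, all in under 120 lines of Python.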

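TL;DR

• The minimal viable research-agent loop is: search → pick links → extract the body text → ask the model to answer with inline citations.
• Two tools, Web Search (15 credits per call) and URL Extract (2 credits per URL), cover 95% of "grounded answer" use cases.
• Claude tool use handles the orchestration; you only pass tool_use blocks → API calls → tool_result blocks along until the model stops.
• End-to-end cost: at typical depth, about 25 credits plus about $0.02 in LLM tokens per question.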

What we're building

A research agent that takes one input, a natural-language question, and returns an answer with sources. The architecture:

question → [Claude]
              ↓ tool_use(search)
            [API Pick Web Search] → ranked URLs
              ↓ tool_use(extract)
            [API Pick URL Extract] → cleaned bodies
              ↓ Claude reads, decides if more is needed
              ↓ end_turn
          answer with inline [source: URL] citations

About 120 lines of Python. Two tools, search and extract, plus an agent loop that handles whatever Claude decides to do. We'll build it from scratch in 4 steps.

1. Fetch the tool schemas (no hand-written JSON)

Both endpoints expose a tool-schema route that returns a Claude tool definition in exactly the shape messages.create expects.

import requests

API_KEY = "pk_yourkey"

def fetch_tool(tool_path: str) -> dict:
    """Fetch a Claude tool definition from API Pick's tool-schema endpoint."""
    resp = requests.get(f"https://www.apipick.com{tool_path}/tool-schema", timeout=30)
    resp.raise_for_status()  # fail fast instead of parsing an error page as JSON
    return resp.json()["claude"]

WEB_SEARCH_TOOL = fetch_tool("/api/search/web")
URL_EXTRACT_TOOL = fetch_tool("/api/extract")

# Cache these once at module load. They don't change between requests.

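This gets you two important things for free: the parameter schema (so Claude knows you accept query, country_code, start_date, and so on) and a clear, model-friendly description of what the tool is for.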
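For orientation, here is a minimal sketch of what such a definition looks like, plus a sanity check you could run at load time. The top-level keys (name, description, input_schema) are the ones a Claude tool definition normally carries; the example values and the check_tool_def helper are illustrative assumptions, not API Pick's actual schema.

```python
# Illustrative only: a hand-written stand-in for what a tool-schema route
# might return. Field values are assumptions, not API Pick's real schema.
EXAMPLE_TOOL = {
    "name": "web_search",
    "description": "Search the web and return ranked URLs.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "start_date": {"type": "string"},
        },
        "required": ["query"],
    },
}

REQUIRED_KEYS = ("name", "description", "input_schema")


def check_tool_def(tool: dict) -> dict:
    """Raise early if a fetched definition is missing a key we rely on."""
    missing = [k for k in REQUIRED_KEYS if k not in tool]
    if missing:
        raise ValueError(f"tool definition missing keys: {missing}")
    if tool["input_schema"].get("type") != "object":
        raise ValueError("input_schema must be a JSON Schema of type 'object'")
    return tool
```

Running a check like this once at module load turns a malformed schema into a startup error rather than a confusing API rejection on the first request.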

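2. Write the tool handlers

When Claude returns a tool_use block, your job is to call the real API and return a tool_result block. Each tool maps to one endpoint, and a single dispatcher routes on the name Claude picked.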

def call_tool(tool_use_block) -> dict:
    """Execute the tool Claude asked for and return a tool_result block."""
    name = tool_use_block.name
    args = tool_use_block.input

    if name == "web_search":
        endpoint = "https://www.apipick.com/api/search/web"
    elif name == "extract_urls":
        endpoint = "https://www.apipick.com/api/extract"
    else:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": f"Unknown tool: {name}",
            "is_error": True,
        }

    resp = requests.post(
        endpoint,
        headers={"x-api-key": API_KEY},
        json=args,
        timeout=30,
    )

    if resp.status_code != 200:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": f"HTTP {resp.status_code}: {resp.text[:500]}",
            "is_error": True,
        }

    return {
        "type": "tool_result",
        "tool_use_id": tool_use_block.id,
        "content": resp.text,
    }

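Three things are worth noting. tool_use_id is how Claude matches a result to the call it made earlier, so you must return it unchanged. is_error: True tells Claude to recover gracefully (usually by trying a different query). And we return the raw response text: Claude is comfortable with the JSON shape API Pick returns, so no transformation is needed.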
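One case the handler does not cover is a request that never produces an HTTP response at all (timeout, DNS failure, connection reset). The sketch below shows one way to turn those into an is_error result as well. safe_tool_result is a hypothetical helper, not part of any SDK; it catches OSError because requests' exception classes derive from it.

```python
def safe_tool_result(tool_use_id: str, call) -> dict:
    """Run call() and always return a well-formed tool_result block.

    `call` is any zero-argument function that returns the tool's text output.
    """
    try:
        content = call()
    except OSError as exc:
        # requests.RequestException subclasses IOError/OSError, so timeouts
        # and connection failures raised by requests land here.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"Network error: {exc}",
            "is_error": True,
        }
    return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
```

With this shape, a flaky network call becomes something the model can react to, in the same way it reacts to a non-200 status.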

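3. The agent loop

The loop is short: send the conversation, inspect every block in the response, run all requested tools, append both Claude's tool_use blocks and your tool_result blocks to the conversation, and repeat until stop_reason is end_turn.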

import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a research assistant. When the user asks a question:

1. Use web_search to find relevant sources. Prefer recent (set start_date) for time-sensitive queries.
2. Use extract_urls to read the body of the top 3-5 most relevant URLs from the search results.
3. Answer the question concisely. After every factual claim, include an inline citation in the form [source: URL].
4. If you don't have enough information, say so — don't fabricate.

Be concise. Aim for 3-5 sentence answers unless the user asks for depth."""

def research(question: str) -> str:
    messages = [{"role": "user", "content": question}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=SYSTEM_PROMPT,
            tools=[WEB_SEARCH_TOOL, URL_EXTRACT_TOOL],
            messages=messages,
        )

        # Append the assistant's response (may contain tool_use blocks)
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Pull out the text answer
            return "\n".join(b.text for b in response.content if b.type == "text")

        if response.stop_reason == "tool_use":
            # Run every tool the model asked for in this turn
            tool_results = [
                call_tool(b)
                for b in response.content if b.type == "tool_use"
            ]
            messages.append({"role": "user", "content": tool_results})
            continue

        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

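Note that the model can issue multiple tool calls in a single turn, and a single extract_urls call can itself batch five URLs. We handle this by collecting every tool_use block in the assistant turn and returning all the tool_result blocks in the next user turn.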
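Because every tool_use block needs a matching tool_result in the following user turn, a small pairing check can catch mistakes before the next request goes out. This is a sketch: unanswered_tool_calls is a hypothetical helper, and the SimpleNamespace objects only stand in for the SDK's content blocks.

```python
from types import SimpleNamespace


def unanswered_tool_calls(assistant_content, tool_results) -> list:
    """Return ids of tool_use blocks that have no matching tool_result."""
    asked = [b.id for b in assistant_content if b.type == "tool_use"]
    answered = {r["tool_use_id"] for r in tool_results}
    return [i for i in asked if i not in answered]


# Stand-in blocks shaped like SDK content blocks (the ids are made up).
content = [
    SimpleNamespace(type="text", text="Let me look that up."),
    SimpleNamespace(type="tool_use", id="toolu_a", name="web_search", input={}),
    SimpleNamespace(type="tool_use", id="toolu_b", name="extract_urls", input={}),
]
results = [{"type": "tool_result", "tool_use_id": "toolu_a", "content": "{}"}]

print(unanswered_tool_calls(content, results))  # ['toolu_b']
```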

4. Run it

if __name__ == "__main__":
    answer = research("What were the major announcements at OpenAI DevDay this year?")
    print(answer)

You'll see output like this:

OpenAI's DevDay focused on three themes: cheaper inference (a new gpt-mini
tier at roughly half the prior cost) [source: https://openai.com/devday-2026],
agent infrastructure (a managed agent runtime with persistent memory and
tool sandboxing) [source: https://techcrunch.com/...], and developer
tooling (a new Responses API replacing the legacy Assistants API)
[source: https://platform.openai.com/docs/...].

That's a research agent. About 120 lines, two tools, cited output.

What it costs

A typical research call:

• 1 search call: 15 credits ($0.015)
• 1 extract call covering 3–5 URLs: 6–10 credits ($0.006–$0.010)
• About 3,000 input + 800 output Claude tokens: about $0.02

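Adding those up: about 4 to 4.5 cents per research answer. At 1,000 research calls a day that is roughly $41–$45 per day, which is about what one analyst spends on coffee in a morning. At this depth the API credits and the LLM tokens cost about the same, but the LLM share grows with every extra round because the whole conversation is re-sent, so that is the line to watch when budgeting.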
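To adapt the estimate to your own depth, here is a small calculator built from the per-call prices listed above. The price of $0.001 per credit is inferred from "15 credits ($0.015)"; the function name and the flat llm_usd parameter are assumptions made for illustration.

```python
CREDIT_USD = 0.001           # inferred from "15 credits ($0.015)"
SEARCH_CREDITS = 15          # per search call
EXTRACT_CREDITS_PER_URL = 2  # per extracted URL


def cost_per_answer(searches: int, urls: int, llm_usd: float) -> float:
    """Estimated USD cost of one research answer."""
    credits = searches * SEARCH_CREDITS + urls * EXTRACT_CREDITS_PER_URL
    return round(credits * CREDIT_USD + llm_usd, 4)


# The typical call described above: one search, 3 to 5 extracted URLs.
low = cost_per_answer(searches=1, urls=3, llm_usd=0.02)
high = cost_per_answer(searches=1, urls=5, llm_usd=0.02)
print(f"${low:.3f} to ${high:.3f} per answer")
```

Multiply by your daily call volume to get a budget figure, and raise llm_usd if your agent tends to run more than one tool round.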

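Three production optimizations

1. Cache by question

If your product tends to see similar questions, hash (question, today's date) and cache the final answer for one hour. For most user-driven research, the long-tail hit rate is above 30%.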

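2. Add a date constraint when freshness matters

For "this week" or "today" questions, set start_date in the search call to today or yesterday. Without it, search may return last year's articles. The simplest version is a system-prompt rule such as: "When the question contains today, this week, recent, or latest, always set start_date."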
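If you would rather enforce this in code than rely on the prompt, a keyword check is enough for a first version. In this sketch, suggested_start_date is a hypothetical helper, and the ISO YYYY-MM-DD format for start_date is an assumption; check the tool schema for the format the endpoint actually accepts.

```python
import re
from datetime import date, timedelta
from typing import Optional

# The same trigger words as the system-prompt rule above.
FRESHNESS_WORDS = re.compile(r"\b(today|this week|recent|latest)\b", re.IGNORECASE)


def suggested_start_date(question: str, today: date) -> Optional[str]:
    """Return yesterday's date (ISO format) for time-sensitive questions."""
    if not FRESHNESS_WORDS.search(question):
        return None
    return (today - timedelta(days=1)).isoformat()


print(suggested_start_date("What is the latest Python release?", date(2026, 5, 3)))
# 2026-05-02
```

The result can be injected into the question text or appended to the system prompt for that request, so the model has a concrete date to pass along.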

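3. A guardrail against hallucination

The single highest-leverage rule: "If you don't have enough extracted content to answer, return: I couldn't find a reliable source for this. Do not guess." In our tests, adding this to the system prompt cut fabricated answers by an order of magnitude.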
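Keeping the fallback sentence in a constant lets the calling code recognize it later, for example to skip caching a non-answer. A small sketch, where with_guardrail and is_no_source_reply are hypothetical helper names:

```python
NO_SOURCE_REPLY = "I couldn't find a reliable source for this"

GUARDRAIL_RULE = (
    "If you don't have enough extracted content to answer, return: "
    f"{NO_SOURCE_REPLY}. Do not guess."
)


def with_guardrail(system_prompt: str) -> str:
    """Append the guardrail rule once, even if called repeatedly."""
    if GUARDRAIL_RULE in system_prompt:
        return system_prompt
    return f"{system_prompt.rstrip()}\n\n{GUARDRAIL_RULE}"


def is_no_source_reply(answer: str) -> bool:
    """True when the model used the fallback sentence instead of answering."""
    return NO_SOURCE_REPLY.lower() in answer.lower()
```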

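Where to go next

Three natural extensions:

• Go vertical: swap Web Search for Academic Search to build a literature-review agent, for SEC Filings Search to do due diligence, or for Clinical Search to do medical research. The handler logic does not change (only the endpoint it routes to); the rest is a system-prompt edit.
• Add structured output: ask the model to return JSON with answer and citations[] fields so you can render it as a UI card.
• Stream tokens: switch to client.messages.stream() for progressive output, which helps when answers are long. The tool-call loop is unchanged.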
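For the structured-output extension, the receiving side can be sketched as follows. It assumes you have prompted the model to reply with a bare JSON object whose citations field is a list of URL strings; the ResearchAnswer class and parse_structured_answer function are illustrative names.

```python
import json
from dataclasses import dataclass


@dataclass
class ResearchAnswer:
    answer: str
    citations: list


def parse_structured_answer(raw: str) -> ResearchAnswer:
    """Parse the model's JSON reply; raise ValueError if it is malformed."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    answer, citations = data.get("answer"), data.get("citations")
    if not isinstance(answer, str) or not isinstance(citations, list):
        raise ValueError("expected string 'answer' and list 'citations'")
    if not all(isinstance(c, str) for c in citations):
        raise ValueError("every citation must be a URL string")
    return ResearchAnswer(answer=answer, citations=citations)
```

Validating before rendering keeps a malformed reply from reaching the UI; on ValueError you can retry or fall back to the plain-text answer.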

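FAQ

Why two tools instead of a single "answer" endpoint?

Splitting search from extract lets you control how many URLs to read, when to stop, and which ones to cite. A single hosted "answer" endpoint hides all of that, so changing strategy means re-architecting. With two tools, the same code becomes a morning-briefing agent, a literature reviewer, or a competitive-intelligence crawler just by adjusting the prompt.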

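Does this also work with the OpenAI Assistants / Responses API?

Yes. The architecture is identical; the only differences are how you parse tool-call blocks and how you submit results. The handler's shape becomes submit_tool_outputs(...) rather than tool_result content blocks, but the agent loop is the same and the tool definitions use the same JSON Schema for their parameters.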

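How does the agent decide when to stop?

When Claude has enough context to answer, it returns stop_reason: end_turn; when it wants to call a tool, it returns tool_use. The loop runs until end_turn. In practice, the model usually does 1–3 search/extract rounds before answering.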
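Since the stopping decision belongs to the model, you may want a hard ceiling on tool rounds as a safety net. The sketch below separates that policy from the API call: run_bounded, step, and handle_tools are hypothetical names, and the default of 5 rounds is an arbitrary choice sitting above the 1–3 rounds seen in practice.

```python
def run_bounded(step, handle_tools, max_rounds: int = 5):
    """Call step() until end_turn, allowing at most max_rounds tool rounds.

    step() returns an object with .stop_reason and .content;
    handle_tools(content) runs the requested tools and records the results.
    """
    rounds = 0
    while True:
        response = step()
        if response.stop_reason == "end_turn":
            return response
        if response.stop_reason != "tool_use":
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
        if rounds == max_rounds:
            # The model wants another round but the budget is spent.
            raise RuntimeError(f"No final answer after {max_rounds} tool rounds")
        handle_tools(response.content)
        rounds += 1
```

Here step would wrap the messages.create call and handle_tools would append the tool_result blocks to the conversation, so the loop body stays the same as in the agent above.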

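How do I make sure it actually cites sources?

There are two levers. First, the system prompt explicitly requires an inline "[source: URL]" after every claim. Second, URL Extract responses include the URL, so the model tends to cite naturally when it is in context. If citations are occasionally dropped, add a final format check that rejects the answer and asks the model to add them.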
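The final format check can be a few lines of regex. This sketch assumes citations follow the "[source: URL]" form from the system prompt and that you keep track of which URLs were actually extracted during the run; check_citations is a hypothetical helper.

```python
import re

# Matches "[source: https://...]" and captures the URL.
CITATION = re.compile(r"\[source:\s*(https?://[^\]\s]+)\]")


def check_citations(answer: str, extracted_urls: set) -> list:
    """Return a list of problems; an empty list means the answer passes."""
    cited = CITATION.findall(answer)
    if not cited:
        return ["no [source: URL] citations found"]
    return [
        f"cited URL was never extracted: {url}"
        for url in cited
        if url not in extracted_urls
    ]
```

When the list is non-empty, one option is to send the problems back as a follow-up user message and ask for a corrected answer. Whether to also accept URLs that appeared only in search results is a policy choice this sketch leaves out.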

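What about latency?

Each search round trip takes about 1 second; each extract batch takes 1–4 seconds depending on the number of URLs and on JS rendering. A typical "research one question" call takes 5–15 seconds of wall-clock time. To go faster, parallelize the extract step (extract already accepts a list of URLs in one call) and use the system prompt to tell the model to run only one round of tools per question.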
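Batching URLs into one extract call is the first step. When the model still issues several tool calls in the same turn, you can also run them concurrently on your side. A minimal sketch using a thread pool follows; run_tools_parallel is a hypothetical helper and handler stands for a function like call_tool.

```python
from concurrent.futures import ThreadPoolExecutor


def run_tools_parallel(tool_use_blocks, handler, max_workers: int = 4) -> list:
    """Run handler(block) for each block concurrently, keeping input order."""
    blocks = list(tool_use_blocks)
    if not blocks:
        return []
    with ThreadPoolExecutor(max_workers=min(max_workers, len(blocks))) as pool:
        # Executor.map yields results in the order of its inputs,
        # regardless of which call finishes first.
        return list(pool.map(handler, blocks))
```

Threads fit here because the work is network-bound; the turn's wall-clock time drops to roughly that of the slowest call rather than the sum of all calls.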