Build a Research Agent with Search + URL Extract (Claude tool use, end-to-end)

Most 'AI research agent' tutorials stop at 'here's a tool definition.' This one ships a working agent: question in, cited answer out. Search, extract, reason, cite — all under 120 lines of Python.
TL;DR
- The smallest useful research-agent loop is: search → pick links → extract bodies → ask the model to answer with inline citations.
- Two tools — Web Search (15 credits) and URL Extract (2 credits per URL) — cover 95% of grounded-answer use cases.
- Claude tool use handles the orchestration; you just shuttle tool_use blocks → API calls → tool_result blocks until the model stops.
- End-to-end cost: ~25 credits + ~$0.02 in LLM tokens per question at typical depth.
What we're building
A research agent that takes one input — a natural-language question — and returns a sourced answer. Architecturally:
question → [Claude]
    ↓ tool_use(search)
[API Pick Web Search] → ranked URLs
    ↓ tool_use(extract)
[API Pick URL Extract] → cleaned bodies
    ↓ Claude reads, decides if more is needed
    ↓ end_turn
answer with inline [source: URL] citations

About 120 lines of Python. Two tools — search and extract — and one agent loop that handles whatever Claude decides to do. We'll build it from scratch in four steps.
1. Pull the tool schemas (no JSON by hand)
Both endpoints publish a tool-schema route that returns a Claude tool definition in the exact shape messages.create expects.
import requests

API_KEY = "pk_yourkey"

def fetch_tool(tool_path: str) -> dict:
    """Fetch a Claude tool definition from API Pick's tool-schema endpoint."""
    schema = requests.get(f"https://www.apipick.com{tool_path}/tool-schema").json()
    return schema["claude"]

WEB_SEARCH_TOOL = fetch_tool("/api/search/web")
URL_EXTRACT_TOOL = fetch_tool("/api/extract")
# Cache these once at module load. They don't change between requests.

Two important things this gives you for free: the parameter schema (so Claude knows you accept query, country_code, start_date, etc.) and a clear, model-friendly description of what the tool does.
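For orientation, the value fetch_tool returns is a standard Anthropic tool definition (name, description, input_schema). The sketch below is illustrative only; the exact field names and parameters come from the /tool-schema endpoint, and the query, country_code, and start_date parameters shown here are assumptions based on the ones this article mentions.

# Illustrative shape only; the real definition comes from the tool-schema route.
EXAMPLE_WEB_SEARCH_TOOL = {
    "name": "web_search",
    "description": "Search the web and return ranked result URLs with snippets.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "country_code": {"type": "string"},
            "start_date": {"type": "string", "description": "YYYY-MM-DD"},
        },
        "required": ["query"],
    },
}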
2. Write the tool handler
When Claude returns a tool_use block, your job is to call the actual API and return a tool_result block. One handler covers both tools: a small dispatcher maps the name Claude picked to the right endpoint.
def call_tool(tool_use_block) -> dict:
    """Execute the tool Claude asked for and return a tool_result block."""
    name = tool_use_block.name
    args = tool_use_block.input

    if name == "web_search":
        endpoint = "https://www.apipick.com/api/search/web"
    elif name == "extract_urls":
        endpoint = "https://www.apipick.com/api/extract"
    else:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": f"Unknown tool: {name}",
            "is_error": True,
        }

    resp = requests.post(
        endpoint,
        headers={"x-api-key": API_KEY},
        json=args,
        timeout=30,
    )
    if resp.status_code != 200:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": f"HTTP {resp.status_code}: {resp.text[:500]}",
            "is_error": True,
        }
    return {
        "type": "tool_result",
        "tool_use_id": tool_use_block.id,
        "content": resp.text,
    }

Three things worth noting. tool_use_id is how Claude correlates the result with its earlier call — you must echo it back. is_error: True tells Claude to recover gracefully (often by trying a different query). And we pass the raw response text — Claude is comfortable with the JSON shape API Pick returns, so no transformation is needed.
3. The agent loop
The loop is short: send the conversation, look at every block in the response, run any tools, append both Claude's tool_use and your tool_result to the conversation, repeat until stop_reason is end_turn.
import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a research assistant. When the user asks a question:
1. Use web_search to find relevant sources. Prefer recent (set start_date) for time-sensitive queries.
2. Use extract_urls to read the body of the top 3-5 most relevant URLs from the search results.
3. Answer the question concisely. After every factual claim, include an inline citation in the form [source: URL].
4. If you don't have enough information, say so — don't fabricate.
Be concise. Aim for 3-5 sentence answers unless the user asks for depth."""

def research(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=SYSTEM_PROMPT,
            tools=[WEB_SEARCH_TOOL, URL_EXTRACT_TOOL],
            messages=messages,
        )
        # Append the assistant's response (may contain tool_use blocks)
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Pull out the text answer
            return "\n".join(b.text for b in response.content if b.type == "text")

        if response.stop_reason == "tool_use":
            # Run every tool the model asked for in this turn
            tool_results = [
                call_tool(b)
                for b in response.content
                if b.type == "tool_use"
            ]
            messages.append({"role": "user", "content": tool_results})
            continue

        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

Notice the model can fire multiple tool calls in one turn — for example, it might issue a single extract_urls call with five URLs to batch them. We handle that by collecting all tool_use blocks from the assistant turn and returning all tool_result blocks in the next user turn.
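If the model does issue several independent tool_use blocks in one turn, nothing stops you from running them concurrently instead of in a list comprehension. A minimal sketch, assuming call_tool is thread-safe (it only makes requests calls):

from concurrent.futures import ThreadPoolExecutor

def run_tools_concurrently(tool_use_blocks) -> list[dict]:
    """Execute independent tool calls in parallel; results come back in input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(call_tool, tool_use_blocks))

Swapping this in for the list comprehension above shaves wall-clock time when the model fires a search and an extract in the same turn.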
4. Run it
if __name__ == "__main__":
    answer = research("What were the major announcements at OpenAI DevDay this year?")
    print(answer)

You'll see something like:

OpenAI's DevDay focused on three themes: cheaper inference (a new gpt-mini
tier at roughly half the prior cost) [source: https://openai.com/devday-2026],
agent infrastructure (a managed agent runtime with persistent memory and
tool sandboxing) [source: https://techcrunch.com/...], and developer
tooling (a new Responses API replacing the legacy Assistants API)
[source: https://platform.openai.com/docs/...].

That's a research agent. ~120 lines, two tools, sourced output.
What this costs
A typical research call:
- 1 search call — 15 credits ($0.015)
- 1 extract call covering 3–5 URLs — 6–10 credits ($0.006–$0.010)
- ~3,000 input + 800 output Claude tokens — ~$0.02
Round figure: ~4–5 cents per researched answer, split roughly evenly between API credits and LLM tokens. At 1,000 research calls per day that's about $40–45/day, cheap relative to the analyst time it saves. Neither line item dominates; budget the credits and the tokens together.
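If you want to re-run the math with your own volumes, a back-of-the-envelope helper is enough. Credit prices are the ones quoted above (1 credit = $0.001); the LLM figure is this article's ~$0.02 estimate and should be replaced with your model's actual token rates.

# Rough per-call cost model; substitute your real LLM pricing.
CREDIT_USD = 0.001            # 1 credit = $0.001
SEARCH_CREDITS = 15
EXTRACT_CREDITS_PER_URL = 2
LLM_USD_PER_CALL = 0.02       # ~3,000 input + 800 output tokens

def cost_per_question(urls_extracted: int = 4) -> float:
    credits = SEARCH_CREDITS + EXTRACT_CREDITS_PER_URL * urls_extracted
    return credits * CREDIT_USD + LLM_USD_PER_CALL

print(f"${cost_per_question():.3f} per question")   # ~$0.043 at 4 URLs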
Three production refinements
1. Cache by question
If your product asks similar questions, hash (question, today's date) and cache the final answer for an hour. Most user-driven research has long-tail hit rates above 30%.
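A minimal in-process sketch of that rule, using a plain dict with a one-hour TTL; for anything multi-process you'd swap in Redis or similar. The key includes today's date so answers roll over daily, as described above.

import hashlib
import time
from datetime import date

_cache: dict[str, tuple[float, str]] = {}
CACHE_TTL_S = 3600  # one hour

def cached_research(question: str) -> str:
    # Key on (question, today's date) so stale answers expire at the day boundary too.
    key = hashlib.sha256(f"{question}|{date.today()}".encode()).hexdigest()
    hit = _cache.get(key)
    if hit and time.time() - hit[0] < CACHE_TTL_S:
        return hit[1]
    answer = research(question)
    _cache[key] = (time.time(), answer)
    return answer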
2. Constrain when freshness matters
For 'this week' / 'today' questions, set start_date in the search call to the current date or yesterday. Without it the search may return last year's articles. The simplest version: a system-prompt rule like 'always include start_date when the question contains today, this week, recent, or latest'.
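One way to implement that rule is to inject the current date into the system prompt at request time. A sketch, with the trigger words and the seven-day window as assumptions you'd tune:

from datetime import date, timedelta

FRESHNESS_TRIGGERS = ("today", "this week", "recent", "latest")

def build_system_prompt(question: str) -> str:
    prompt = SYSTEM_PROMPT
    if any(t in question.lower() for t in FRESHNESS_TRIGGERS):
        since = (date.today() - timedelta(days=7)).isoformat()
        prompt += f"\nThis is a time-sensitive question. Always set start_date={since} or later in web_search."
    return prompt

You'd then pass system=build_system_prompt(question) instead of SYSTEM_PROMPT inside research().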
3. Guardrails on hallucination
The single highest-leverage rule: 'If you don't have enough extracted content to answer, return: I couldn't find a reliable source for this. Do not guess.' Adding this to the system prompt drops fabricated answers by an order of magnitude in our testing.
Where to take it next
Three natural extensions:
- Verticalise it: swap Web Search for Academic Search for a literature-review agent, SEC Filings Search for due diligence, or Clinical Search for medical research. The handler doesn't change — only the system prompt does.
- Add structured output: ask the model to return JSON with answer and citations[] fields so you can render them as a UI card (see the sketch after this list).
- Stream tokens: switch to client.messages.stream() for incremental output — useful when the answer is long. The tool-call loop is the same.
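For the structured-output extension, one lightweight approach is to ask for JSON in the prompt and parse the result. The answer and citations field names are the ones suggested above; the prompt suffix and the fallback behaviour are assumptions, not the only way to do it.

import json

STRUCTURED_SUFFIX = (
    "\nReturn your final answer as JSON with exactly two fields: "
    '"answer" (string) and "citations" (array of source URLs). Return only the JSON.'
)

def research_structured(question: str) -> dict:
    raw = research(question + STRUCTURED_SUFFIX)
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to an unstructured answer rather than failing the request.
        return {"answer": raw, "citations": []}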
Frequently Asked Questions
Why two tools instead of a single 'answer' endpoint?
Splitting search and extract gives you control over how many URLs to read, when to stop, and which ones to cite. A single hosted 'answer' endpoint hides this — you can't easily change the strategy without re-architecting. With two tools, the same code becomes a morning briefing agent, a literature reviewer, or a competitive intel scraper just by tweaking the prompt.
Does this work with OpenAI Assistants / Responses API too?
Yes. The architecture is identical — the only thing that changes is how you parse the tool-call blocks and submit results. The handler shape becomes submit_tool_outputs(...) instead of a tool_result content block, but the agent loop and tool definitions are the same JSON.
How does the agent decide when to stop?
Claude returns stop_reason: end_turn when it has enough context to answer, and tool_use when it wants to call a tool. The loop just keeps going until end_turn. In practice the model usually does 1–3 search/extract cycles before answering.
How do I make sure it actually cites sources?
Two levers. First: in the system prompt, explicitly require '[source: URL]' inline after every claim. Second: the URL Extract response includes the URL, so when the model has it in context it tends to cite naturally. If citations slip, add a final formatting pass that rejects the answer and asks the model to add citations.
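The 'final formatting pass' can be as simple as a check-and-retry step. A sketch, reusing the client and model from the agent loop and assuming you are willing to spend one extra LLM call when citations are missing:

import re

CITATION_RE = re.compile(r"\[source:\s*https?://\S+\]")

def ensure_citations(answer: str) -> str:
    """Reject uncited answers and ask the model to reformat them."""
    if CITATION_RE.search(answer):
        return answer
    fixed = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=2048,
        messages=[{
            "role": "user",
            "content": "Rewrite this answer so every factual claim is followed by an "
                       "inline [source: URL] citation, using only sources already named "
                       "in the text. If none are named, say you could not verify it.\n\n"
                       + answer,
        }],
    )
    return "".join(b.text for b in fixed.content if b.type == "text")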
What about latency?
Each search round-trip is ~1s; each extract batch is 1–4s depending on URL count and JS rendering. A typical 'research a question' call lands in 5–15s wall-clock. If you need it faster, batch the extract step into a single call (the endpoint already accepts a list of URLs) and use system-prompt instructions to constrain the model to one tool round per question.
Sarah Choy is the CEO of API Pick. She writes about building production-ready APIs for AI agents and LLM workflows.