
Build a Research Agent with Search + URL Extract (Claude tool use, end to end)

Sarah Choy · Published 3 May 2026 · 9 min read

Most 'AI research agent' tutorials stop at 'here's a tool definition'. This one gives you a working agent: question in, answer with citations out. Search, extract, reason, cite, all in about 120 lines of Python.

TL;DR

• The smallest useful research agent loop is: search → pick links → extract the bodies → ask the model to answer with inline citations.
• Two tools, Web Search (15 credits) and URL Extract (2 credits per URL), cover 95% of grounded-answer use cases.
• Claude tool use handles the orchestration; you just shuttle tool_use blocks → API calls → tool_result blocks back and forth until the model stops.
• End-to-end cost: ~25 credits per question at typical depth, plus ~$0.02 in LLM tokens.

What we're building

A research agent that takes one input, a natural-language question, and returns a sourced answer. Architecturally:

question → [Claude]
              ↓ tool_use(search)
            [API Pick Web Search] → ranked URLs
              ↓ tool_use(extract)
            [API Pick URL Extract] → cleaned bodies
              ↓ Claude reads, decides if more is needed
              ↓ end_turn
          answer with inline [source: URL] citations

About 120 lines of Python. Two tools, search and extract, and one agent loop that handles whatever Claude decides to do. We'll build it from scratch in 4 steps.

1. Pull the tool schemas (no hand-written JSON)

Both endpoints publish a tool-schema route that returns a Claude tool definition in exactly the shape messages.create expects.

import requests

API_KEY = "pk_yourkey"

def fetch_tool(tool_path: str) -> dict:
    """Fetch a Claude tool definition from API Pick's tool-schema endpoint."""
    resp = requests.get(f"https://www.apipick.com{tool_path}/tool-schema", timeout=10)
    resp.raise_for_status()  # fail loudly at startup if the schema can't be fetched
    return resp.json()["claude"]

WEB_SEARCH_TOOL = fetch_tool("/api/search/web")
URL_EXTRACT_TOOL = fetch_tool("/api/extract")

# Cache these once at module load. They don't change between requests.

This gives you two important things for free: the parameter schema (so Claude knows you accept query, country_code, start_date, and so on) and a clear, model-friendly description of what the tool does.
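For orientation, a Claude tool definition is a dict with a name, a description, and an input_schema (a JSON Schema object). The sketch below checks a fetched definition for those three fields before it is handed to messages.create. The validate_tool helper and the EXAMPLE_TOOL dict are hypothetical: the parameter list is assumed from the names mentioned above, not copied from API Pick's actual schema.

```python
REQUIRED_KEYS = ("name", "description", "input_schema")

def validate_tool(tool: dict) -> dict:
    """Raise ValueError if a tool definition lacks the fields Claude expects."""
    missing = [k for k in REQUIRED_KEYS if k not in tool]
    if missing:
        raise ValueError(f"Tool definition is missing keys: {missing}")
    if tool["input_schema"].get("type") != "object":
        raise ValueError("input_schema must be a JSON Schema of type 'object'")
    return tool

# Hypothetical example of what fetch_tool() might return (illustrative only;
# the real description and parameters come from the tool-schema endpoint).
EXAMPLE_TOOL = {
    "name": "web_search",
    "description": "Search the web and return ranked results.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string"},
            "country_code": {"type": "string"},
            "start_date": {"type": "string"},
        },
        "required": ["query"],
    },
}

validate_tool(EXAMPLE_TOOL)  # passes silently; a malformed dict would raise
```

Running this check once at module load, right after fetch_tool, surfaces a malformed schema at startup rather than in the middle of a conversation.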

2. Write the tool handler

When Claude returns a tool_use block, your job is to call the real API and return a tool_result block. The handler below is a single dispatcher: it routes on the tool name Claude chose, calls the matching endpoint, and wraps the response.

def call_tool(tool_use_block) -> dict:
    """Execute the tool Claude asked for and return a tool_result block."""
    name = tool_use_block.name
    args = tool_use_block.input

    if name == "web_search":
        endpoint = "https://www.apipick.com/api/search/web"
    elif name == "extract_urls":
        endpoint = "https://www.apipick.com/api/extract"
    else:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": f"Unknown tool: {name}",
            "is_error": True,
        }

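    try:
        resp = requests.post(
            endpoint,
            headers={"x-api-key": API_KEY},
            json=args,
            timeout=30,
        )
    except requests.RequestException as exc:
        # Timeouts and network failures go back to Claude as tool errors
        # instead of crashing the agent loop.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": f"Request failed: {exc}",
            "is_error": True,
        }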

    if resp.status_code != 200:
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_block.id,
            "content": f"HTTP {resp.status_code}: {resp.text[:500]}",
            "is_error": True,
        }

    return {
        "type": "tool_result",
        "tool_use_id": tool_use_block.id,
        "content": resp.text,
    }

Three things to note. tool_use_id is how Claude matches the result to its earlier call, so you must send it back. is_error: True tells Claude the call failed so it can recover gracefully (often by trying a different query). And we pass the raw response text through: Claude is comfortable with the JSON shape API Pick returns, so no transformation is needed.
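Since the handler builds the same tool_result dict in several places, one optional tidy-up is a small constructor. This is a sketch only; make_tool_result is a hypothetical helper name, not part of any SDK.

```python
def make_tool_result(tool_use_id: str, content: str, is_error: bool = False) -> dict:
    """Build a tool_result block that echoes the id of the originating tool_use."""
    block = {
        "type": "tool_result",
        "tool_use_id": tool_use_id,  # must match the tool_use block's id
        "content": content,
    }
    if is_error:
        # Only set the flag on failures; successful results omit it.
        block["is_error"] = True
    return block

ok = make_tool_result("toolu_123", '{"results": []}')
failed = make_tool_result("toolu_123", "HTTP 500: upstream error", is_error=True)
```

With this in place each return in call_tool shrinks to one line, and it becomes harder to forget the tool_use_id.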

3. The agent loop

The loop is short: send the conversation, look at every block in the response, run any tools, append both Claude's tool_use and your tool_result to the conversation, and repeat until stop_reason is end_turn.

import anthropic

client = anthropic.Anthropic()

SYSTEM_PROMPT = """You are a research assistant. When the user asks a question:

1. Use web_search to find relevant sources. Prefer recent (set start_date) for time-sensitive queries.
2. Use extract_urls to read the body of the top 3-5 most relevant URLs from the search results.
3. Answer the question concisely. After every factual claim, include an inline citation in the form [source: URL].
4. If you don't have enough information, say so — don't fabricate.

Be concise. Aim for 3-5 sentence answers unless the user asks for depth."""

def research(question: str) -> str:
    messages = [{"role": "user", "content": question}]

    while True:
        response = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=SYSTEM_PROMPT,
            tools=[WEB_SEARCH_TOOL, URL_EXTRACT_TOOL],
            messages=messages,
        )

        # Append the assistant's response (may contain tool_use blocks)
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            # Pull out the text answer
            return "\n".join(b.text for b in response.content if b.type == "text")

        if response.stop_reason == "tool_use":
            # Run every tool the model asked for in this turn
            tool_results = [
                call_tool(b)
                for b in response.content if b.type == "tool_use"
            ]
            messages.append({"role": "user", "content": tool_results})
            continue

        raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")

Note that the model can emit several tool_use blocks in a single turn, for example two searches with different queries. Separately, a single extract_urls call can carry five URLs so they are processed as one batch. We handle both cases the same way: collect every tool_use block in the assistant turn and return all the tool_result blocks in the next user turn.
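The while True loop in research() keeps going for as long as the model asks for tools. A hedged variant with a turn cap is sketched below. The run_agent name, the max_turns parameter, and the injected create callable are assumptions made so the loop can be exercised offline; in real use create would wrap client.messages.create with the model, system prompt, and tools already bound.

```python
from types import SimpleNamespace

def run_agent(create, call_tool, question, max_turns=6):
    """Drive the tool loop, giving up after max_turns model calls."""
    messages = [{"role": "user", "content": question}]
    for _ in range(max_turns):
        response = create(messages=messages)
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason == "end_turn":
            return "\n".join(b.text for b in response.content if b.type == "text")
        if response.stop_reason != "tool_use":
            raise RuntimeError(f"Unexpected stop_reason: {response.stop_reason}")
        # One tool_result per tool_use block, all returned in a single user turn.
        results = [call_tool(b) for b in response.content if b.type == "tool_use"]
        messages.append({"role": "user", "content": results})
    raise RuntimeError(f"No final answer after {max_turns} turns")

# Offline demo: scripted responses stand in for the Anthropic client.
_script = iter([
    SimpleNamespace(stop_reason="tool_use", content=[
        SimpleNamespace(type="tool_use", id="t1", name="web_search", input={"query": "a"}),
        SimpleNamespace(type="tool_use", id="t2", name="web_search", input={"query": "b"}),
    ]),
    SimpleNamespace(stop_reason="end_turn", content=[
        SimpleNamespace(type="text", text="done"),
    ]),
])

def fake_create(messages):
    return next(_script)

def fake_call_tool(block):
    return {"type": "tool_result", "tool_use_id": block.id, "content": "{}"}

demo_answer = run_agent(fake_create, fake_call_tool, "test question")
```

The cap also bounds spend: each extra turn is another model call and usually another search or extract.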

4. Run it

if __name__ == "__main__":
    answer = research("What were the major announcements at OpenAI DevDay this year?")
    print(answer)

You'll see something like this:

OpenAI's DevDay focused on three themes: cheaper inference (a new gpt-mini
tier at roughly half the prior cost) [source: https://openai.com/devday-2026],
agent infrastructure (a managed agent runtime with persistent memory and
tool sandboxing) [source: https://techcrunch.com/...], and developer
tooling (a new Responses API replacing the legacy Assistants API)
[source: https://platform.openai.com/docs/...].

That's a research agent. ~120 lines, two tools, sourced output.

What it costs

A typical research call:

1 search call: 15 credits ($0.015)
1 extract call covering 3–5 URLs: 6–10 credits ($0.006–$0.010)
~3,000 input + 800 output Claude tokens: ~$0.02

Roughly: ~4 cents per researched answer once the three line items are added up. At 1,000 research calls per day that comes to about $40–$45/day, in the same range as an analyst's morning coffee budget. Budget for the LLM, not for the search APIs.
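The line items above can be added up with a short calculation. The credit price of $0.001 is implied by "15 credits ($0.015)". The per-million-token prices are assumptions, chosen because they reproduce the ~$0.02 token figure; substitute your model's actual rates.

```python
CREDIT_USD = 0.001            # implied by "15 credits ($0.015)"
SEARCH_CREDITS = 15
EXTRACT_CREDITS_PER_URL = 2

def cost_per_answer(searches, urls, input_tokens, output_tokens,
                    usd_per_mtok_in=3.0, usd_per_mtok_out=15.0):
    """Estimate one research call; token prices are assumed, not quoted."""
    credits = searches * SEARCH_CREDITS + urls * EXTRACT_CREDITS_PER_URL
    api_usd = credits * CREDIT_USD
    llm_usd = (input_tokens / 1e6) * usd_per_mtok_in \
        + (output_tokens / 1e6) * usd_per_mtok_out
    return {
        "credits": credits,
        "api_usd": api_usd,
        "llm_usd": llm_usd,
        "total_usd": api_usd + llm_usd,
    }

low = cost_per_answer(searches=1, urls=3, input_tokens=3000, output_tokens=800)
high = cost_per_answer(searches=1, urls=5, input_tokens=3000, output_tokens=800)
print(f"{low['credits']}-{high['credits']} credits, "
      f"${low['total_usd']:.3f}-${high['total_usd']:.3f} per answer")
```

Multiplying total_usd by your expected daily volume gives the daily budget. Input tokens grow with every extracted body the model reads, so measure real usage before relying on the 3,000-token figure.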

Three refinements for production

1. Cache by question

If your product asks similar questions, hash (question, today's date) and cache the final answer for one hour. Most user-driven research has long-tail hit rates above 30%.
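A minimal in-memory sketch of that cache follows. The helper names, the lowercase and whitespace normalization, and the dict-based store are all assumptions for illustration; a production setup would more likely use a shared store such as Redis.

```python
import hashlib
import time
from datetime import date
from typing import Callable, Optional

CACHE_TTL_SECONDS = 3600  # one hour, as suggested above
_cache = {}               # key -> (stored_at, answer)

def cache_key(question: str, today: Optional[date] = None) -> str:
    """Hash the normalized question together with the date."""
    today = today or date.today()
    # Assumption: case and extra whitespace should not create new cache entries.
    normalized = " ".join(question.lower().split())
    return hashlib.sha256(f"{normalized}|{today.isoformat()}".encode()).hexdigest()

def cached_research(question: str, research_fn: Callable[[str], str],
                    now: Callable[[], float] = time.time) -> str:
    """Return a cached answer if it is under an hour old, else recompute."""
    key = cache_key(question)
    hit = _cache.get(key)
    if hit is not None and now() - hit[0] < CACHE_TTL_SECONDS:
        return hit[1]
    answer = research_fn(question)
    _cache[key] = (now(), answer)
    return answer
```

Usage would look like cached_research(user_question, research). Including the date in the key means yesterday's answers are never served today, even if the TTL were raised later.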

2. Restrict when freshness matters

For 'this week' / 'today' questions, set start_date in the search call to the current date or yesterday. Without it, search may return last year's articles. The simplest version is a system-prompt rule such as 'Always include start_date when the question contains today, this week, recent, or latest'.
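If you would rather not depend on the model following the prompt rule, the same idea can be sketched in code. Everything here is an assumption: the helper name, the keyword list, the ISO YYYY-MM-DD date format, and the choice of a 7-day window for anything other than "today". Check the tool schema for the format start_date actually expects.

```python
import re
from datetime import date, timedelta
from typing import Optional

FRESH_WORDS = re.compile(r"\b(today|this week|recent|recently|latest)\b", re.IGNORECASE)

def freshness_start_date(question: str, today: Optional[date] = None) -> Optional[str]:
    """Return a suggested start_date for time-sensitive questions, else None."""
    today = today or date.today()
    match = FRESH_WORDS.search(question)
    if match is None:
        return None
    if match.group(1).lower() == "today":
        # "today" questions: look back one day to tolerate time zones.
        return (today - timedelta(days=1)).isoformat()
    # Assumed window for "this week", "recent", "latest": the past 7 days.
    return (today - timedelta(days=7)).isoformat()

hint = freshness_start_date("What are the latest Python releases?", date(2026, 5, 3))
```

One way to use the result is to append a line such as "Use start_date=..." to the user message when the function returns a value, leaving timeless questions untouched.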

3. Guardrails against hallucination

The single highest-impact rule: 'If you don't have enough extracted content to answer, return: I couldn't find a reliable source for this. Do not guess.' Adding this to the system prompt cut fabricated answers by an order of magnitude in our testing.

What this is not

This is not an autonomous agent that runs unattended for hours. It is a one-question-in, one-answer-out loop. Most 'research agents' shipped to production are exactly this; anything more elaborate (planning, multi-step workflows, persistent memory) is a different problem with a much larger surface area. Start here and add complexity only when a real use case demands it.

Where to take it next

Three natural extensions:

Verticalize it: swap Web Search for Academic Search to get a literature-review agent, SEC Filings Search for due diligence, or Clinical Search for medical research. The handler's shape does not change: you swap the tool schema and endpoint and adjust the system prompt.
Add structured output: ask the model to return JSON with answer and citations[] fields so you can render them as a UI card.
Stream tokens: switch to client.messages.stream() for incremental output, which is useful when the answer is long. The tool-call loop stays the same.
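For the structured-output extension, the reply needs to be parsed and checked before it reaches the UI. The sketch below assumes the model was asked for a JSON object with an answer string and a citations list of URL strings; that exact shape, and the parse_structured_answer name, are assumptions.

```python
import json

def parse_structured_answer(raw: str) -> dict:
    """Extract and validate {"answer": str, "citations": [str, ...]} from model text."""
    # Tolerate prose or code fences around the object by slicing from the
    # first "{" to the last "}".
    start, end = raw.find("{"), raw.rfind("}")
    if start == -1 or end <= start:
        raise ValueError("No JSON object found in model output")
    data = json.loads(raw[start:end + 1])  # JSONDecodeError is a ValueError
    if not isinstance(data.get("answer"), str):
        raise ValueError("'answer' must be a string")
    citations = data.get("citations")
    if not isinstance(citations, list) or not all(isinstance(c, str) for c in citations):
        raise ValueError("'citations' must be a list of URL strings")
    return {"answer": data["answer"], "citations": citations}

card = parse_structured_answer(
    'Here is the result: {"answer": "Yes.", "citations": ["https://example.com/a"]}'
)
```

A ValueError here is a natural trigger for one retry that tells the model its previous reply was not valid JSON.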

Frequently asked questions

Why two tools instead of a single 'answer' endpoint?

Splitting search and extract gives you control over how many URLs to read, when to stop, and which ones to cite. A single hosted 'answer' endpoint hides that, so you can't easily change strategy without re-architecting. With two tools, the same code becomes a morning briefing agent, a literature reviewer, or a competitive intel scraper just by changing the prompt.

Does this also work with the OpenAI Assistants / Responses API?

Yes. The architecture is the same; what changes is how you parse tool-call blocks and how you submit results. The handler's return path becomes submit_tool_outputs(...) instead of a tool_result content block, but the agent loop and the tool definitions' JSON stay the same.

How does the agent decide when to stop?

Claude returns stop_reason: end_turn when it has enough context to answer, and tool_use when it wants to make a tool call. The loop simply runs until end_turn. In practice the model usually does 1–3 search/extract cycles before answering.

How do I make sure it actually cites sources?

Two ways. First, in the system prompt, explicitly require an inline '[source: URL]' after every claim. Second, the URL Extract response includes the URL, so the model tends to cite naturally once it has that in context. If citations go missing, add a final formatting pass that rejects the answer and asks the model to add them.
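The rejection step in that formatting pass can be as simple as a regular expression over the answer. The sketch below assumes citations follow the [source: URL] form from the system prompt and that you have collected the set of URLs actually sent to extract; check_citations is a hypothetical helper name.

```python
import re

CITATION_RE = re.compile(r"\[source:\s*(https?://[^\]\s]+)\]")

def check_citations(answer: str, extracted_urls: set) -> dict:
    """Find [source: URL] citations and flag any URL that was never extracted."""
    cited = CITATION_RE.findall(answer)
    unknown = [url for url in cited if url not in extracted_urls]
    return {
        "cited": cited,
        "unknown": unknown,
        # Acceptable only if there is at least one citation and all are known.
        "ok": bool(cited) and not unknown,
    }

report = check_citations(
    "Prices fell [source: https://a.example/x]. Demand rose [source: https://b.example].",
    extracted_urls={"https://a.example/x"},
)
```

When ok is False, one option is to send the answer back with a short instruction to add or correct citations. The unknown list also catches URLs the model cited without ever reading them.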

What about latency?

Each search round-trip takes ~1s; each extract batch takes 1–4s depending on URL count and JS rendering. A typical 'research a question' call finishes in 5–15s wall-clock. If you need it faster, parallelize the extract step by sending all URLs in one call (extract already accepts a list) and use system-prompt instructions to limit the model to one tool round per question.