Building a Due-Diligence Agent on SEC Filings (10-K, 10-Q, 8-K, Earnings)

Sarah Choy · Published May 3, 2026 · 9 min read

Reading a 10-K is mostly Ctrl+F. Doing it for fifty companies is a job. Replace the boring 80% with a search-and-extract agent against SEC EDGAR — keep the 20% that matters for the human analyst.

TL;DR

  • Architecture: ticker lookup → SEC Filings Search (filings + earnings + equity stats) → URL Extract for the long passages → LLM answer with section-level citations.
  • Cost ceiling: SEC Filings Search is 120 credits per call (≈$0.12); a typical 3-question company review costs ~$0.40 in credits + ~$0.05 in LLM tokens.
  • What the agent gets right: factual lookups (segment revenue, capex trend, governance, risk-factor changes year-over-year), executive compensation summaries, recent 8-K events.
  • What still needs a human: judgement calls on management quality, market positioning, deal-specific issues, anything outside the filing language.

Why this is worth automating

A first-pass due diligence read on a public company is mostly mechanical: pull the latest 10-K, scan the risk factors and MD&A, check the recent 8-Ks, glance at the most recent earnings call. An associate-level analyst spends 2–4 hours doing this per company. The output is rarely a deep insight — it's a structured fact pattern that another, more senior person then reasons about.

That fact-pattern step is exactly what a small agent can take over. Search SEC, extract the relevant passages, summarise with citations. The senior person still does the reasoning — but they start from a 5-minute read instead of a 4-hour one.

Three things make this practical now:

  • Semantic search over filings means you can ask 'how did segment revenue change' and get back the right paragraph from the right form, instead of reading 200 pages.
  • Long-context LLMs can hold a full 10-K plus a few 8-Ks in working memory and answer cross-document questions.
  • Citation discipline in the prompt makes the output verifiable in seconds — exactly what compliance and review workflows require.

Architecture

question + ticker
       ├─ /api/company/facts (2 credits)
       │  ↳ confirm ticker, get CIK and sector
       ├─ /api/search/sec (120 credits)
       │  ↳ semantic search across 10-K/10-Q/8-K/earnings/equity stats
       ├─ /api/extract (2 credits per URL)
       │  ↳ pull full text of the top 3-5 most relevant filings
       └─ Claude / GPT-4 with citation-required prompt
          ↳ "answer + [Form, Fiscal Period, Section]"

Single-question cost: ~130 credits (~$0.13) of API + ~$0.03 of LLM. A three-question company review (financial trend, risk-factor diff, recent material events) lands at ~$0.45–$0.60. Compared to an analyst hour at any reasonable bill rate, the math is obvious.
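
Before wiring in an agent, it helps to see the pipeline as plain HTTP. A minimal deterministic sketch, assuming the endpoint paths from the diagram above; the parameter names ("ticker", "query", "urls") and the response fields are illustrative, not confirmed request shapes:

import requests

BASE = "https://www.apipick.com"
HEADERS = {"x-api-key": "pk_yourkey"}

def first_pass(ticker: str, question: str) -> dict:
    # 1. Confirm the ticker and pick up CIK + sector (2 credits).
    facts = requests.get(f"{BASE}/api/company/facts",
                         params={"ticker": ticker}, headers=HEADERS, timeout=30).json()
    # 2. One wide semantic search across filings + transcripts (120 credits).
    hits = requests.post(f"{BASE}/api/search/sec",
                         json={"query": question, "ticker": ticker},
                         headers=HEADERS, timeout=30).json()
    # 3. Pull full text of the top few results (2 credits per URL).
    urls = [h["url"] for h in hits.get("results", [])[:4]]
    docs = requests.post(f"{BASE}/api/extract",
                         json={"urls": urls}, headers=HEADERS, timeout=30).json()
    return {"facts": facts, "documents": docs}

The agentic build further down lets the model decide which of these calls to make and with what arguments; this fixed version is the shape you want for the scripted, fixed-format briefs discussed at the end.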

The system prompt that earns its keep

The single biggest determinant of output quality on financial RAG is the system prompt. The one we use:

You are a financial research assistant for an investment team. You answer
questions about US public companies using SEC filings, earnings call
transcripts, and equity statistics retrieved by your tools.

Rules — non-negotiable:

1. Cite every numeric claim with: [Form, Fiscal Period, Section]. Example:
   "Operating income rose 12% YoY to $4.1B [10-K FY2025, Item 7 MD&A]."

2. Quote numbers verbatim. Do not round, paraphrase, or convert. If a
   filing says "$4,127M", do not say "$4.1B" unless the filing itself does.

3. If the answer requires content you have not extracted, say so:
   "I could not retrieve the FY2024 10-K for the segment-level breakdown.
   Please re-run with that filing in scope."

4. Do not infer from training-data knowledge. If your tools didn't return
   it, you don't know it.

5. Default to the most recent fiscal period available. State the period
   you used.

6. Format multi-figure answers as a small markdown table with one column
   per fiscal period. Always end with a one-line "How I read this" summary.

Tone: precise, terse, no marketing language.

Rules 1, 2, and 4 between them eliminate ~90% of fabrication issues we've measured. Rule 3 (graceful "I don't know") is what separates this from a chatbot that confidently makes up numbers.
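
Rule 1 is also cheap to enforce mechanically before an answer ships. A rough post-hoc lint, where the citation pattern is our own convention rather than anything the API returns: flag any sentence that states a figure but carries no bracketed citation.

import re

CITATION = re.compile(r"\[(10-K|10-Q|8-K|Earnings)[^\]]*\]")
FIGURE = re.compile(r"[\$£€]?\d[\d,.]*\s?(%|[MBK]\b|[mb]illion)?")

def uncited_claims(answer: str) -> list[str]:
    # Return sentences with a number but no [Form, Period, Section] tag.
    flagged = []
    for sentence in re.split(r"(?<=[.!?])\s+", answer):
        if FIGURE.search(sentence) and not CITATION.search(sentence):
            flagged.append(sentence)
    return flagged

If uncited_claims() comes back non-empty, re-prompt the model with the offending sentences instead of shipping the answer.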

Sample queries the agent handles cleanly

  • 'Compare Apple's services revenue trend over the last 5 fiscal years.' → pulls from the relevant 10-Ks, returns a table with citations to MD&A.
  • 'What changed in NVIDIA's risk factors between FY2023 and FY2025?' → cross-document diff, citing Item 1A in each form.
  • 'Summarise the last 4 8-Ks for $TICKER.' → semantic search filtered to 8-K, ordered by filing date.
  • 'What did Microsoft's CFO say about AI capex on the most recent earnings call?' → searches transcripts, extracts the relevant Q&A passage, quotes verbatim.

Where it falls short — and the right human handoff

The agent stumbles in three predictable places:

  • Judgement calls on management quality. The filing tells you what they did, not whether they're capable. Don't ask the agent.
  • Industry comparables outside the filing. If the question is 'how does this gross margin compare to peers', the agent only knows what's in the searched filings. For peer comparison you need either a separate dataset or to run the agent once per company and aggregate (a sketch follows this list).
  • Forward-looking commentary. Filings contain forward-looking statements but the model treats them at face value unless told otherwise. Add to the prompt: 'Flag forward-looking statements explicitly. Do not present guidance as fact.'
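
The aggregate step is mechanical enough to script. A sketch of the per-company fan-out, assuming the due_dil() loop and client from the build below; the merge prompt is our own:

def peer_comparison(tickers: list[str], metric_question: str) -> str:
    # One full agent run per peer; each answer arrives with its own citations.
    answers = {t: due_dil(f"{metric_question} Ticker: {t}.") for t in tickers}
    merged = "\n\n".join(f"## {t}\n{a}" for t, a in answers.items())
    # A second, tool-free pass merges the per-company answers into one table.
    r = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        system="Combine these per-company answers into a single comparison "
               "table. Keep every citation exactly as given. Add no numbers.",
        messages=[{"role": "user", "content": merged}],
    )
    return "".join(b.text for b in r.content if b.type == "text")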

The minimum viable build

The same Claude tool-use loop pattern from the research-agent walkthrough applies — only the tools and the system prompt change:

from anthropic import Anthropic
import requests

KEY = "pk_yourkey"
SYSTEM_PROMPT = """<paste the full prompt from 'The system prompt that earns its keep'>"""
client = Anthropic()

def fetch_tool(path: str) -> dict:
    # Each endpoint serves a ready-made tool schema; take the Claude variant.
    return requests.get(f"https://www.apipick.com{path}/tool-schema").json()["claude"]

TOOLS = [
    fetch_tool("/api/search/sec"),
    fetch_tool("/api/extract"),
    fetch_tool("/api/company/facts"),
]

def call_tool(block):
    """Route one Claude tool_use block to the matching API Pick endpoint."""
    name_to_path = {
        "sec_search": "/api/search/sec",
        "extract_urls": "/api/extract",
        "company_facts": "/api/company/facts",
    }
    path = name_to_path[block.name]
    # company_facts is a GET lookup; search and extract take POST bodies.
    method = "GET" if block.name == "company_facts" else "POST"
    if method == "GET":
        resp = requests.get(
            f"https://www.apipick.com{path}",
            params=block.input,
            headers={"x-api-key": KEY},
            timeout=30,
        )
    else:
        resp = requests.post(
            f"https://www.apipick.com{path}",
            json=block.input,
            headers={"x-api-key": KEY},
            timeout=30,
        )
    return {
        "type": "tool_result",
        "tool_use_id": block.id,
        "content": resp.text,
        "is_error": resp.status_code != 200,
    }

def due_dil(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    while True:
        r = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=2048,
            system=SYSTEM_PROMPT,  # defined above
            tools=TOOLS,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": r.content})
        if r.stop_reason == "tool_use":
            # Execute every requested tool call and feed the results back in.
            results = [call_tool(b) for b in r.content if b.type == "tool_use"]
            messages.append({"role": "user", "content": results})
        else:
            # end_turn (or max_tokens): return whatever text we have.
            return "\n".join(b.text for b in r.content if b.type == "text")

print(due_dil("Compare Snowflake's product revenue YoY for the last 4 quarters."))

Where to take it next

  • Templates for repeatable reviews. Wrap the agent in a small CLI that takes a ticker and emits a fixed-format markdown brief: 'Recent 8-Ks', 'Segment revenue trend', 'Risk-factor delta'. Same agent, scripted prompts.
  • Watchlist mode. Run the agent against a ticker every morning and diff today's answer against yesterday's. Surface only the deltas (a sketch follows this list). Pairs well with the morning-briefing pattern.
  • Combine with patents and prediction markets. For tech / biotech names, layer in Patent Search for IP changes and Prediction Markets for crowd-implied outcomes (e.g. drug approval probability).
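
The diff half of watchlist mode needs nothing beyond the standard library. A minimal sketch, assuming due_dil() from the build above and a flat file per ticker as storage:

import datetime
import difflib
import pathlib

def watchlist_delta(ticker: str, question: str) -> str:
    # Run today's brief, diff it against yesterday's, keep only the change.
    today = due_dil(question)
    cache = pathlib.Path(f"briefs/{ticker}.md")
    cache.parent.mkdir(exist_ok=True)
    yesterday = cache.read_text() if cache.exists() else ""
    cache.write_text(today)
    diff = difflib.unified_diff(
        yesterday.splitlines(), today.splitlines(),
        fromfile="yesterday", tofile=str(datetime.date.today()), lineterm="",
    )
    return "\n".join(diff) or "No change."

In practice you may want to normalise whitespace (or pin temperature) before diffing, since model phrasing can vary between runs even when the underlying filings have not changed.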

The pattern generalises. SEC is the densest, most schema-friendly corpus we ship — but the loop ('semantic search → URL extract → answer with citations') applies to any structured-document corpus you care about: legal filings, scientific abstracts, patent claims. Build the diligence agent first, then port the architecture sideways.

Frequently Asked Questions

How fresh is the SEC index?

Filings are indexed within hours of being accepted by EDGAR. For 8-Ks (the time-sensitive ones — material events, leadership changes, acquisitions), this is usually fast enough for end-of-day workflows. If you need under-an-hour notification of new 8-Ks, pair the search with a separate SEC RSS feed and use the agent only for content analysis, not detection.
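
For the detection half, EDGAR's current-events Atom feed is enough to poll. A sketch using that feed; SEC asks for a descriptive User-Agent on programmatic requests, and the polling cadence and dedup storage here are up to you:

import requests
import xml.etree.ElementTree as ET

FEED = ("https://www.sec.gov/cgi-bin/browse-edgar"
        "?action=getcurrent&type=8-K&count=40&output=atom")
ATOM = "{http://www.w3.org/2005/Atom}"

def new_8ks(seen: set[str]) -> list[tuple[str, str]]:
    # Poll the feed and return (title, link) pairs we haven't handed off yet.
    xml = requests.get(
        FEED, headers={"User-Agent": "dd-agent contact@example.com"}, timeout=30
    ).text
    fresh = []
    for entry in ET.fromstring(xml).iter(f"{ATOM}entry"):
        link = entry.find(f"{ATOM}link").get("href")
        if link not in seen:
            seen.add(link)
            fresh.append((entry.find(f"{ATOM}title").text, link))
    return fresh

Each new link then becomes a due_dil() question ('Summarise the material event in this 8-K: <link>'), so the agent handles content analysis only, exactly as above.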

Does it cover earnings call transcripts?

Yes — the SEC Filings Search index includes US earnings call transcripts and equity statistics (price/volume, market cap history) alongside the filings themselves. A single semantic query can pull from any of these sources.

What are the cost levers if I'm doing this at scale?

Three. (1) Pre-filter via Company Facts (2 credits) to confirm a ticker is a real public company before spending 120 credits on SEC search. (2) Cache search results by (ticker, quarter) — filings only update on schedule. (3) Use one wide search per question rather than many narrow ones; the agent is good at synthesising across results.
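
The (ticker, quarter) cache from lever 2 can be as small as an lru_cache whose key includes the current filing cycle. A sketch, reusing KEY and requests from the build above; the quarter string is part of the key purely for invalidation and is never sent to the API:

import datetime
import functools

def current_quarter() -> str:
    today = datetime.date.today()
    return f"{today.year}Q{(today.month - 1) // 3 + 1}"

@functools.lru_cache(maxsize=512)
def cached_sec_search(ticker: str, query: str, quarter: str) -> str:
    # Identical (ticker, query) calls within one quarter cost 120 credits once.
    resp = requests.post(
        "https://www.apipick.com/api/search/sec",
        json={"query": query, "ticker": ticker},
        headers={"x-api-key": KEY},
        timeout=30,
    )
    return resp.text

# cached_sec_search("SNOW", "product revenue trend", current_quarter())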

Can the agent handle non-US filings?

SEC Filings Search covers US-listed companies (10-K, 10-Q, 8-K). For UK companies, pair UK Legal Search with Web Search. For other jurisdictions, fall back to Web Search + URL Extract over the relevant national registry or regulator's site (Companies House, SEDAR, etc.).

How do I avoid hallucinated numbers?

Three rules in the system prompt move the needle the most. (1) 'Quote numbers verbatim from extracted text — never paraphrase or round.' (2) 'Always include the filing form, fiscal period, and a section reference.' (3) 'If the relevant filing wasn't extracted, say so explicitly — do not infer from training data.' These three together remove most fabrication.
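
Rule (1) can also be checked after the fact: every figure the model quotes should appear verbatim somewhere in the extracted text. A blunt sketch of that grounding check, entirely our own convention:

import re

def ungrounded_figures(answer: str, extracted_texts: list[str]) -> list[str]:
    # Figures in the answer that never appear verbatim in any source document.
    # Catches silent rounding, e.g. "$4.1B" standing in for "$4,127M".
    corpus = "\n".join(extracted_texts)
    figures = re.findall(r"\$[\d,.]+[MBK]?|\d[\d,.]*%", answer)
    return [f for f in figures if f not in corpus]

Anything this returns is either a hallucination or a rounding violation; in either case the cheapest fix is to re-prompt with the flagged figures.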


Written by
Sarah Choy
CEO, API Pick

Sarah Choy is the CEO of API Pick. She writes about building production-ready APIs for AI agents and LLM workflows.