Citation-Grounded UK Case Law Retrieval After Ayinde v Haringey

Sarah Choy · Published May 3, 2026 · 11 min read

Ayinde v Haringey changed the calculus for UK legal AI overnight. A barrister cited five fake cases generated by an LLM and got referred to the Bar Standards Board. Hallucinated citations now have professional-conduct consequences. Here's the developer's guide to building citation-grounded UK case law retrieval — TNA Find Case Law, legislation.gov.uk, neutral-citation parsing, and how to wire it into a working agent.

TL;DR

  • Ayinde v Haringey [2025] EWHC 1383 (Admin) made hallucinated case citations a professional-conduct issue, not just a quality problem.
  • BAILII's terms restrict bulk programmatic access; the National Archives' Find Case Law service is the legitimate API for England & Wales judgments.
  • legislation.gov.uk has a stable Developer API with point-in-time access — essential for compliance work where 'in force on 30 June 2024' is a real query.
  • The Stanford HAI 'Hallucination-Free?' study showed Westlaw and Lexis+ AI still hallucinate 17–33% of the time — citation grounding is the architectural answer, not a vendor choice.
  • API Pick UK Legal Search wraps semantic retrieval over case law and primary legislation in one POST — 60 credits per call.

The case that changed the calculus

On 6 June 2025 the Divisional Court of the King's Bench Division handed down a single judgment covering two linked cases — Ayinde v The London Borough of Haringey and Al-Haroun v Qatar National Bank QPSC, neutral citation [2025] EWHC 1383 (Admin). In both, counsel had filed materials citing authorities that did not exist; in Ayinde, five fabricated cases. The court found breaches of professional duty and referred the practitioners to the BSB and SRA respectively.

Within 48 hours, every UK legal-tech roadmap had a new bullet at the top: citation grounding. Not as a quality nicety, as a regulatory necessity. The Stanford HAI 'Hallucination-Free?' study (Magesh et al., May 2024) had already shown that the leading vendor tools — Lexis+ AI, Westlaw AI — hallucinate citations 17–33% of the time on benchmarked queries. Ayinde turned that academic finding into a compliance problem.

For developers building AI features into UK legal products, the architectural lesson is clear: retrieval has to come from authoritative sources, citations have to be verifiable to the source, and the system has to refuse to answer when it can't ground a response. Anything else is a referral to a regulator waiting to happen.

Here's the working developer's guide to doing this properly: which APIs to use, what their gotchas are, and how to wire them into a working agent.

The authoritative sources

1. National Archives Find Case Law

Run by The National Archives (TNA). Covers court judgments handed down since April 2003 from the High Court, Court of Appeal, UK Supreme Court, Privy Council, and several tribunals. Documents in the Akoma Ntoso (LegalDocML) XML standard. Public API at caselaw.nationalarchives.gov.uk; bulk-data feed via gated application for "computational analysis" purposes.

Strengths: authoritative source for England & Wales case law, structured XML preserves judgment structure (paragraph numbers, citations, headings), Open Government Licence terms permit reuse with attribution. Weaknesses: limited coverage before 2003, application required for bulk feed, LegalDocML is heavy to parse if you've never touched legal XML.
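If you haven't touched LegalDocML before, the shape is less scary than it sounds: judgment text sits in namespaced elements under the Akoma Ntoso 3.0 namespace, with paragraphs carrying stable `eId` attributes you can cite against. A minimal parsing sketch — the XML below is an illustrative fragment I've invented, not a real judgment, so confirm the exact element layout against an actual Find Case Law `data.xml` document:

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso (LegalDocML) 3.0 namespace.
AKN = {"akn": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"}

# Illustrative fragment only -- real judgments are far larger, but
# paragraph-level structure follows this general shape.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <judgment>
    <judgmentBody>
      <paragraph eId="para_1"><content><p>The claimant sought judicial review.</p></content></paragraph>
      <paragraph eId="para_2"><content><p>Permission was refused.</p></content></paragraph>
    </judgmentBody>
  </judgment>
</akomaNtoso>"""

def paragraphs(xml: str) -> list[tuple[str, str]]:
    """Extract (eId, text) pairs so answers can cite paragraph numbers."""
    root = ET.fromstring(xml)
    out = []
    for para in root.iterfind(".//akn:paragraph", AKN):
        text = " ".join(t.strip() for t in para.itertext() if t.strip())
        out.append((para.get("eId"), text))
    return out

print(paragraphs(SAMPLE))
```

Keeping the `eId` alongside the text is what makes "§[paragraph]" citations verifiable later in the pipeline.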

2. legislation.gov.uk Developer API

Run by TNA. Covers UK primary legislation (Acts of Parliament), statutory instruments, and Northern Ireland / Scottish / Welsh equivalents. Critically, supports point-in-time queries: 'what did this provision look like on 1 April 2018'.

Strengths: stable, well-documented, point-in-time access, OGL-licensed. Weaknesses: schema (CLML) is its own thing; not all amendments tracked uniformly back through history; the in-force-date model can surprise you.

3. BAILII (read-only via human web)

Long-standing legal information aggregator. Critical for jurisdictions and document types Find Case Law doesn't cover (older judgments, Scottish / NI material, some tribunal decisions). Bulk scraping is restricted by BAILII's terms. Treat it as a human-readable backup, not an API.

4. API Pick UK Legal Search

Semantic search across UK case law and primary legislation in one POST endpoint. JSON in / JSON out, 60 credits per call, only-on-success billing. Returns title, neutral citation, source URL, and snippet ranked by semantic relevance. Designed for AI-agent tool calling.

Side by side

Details correct at the time of writing. Confirm licensing terms with TNA before commercial deployment.
|          | Find Case Law | legislation.gov.uk | API Pick UK Legal Search |
|----------|---------------|--------------------|--------------------------|
| Coverage | E&W case law since 2003 | UK statutes + SIs, point-in-time | Case law + legislation, semantic |
| Format   | Akoma Ntoso XML | CLML XML + JSON / Atom | JSON, snippet pre-shaped |
| Search   | Keyword | Keyword + structural | Semantic |
| Auth     | None for public; gated bulk feed | None | x-api-key |
| Licence  | Open Government Licence | Open Government Licence | API Pick TOS |
| Fit      | Compliance-grade source-of-truth | Statute lookup, point-in-time queries | AI-agent retrieval, RAG layer |

The architecture that survives Ayinde

The minimum viable citation-grounded stack:

Question → [LLM agent]
              ↓ tool_use(uk_legal_search)
            [API Pick UK Legal Search] → ranked authorities
              ↓ tool_use(extract_urls)
            [API Pick URL Extract] → full judgment / statute body
              ↓ Agent reads, drafts answer
              ↓ Citation-required prompt rule
          Answer with [Neutral Citation, Section/Paragraph]
              ↓ Final verification pass
          Refuse if citation can't be matched in extracted text

The verification pass is where most teams fail. It's not enough to ask the model 'cite your sources' — you have to programmatically verify that every cited authority appears in the extracted text. If it doesn't, refuse the answer or surface it for human review.

Working code

import re, requests
from anthropic import Anthropic

KEY = "pk_yourkey"
client = Anthropic()

def fetch_tool(path: str) -> dict:
    return requests.get(f"https://www.apipick.com{path}/tool-schema").json()["claude"]

TOOLS = [
    fetch_tool("/api/search/uk-legal"),
    fetch_tool("/api/extract"),
]

SYSTEM = """You are a UK legal research assistant. You answer questions about
England & Wales case law and UK primary legislation using the tools available.

Rules — non-negotiable:

1. Use uk_legal_search to find authorities relevant to the question.
2. For authorities you intend to cite, use extract_urls to retrieve the
   full text. Do not cite anything you have not extracted.
3. Cite every legal proposition with a neutral citation in standard form:
   [YYYY] EWHC|EWCA|UKSC NNN (Court), §[paragraph] — for cases.
   Section N(M) of the [Act Name] YYYY — for statutes, with point-in-time
   noted if relevant.
4. If the search returned no relevant authority, or the relevant text was
   not extracted, say so explicitly: "I could not retrieve a sufficient
   authority for this question. Please escalate to qualified counsel."
   Do not infer from training-data knowledge.
5. Distinguish ratio from obiter where it matters. Note when an authority
   is first instance, appellate, or Supreme Court.
6. For legislation, default to the in-force version. State the date you used.
7. This output is informational retrieval, not legal advice."""

def call_tool(b):
    paths = {"uk_legal_search": "/api/search/uk-legal", "extract_urls": "/api/extract"}
    r = requests.post(
        f"https://www.apipick.com{paths[b.name]}",
        json=b.input,
        headers={"x-api-key": KEY},
        timeout=60,
    )
    return {"type": "tool_result", "tool_use_id": b.id,
            "content": r.text, "is_error": r.status_code != 200}

# Capture the citation number too -- matching only "[YYYY] COURT" would let
# a fabricated "[2025] EWHC 9999" pass if any 2025 EWHC case was retrieved.
NEUTRAL_CITATION = re.compile(r"\[(\d{4})\]\s+(EWHC|EWCA|UKSC)\s+(\d+)")

def verify_citations(answer: str, extracted_text: str) -> list[str]:
    """Return list of citations in the answer that don't appear in extracted text."""
    cites = set(NEUTRAL_CITATION.findall(answer))
    return [
        f"[{year}] {court} {num}" for year, court, num in cites
        if f"[{year}] {court} {num}" not in extracted_text
    ]

def legal_research(question: str) -> str:
    msgs = [{"role": "user", "content": question}]
    extracted_buffer = ""

    while True:
        r = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=SYSTEM,
            tools=TOOLS,
            messages=msgs,
        )
        msgs.append({"role": "assistant", "content": r.content})

        if r.stop_reason == "end_turn":
            answer = "\n".join(b.text for b in r.content if b.type == "text")
            unverified = verify_citations(answer, extracted_buffer)
            if unverified:
                return (
                    "REFUSED: the answer cited authorities that were not retrieved. "
                    f"Unverified: {unverified}. Escalate to qualified counsel."
                )
            return answer

        if r.stop_reason == "tool_use":
            results = []
            for b in r.content:
                if b.type != "tool_use":
                    continue
                tr = call_tool(b)
                if b.name == "extract_urls":
                    extracted_buffer += tr["content"]
                results.append(tr)
            msgs.append({"role": "user", "content": results})

print(legal_research(
    "What is the test for breach of duty by a public authority "
    "post Roberts v Soldiers, Sailors, Airmen and Families Association?"
))

Three things this code does that lazy implementations don't: (1) it pulls every cited authority's full text via extract before relying on it, (2) it rejects answers whose neutral-citation strings don't actually appear in any retrieved text, (3) it bakes the 'I could not retrieve a sufficient authority' refusal into the system prompt so the model has a graceful exit when retrieval fails.

Cost ceiling

A typical legal-research call:

  • 1 search call — 60 credits (~$0.06)
  • 1 extract call covering 2–4 authorities — 4–8 credits (~$0.004–$0.008)
  • ~6,000 input + 1,500 output Claude tokens — ~$0.05

Round figure: ~$0.12 per researched answer with citations. At 100 questions/day for a small in-house legal team that's $12/day — well below the cost of any commercial legal-tech subscription, and you control the audit trail.
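The arithmetic above in code form, as a sanity check. The $0.001-per-credit rate is inferred from the article's own figures (60 credits ≈ $0.06), and the Claude cost is taken as the quoted ~$0.05 blended figure, so treat the constants as illustrative:

```python
CREDIT_USD = 0.001  # inferred rate: 60 credits ~= $0.06

def cost_per_answer(search_credits: int = 60,
                    extract_credits: int = 6,
                    llm_usd: float = 0.05) -> float:
    """Rough per-answer cost: one search, one extract, one Claude round trip."""
    return (search_credits + extract_credits) * CREDIT_USD + llm_usd

print(round(cost_per_answer(), 3))        # per answer, roughly $0.12
print(round(cost_per_answer() * 100, 2))  # per day at 100 questions
```

Changing the extract-credit count for deeper retrieval barely moves the total; the search call and LLM tokens dominate.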

Where this generalises

The 'citation grounding' architectural pattern doesn't only apply to UK case law — it's the same pattern that makes SEC filings RAG reliable for investment due diligence and that lets a scientific literature agent avoid hallucinated paper references. Three rules unify them:

  • Retrieve first, generate second. Never let the model produce a citation that wasn't in retrieved context.
  • Verify deterministically. A regex over extracted text is cheap and catches most fabrication.
  • Refuse gracefully. Train the system prompt to say "I cannot reliably answer" before training the model to be helpful.
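The three rules above compile down to very little code regardless of domain; only the citation pattern changes. A generic sketch, where the DOI and SEC accession-number patterns are deliberately simplified illustrations rather than spec-complete validators:

```python
import re

# Domain-specific citation shapes -- swap the pattern, keep the pipeline.
PATTERNS = {
    "uk_case":  re.compile(r"\[\d{4}\]\s+(?:EWHC|EWCA|UKSC)\s+\d+"),
    "doi":      re.compile(r"10\.\d{4,9}/[^\s,;]+"),   # simplified DOI shape
    "sec_accn": re.compile(r"\d{10}-\d{2}-\d{6}"),     # simplified accession number
}

def ungrounded(answer: str, corpus: str, domain: str) -> list[str]:
    """Citations in the answer that never appear in retrieved text.

    A non-empty result means: refuse, or escalate for human review.
    """
    return [c for c in set(PATTERNS[domain].findall(answer))
            if c not in corpus]

answer = "See Smith v Jones [2025] EWHC 1383 and also [2024] UKSC 99."
corpus = "... [2025] EWHC 1383 (Admin) ..."
print(ungrounded(answer, corpus, "uk_case"))  # the UKSC cite was never retrieved
```

The deterministic check costs microseconds; the fabrications it catches cost regulatory referrals.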

Ayinde was the moment when these rules went from 'engineering best practice' to 'the only way to ship UK legal AI without putting your users in front of the BSB'. API Pick UK Legal Search is the search-layer building block; the rest is yours to wire up.

Frequently Asked Questions

What actually happened in Ayinde v Haringey?

In June 2025 the Divisional Court found that counsel had filed a skeleton argument citing five non-existent cases generated by an LLM; a linked case, Al-Haroun v Qatar National Bank, involved court filings citing authorities that did not exist or did not support the propositions advanced. The judgments were handed down together as Ayinde v Haringey & Al-Haroun v Qatar National Bank [2025] EWHC 1383 (Admin). The court referred the practitioners to the Bar Standards Board and Solicitors Regulation Authority. The decision is now the standard authority cited whenever 'AI hallucination in legal practice' comes up in the UK.

Can I scrape BAILII?

No. BAILII's terms restrict bulk programmatic access, and the operator has been explicit on this — see the Transparency Project's documentation of BAILII's position. The legitimate route is the National Archives Find Case Law service for England & Wales judgments, which has a public API and a separate application-gated bulk-data feed.

What's the difference between Find Case Law and legislation.gov.uk?

Find Case Law (caselaw.nationalarchives.gov.uk) covers court judgments — the EWHC, EWCA, UKSC and tribunal decisions. legislation.gov.uk covers primary and secondary legislation: Acts of Parliament, statutory instruments, and (crucially for compliance work) point-in-time historical versions. A complete UK legal RAG stack needs both.

Why does point-in-time legislation matter for AI agents?

Most legal questions are about what the law was on a specific date — the day of an alleged breach, the date a contract was signed, the time an agency made a decision. Returning the current version of legislation gives wrong answers for any historical question. legislation.gov.uk solves this with dated URIs: append an ISO date to a provision's URI (e.g. /ukpga/2010/15/section/6/2024-06-30) to retrieve the text as it stood on that date. Many AI legal tools get historical questions wrong simply because they never wired this in.

Does this output count as legal advice?

No. Output from any retrieval API (including ours) is informational retrieval, not legal advice. Solicitors and barristers practising in England & Wales remain bound by SRA / BSB conduct rules. The architecture this post describes is designed to support — not replace — qualified legal review. The Ayinde precedent makes that distinction non-negotiable.


Written by
Sarah Choy
CEO, API Pick

Sarah Choy is the CEO of API Pick. She writes about building production-ready APIs for AI agents and LLM workflows.