Citation-Grounded UK Case Law Retrieval After Ayinde v Haringey

Sarah Choy · Published May 3, 2026 · 11 min read

Ayinde v Haringey changed the calculus for UK legal AI overnight. A barrister cited five fake cases generated by an LLM and got referred to the Bar Standards Board. Hallucinated citations now have professional-conduct consequences. Here's the developer's guide to building citation-grounded UK case law retrieval — TNA Find Case Law, legislation.gov.uk, neutral-citation parsing, and how to wire it into a working agent.

TL;DR

  • Ayinde v Haringey [2025] EWHC 1383 (Admin) made hallucinated case citations a professional-conduct issue, not just a quality problem.
  • BAILII's terms restrict bulk programmatic access; the National Archives' Find Case Law service is the legitimate API for England & Wales judgments.
  • legislation.gov.uk has a stable Developer API with point-in-time access — essential for compliance work where 'in force on 30 June 2024' is a real query.
  • The Stanford HAI 'Hallucination-Free?' study showed Westlaw and Lexis+ AI still hallucinate 17–33% of the time — citation grounding is the architectural answer, not a vendor choice.
  • API Pick UK Legal Search wraps semantic retrieval over case law and primary legislation in one POST — 60 credits per call.

The case that changed the calculus

On 6 June 2025 the Divisional Court of the King's Bench Division handed down a single judgment covering two linked cases — Ayinde v The London Borough of Haringey and Al-Haroun v Qatar National Bank QPSC, neutral citation [2025] EWHC 1383 (Admin). In both, counsel had filed materials citing authorities that did not exist; in Ayinde, five fabricated cases. The court found breaches of professional duty and referred the practitioners to the BSB and SRA respectively.

Within 48 hours, every UK legal-tech roadmap had a new bullet at the top: citation grounding. Not as a quality nicety, as a regulatory necessity. The Stanford HAI 'Hallucination-Free?' study (Magesh et al., May 2024) had already shown that the leading vendor tools — Lexis+ AI, Westlaw AI — hallucinate citations 17–33% of the time on benchmarked queries. Ayinde turned that academic finding into a compliance problem.

For developers building AI features into UK legal products, the architectural lesson is clear: retrieval has to come from authoritative sources, citations have to be verifiable to the source, and the system has to refuse to answer when it can't ground a response. Anything else is a referral to a regulator waiting to happen.

Here's the working developer's guide to doing this properly: which APIs to use, what their gotchas are, and how to wire them into a working agent.

The authoritative sources

1. National Archives Find Case Law

Run by The National Archives (TNA). Covers court judgments handed down since April 2003 from the High Court, Court of Appeal, UK Supreme Court, Privy Council, and several tribunals. Documents in the Akoma Ntoso (LegalDocML) XML standard. Public API at caselaw.nationalarchives.gov.uk; bulk-data feed via gated application for "computational analysis" purposes.

Strengths: authoritative source for England & Wales case law, structured XML preserves judgment structure (paragraph numbers, citations, headings), Open Government Licence terms permit reuse with attribution. Weaknesses: limited coverage before 2003, application required for bulk feed, LegalDocML is heavy to parse if you've never touched legal XML.
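If you haven't touched LegalDocML before, the shape is less scary than it sounds: judgment text sits in namespaced elements under the Akoma Ntoso 3.0 namespace, with paragraphs carrying stable `eId` attributes you can cite against. A minimal parsing sketch — the XML below is an illustrative fragment I've invented, not a real judgment, so confirm the exact element layout against an actual Find Case Law `data.xml` document:

```python
import xml.etree.ElementTree as ET

# Akoma Ntoso (LegalDocML) 3.0 namespace.
AKN = {"akn": "http://docs.oasis-open.org/legaldocml/ns/akn/3.0"}

# Illustrative fragment only -- real judgments are far larger, but
# paragraph-level structure follows this general shape.
SAMPLE = """<akomaNtoso xmlns="http://docs.oasis-open.org/legaldocml/ns/akn/3.0">
  <judgment>
    <judgmentBody>
      <paragraph eId="para_1"><content><p>The claimant sought judicial review.</p></content></paragraph>
      <paragraph eId="para_2"><content><p>Permission was refused.</p></content></paragraph>
    </judgmentBody>
  </judgment>
</akomaNtoso>"""

def paragraphs(xml: str) -> list[tuple[str, str]]:
    """Extract (eId, text) pairs so answers can cite paragraph numbers."""
    root = ET.fromstring(xml)
    out = []
    for para in root.iterfind(".//akn:paragraph", AKN):
        text = " ".join(t.strip() for t in para.itertext() if t.strip())
        out.append((para.get("eId"), text))
    return out

print(paragraphs(SAMPLE))
```

Keeping the `eId` alongside the text is what makes "§[paragraph]" citations verifiable later in the pipeline.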

2. legislation.gov.uk Developer API

Run by TNA. Covers UK primary legislation (Acts of Parliament), statutory instruments, and Northern Ireland / Scottish / Welsh equivalents. Critically, supports point-in-time queries: 'what did this provision look like on 1 April 2018'.

Strengths: stable, well-documented, point-in-time access, OGL-licensed. Weaknesses: schema (CLML) is its own thing; not all amendments tracked uniformly back through history; the in-force-date model can surprise you.

3. BAILII (read-only via human web)

Long-standing legal information aggregator. Critical for jurisdictions and document types Find Case Law doesn't cover (older judgments, Scottish / NI material, some tribunal decisions). Bulk scraping is restricted by BAILII's terms. Treat it as a human-readable backup, not an API.

4. API Pick UK Legal Search

Semantic search across UK case law and primary legislation in one POST endpoint. JSON in / JSON out, 60 credits per call, only-on-success billing. Returns title, neutral citation, source URL, and snippet ranked by semantic relevance. Designed for AI-agent tool calling.

Side by side

Details correct at the time of writing. Confirm licensing terms with TNA before commercial deployment.
|          | Find Case Law | legislation.gov.uk | API Pick UK Legal Search |
|----------|---------------|--------------------|--------------------------|
| Coverage | E&W case law since 2003 | UK statutes + SIs, point-in-time | Case law + legislation, semantic |
| Format   | Akoma Ntoso XML | CLML XML + JSON / Atom | JSON, snippet pre-shaped |
| Search   | Keyword | Keyword + structural | Semantic |
| Auth     | None for public; gated bulk feed | None | x-api-key |
| Licence  | Open Government Licence | Open Government Licence | API Pick TOS |
| Fit      | Compliance-grade source-of-truth | Statute lookup, point-in-time queries | AI-agent retrieval, RAG layer |

The architecture that survives Ayinde

The minimum viable citation-grounded stack:

Question → [LLM agent]
              ↓ tool_use(uk_legal_search)
            [API Pick UK Legal Search] → ranked authorities
              ↓ tool_use(extract_urls)
            [API Pick URL Extract] → full judgment / statute body
              ↓ Agent reads, drafts answer
              ↓ Citation-required prompt rule
          Answer with [Neutral Citation, Section/Paragraph]
              ↓ Final verification pass
          Refuse if citation can't be matched in extracted text

The verification pass is where most teams fail. It's not enough to ask the model 'cite your sources' — you have to programmatically verify that every cited authority appears in the extracted text. If it doesn't, refuse the answer or surface it for human review.

Working code

import re, requests
from anthropic import Anthropic

KEY = "pk_yourkey"
client = Anthropic()

def fetch_tool(path: str) -> dict:
    return requests.get(f"https://www.apipick.com{path}/tool-schema").json()["claude"]

TOOLS = [
    fetch_tool("/api/search/uk-legal"),
    fetch_tool("/api/extract"),
]

SYSTEM = """You are a UK legal research assistant. You answer questions about
England & Wales case law and UK primary legislation using the tools available.

Rules — non-negotiable:

1. Use uk_legal_search to find authorities relevant to the question.
2. For authorities you intend to cite, use extract_urls to retrieve the
   full text. Do not cite anything you have not extracted.
3. Cite every legal proposition with a neutral citation in standard form:
   [YYYY] EWHC|EWCA|UKSC NNN (Court), §[paragraph] — for cases.
   Section N(M) of the [Act Name] YYYY — for statutes, with point-in-time
   noted if relevant.
4. If the search returned no relevant authority, or the relevant text was
   not extracted, say so explicitly: "I could not retrieve a sufficient
   authority for this question. Please escalate to qualified counsel."
   Do not infer from training-data knowledge.
5. Distinguish ratio from obiter where it matters. Note when an authority
   is first instance, appellate, or Supreme Court.
6. For legislation, default to the in-force version. State the date you used.
7. This output is informational retrieval, not legal advice."""

def call_tool(b):
    paths = {"uk_legal_search": "/api/search/uk-legal", "extract_urls": "/api/extract"}
    r = requests.post(
        f"https://www.apipick.com{paths[b.name]}",
        json=b.input,
        headers={"x-api-key": KEY},
        timeout=60,
    )
    return {"type": "tool_result", "tool_use_id": b.id,
            "content": r.text, "is_error": r.status_code != 200}

# Capture the citation number too -- matching only "[YYYY] COURT" would let
# a fabricated "[2025] EWHC 9999" pass if any 2025 EWHC case was retrieved.
NEUTRAL_CITATION = re.compile(r"\[(\d{4})\]\s+(EWHC|EWCA|UKSC)\s+(\d+)")

def verify_citations(answer: str, extracted_text: str) -> list[str]:
    """Return list of citations in the answer that don't appear in extracted text."""
    cites = set(NEUTRAL_CITATION.findall(answer))
    return [
        f"[{year}] {court} {num}" for year, court, num in cites
        if f"[{year}] {court} {num}" not in extracted_text
    ]

def legal_research(question: str) -> str:
    msgs = [{"role": "user", "content": question}]
    extracted_buffer = ""

    while True:
        r = client.messages.create(
            model="claude-sonnet-4-6",
            max_tokens=4096,
            system=SYSTEM,
            tools=TOOLS,
            messages=msgs,
        )
        msgs.append({"role": "assistant", "content": r.content})

        if r.stop_reason == "end_turn":
            answer = "\n".join(b.text for b in r.content if b.type == "text")
            unverified = verify_citations(answer, extracted_buffer)
            if unverified:
                return (
                    "REFUSED: the answer cited authorities that were not retrieved. "
                    f"Unverified: {unverified}. Escalate to qualified counsel."
                )
            return answer

        if r.stop_reason == "tool_use":
            results = []
            for b in r.content:
                if b.type != "tool_use":
                    continue
                tr = call_tool(b)
                if b.name == "extract_urls":
                    extracted_buffer += tr["content"]
                results.append(tr)
            msgs.append({"role": "user", "content": results})

print(legal_research(
    "What is the test for breach of duty by a public authority "
    "post Roberts v Soldiers, Sailors, Airmen and Families Association?"
))

Three things this code does that lazy implementations don't: (1) it pulls every cited authority's full text via extract before relying on it, (2) it rejects answers whose neutral-citation strings don't actually appear in any retrieved text, (3) it bakes the 'I could not retrieve a sufficient authority' refusal into the system prompt so the model has a graceful exit when retrieval fails.

Cost ceiling

A typical legal-research call:

  • 1 search call — 60 credits (~$0.06)
  • 1 extract call covering 2–4 authorities — 4–8 credits (~$0.004–$0.008)
  • ~6,000 input + 1,500 output Claude tokens — ~$0.05

Round figure: ~$0.12 per researched answer with citations. At 100 questions/day for a small in-house legal team that's $12/day — well below the cost of any commercial legal-tech subscription, and you control the audit trail.
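The arithmetic above in code form, as a sanity check. The $0.001-per-credit rate is inferred from the article's own figures (60 credits ≈ $0.06), and the Claude cost is taken as the quoted ~$0.05 blended figure, so treat the constants as illustrative:

```python
CREDIT_USD = 0.001  # inferred rate: 60 credits ~= $0.06

def cost_per_answer(search_credits: int = 60,
                    extract_credits: int = 6,
                    llm_usd: float = 0.05) -> float:
    """Rough per-answer cost: one search, one extract, one Claude round trip."""
    return (search_credits + extract_credits) * CREDIT_USD + llm_usd

print(round(cost_per_answer(), 3))        # per answer, roughly $0.12
print(round(cost_per_answer() * 100, 2))  # per day at 100 questions
```

Changing the extract-credit count for deeper retrieval barely moves the total; the search call and LLM tokens dominate.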

Where this generalises

The 'citation grounding' architectural pattern doesn't only apply to UK case law — it's the same pattern that makes SEC filings RAG reliable for investment due diligence and that lets a scientific literature agent avoid hallucinated paper references. Three rules unify them:

  • Retrieve first, generate second. Never let the model produce a citation that wasn't in retrieved context.
  • Verify deterministically. A regex over extracted text is cheap and catches most fabrication.
  • Refuse gracefully. Train the system prompt to say "I cannot reliably answer" before training the model to be helpful.
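The three rules above compile down to very little code regardless of domain; only the citation pattern changes. A generic sketch, where the DOI and SEC accession-number patterns are deliberately simplified illustrations rather than spec-complete validators:

```python
import re

# Domain-specific citation shapes -- swap the pattern, keep the pipeline.
PATTERNS = {
    "uk_case":  re.compile(r"\[\d{4}\]\s+(?:EWHC|EWCA|UKSC)\s+\d+"),
    "doi":      re.compile(r"10\.\d{4,9}/[^\s,;]+"),   # simplified DOI shape
    "sec_accn": re.compile(r"\d{10}-\d{2}-\d{6}"),     # simplified accession number
}

def ungrounded(answer: str, corpus: str, domain: str) -> list[str]:
    """Citations in the answer that never appear in retrieved text.

    A non-empty result means: refuse, or escalate for human review.
    """
    return [c for c in set(PATTERNS[domain].findall(answer))
            if c not in corpus]

answer = "See Smith v Jones [2025] EWHC 1383 and also [2024] UKSC 99."
corpus = "... [2025] EWHC 1383 (Admin) ..."
print(ungrounded(answer, corpus, "uk_case"))  # the UKSC cite was never retrieved
```

The deterministic check costs microseconds; the fabrications it catches cost regulatory referrals.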

Ayinde was the moment when these rules went from 'engineering best practice' to 'the only way to ship UK legal AI without putting your users in front of the BSB'. API Pick UK Legal Search is the search-layer building block; the rest is yours to wire up.

Frequently Asked Questions

What actually happened in Ayinde v Haringey?

In June 2025 the Divisional Court found that counsel had filed a skeleton argument citing five non-existent cases generated by an LLM; a linked case, Al-Haroun v Qatar National Bank, involved court filings citing authorities that did not exist or did not support the propositions advanced. The judgments were handed down together as Ayinde v Haringey & Al-Haroun v Qatar National Bank [2025] EWHC 1383 (Admin). The court referred the practitioners to the Bar Standards Board and Solicitors Regulation Authority. The decision is now the standard authority cited whenever 'AI hallucination in legal practice' comes up in the UK.

Can I scrape BAILII?

No. BAILII's terms restrict bulk programmatic access, and the operator has been explicit on this — see the Transparency Project's documentation of BAILII's position. The legitimate route is the National Archives Find Case Law service for England & Wales judgments, which has a public API and a separate application-gated bulk-data feed.

What's the difference between Find Case Law and legislation.gov.uk?

Find Case Law (caselaw.nationalarchives.gov.uk) covers court judgments — the EWHC, EWCA, UKSC and tribunal decisions. legislation.gov.uk covers primary and secondary legislation: Acts of Parliament, statutory instruments, and (crucially for compliance work) point-in-time historical versions. A complete UK legal RAG stack needs both.

Why does point-in-time legislation matter for AI agents?

Most legal questions are about what the law was on a specific date — the day of an alleged breach, the date a contract was signed, the time an agency made a decision. Returning the current version of legislation gives wrong answers for any historical question. legislation.gov.uk solves this with dated URIs: append an ISO date to a provision's URI (e.g. /ukpga/2010/15/section/6/2024-06-30) to retrieve the text as it stood on that date. Many AI legal tools get historical questions wrong simply because they never wired this in.

Does this output count as legal advice?

No. Output from any retrieval API (including ours) is informational retrieval, not legal advice. Solicitors and barristers practising in England & Wales remain bound by SRA / BSB conduct rules. The architecture this post describes is designed to support — not replace — qualified legal review. The Ayinde precedent makes that distinction non-negotiable.


Written by
Sarah Choy
CEO, API Pick

Sarah Choy is the CEO of API Pick. She writes about building production-ready APIs for AI agents and LLM workflows.