Why APIs Should Charge Only on Success — The Case for HTTP-200 Billing

Most APIs charge for every call, successful or not. AI agents constantly retry against flaky upstream services, which means the legacy 'always charge' model effectively taxes resilience. Here's the case for billing only on HTTP 200, and what it changes about how you design agents.
TL;DR
- AI agents reasonably retry on transient failures (5xx, timeouts, rate limits). Under 'always charge' billing, every retry is a real charge — turning a 99.5%-uptime upstream into a 5× cost penalty during the bad 0.5%.
- 'Charge only on HTTP 200' aligns the provider's incentive with the customer's: providers fix instability, customers don't pay for it.
- It also unlocks design patterns — aggressive retry budgets, optimistic prefetching, redundant calls — that 'always charge' makes prohibitively expensive.
- Most providers don't do this because their legacy infra ties metering to the upstream call, not the response. That's a billing-system limitation dressed up as policy.
The behavior we're trying to encourage
AI agents are a new kind of API client. They're not the careful enterprise integrations of 2015 that caught one error and emailed an on-call engineer. They are loops that, when something fails, sleep for a bit and try again — sometimes five or ten times — and don't surface the result to a human at all. The whole point of the loop is that transient failures should be invisible to the end user.
That's a great pattern. It produces resilient products. It hides the messy reality of the public internet from users. The trouble is the API billing model was set in an era when retrying was rare and expensive enough to discourage by default. It still works that way. Every retry is a billable call, regardless of whether it returned anything.
Concretely: an agent that hits a 99.5%-uptime API and makes up to five attempts per request will, during the bad 0.5%, generate up to 5× the calls, and up to 5× the bill. The agent did the right thing. The customer pays for the upstream's instability.
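The arithmetic above can be sketched in a few lines (a toy model, not any provider's actual meter):

```python
def paid_calls(attempts_needed: int, billing: str) -> int:
    """Paid calls for one logical request that succeeds on its Nth attempt."""
    if billing == "always":
        return attempts_needed   # every attempt, failed or not, is billed
    return 1                     # only the final HTTP 200 is billed

# A request that needs all five attempts during a flaky window:
print(paid_calls(5, "always"))       # 5
print(paid_calls(5, "on_success"))   # 1
```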
The simpler alternative
Charge only on HTTP 200. Specifically, only on a successful response with the documented response shape. Everything else — 4xx for caller errors, 5xx for our errors, timeouts, rate limits, partial-but-malformed responses — is free.
This sounds like a small shift. It's not. It changes what's economically rational on both sides:
- The provider now has a direct financial incentive to fix instability. Every flaky 503 is a missed sale, not a free lunch. We've found that adopting only-on-success billing was the single biggest force pulling our reliability work to the top of the roadmap — there's no spreadsheet trick that makes downtime look fine when you're forfeiting the revenue from those calls.
- The customer can size retry budgets aggressively. A 5-retry exponential-backoff loop is a no-brainer when the worst case for transient failure is one paid call (the eventual success) instead of six (one paid for each attempt).
- The market can compare APIs on dollars-per-successful-call rather than dollars-per-attempt. That's the right unit. If two APIs both publish '$0.001 per call' but one runs at 99.9% uptime and the other at 95%, the effective cost per successful call diverges under always-charge (every failed attempt inflates the real price, and the gap balloons during outage windows when retries stack up) and stays exactly $0.001 under only-on-success. Which one a buyer should prefer is obvious, and only-on-success makes it visible in the list price.
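Under the simplifying assumption that an agent retries until it gets a 200, the expected number of billed attempts per success is 1/uptime under always-charge and exactly 1 under only-on-success. A sketch with the hypothetical $0.001 list price:

```python
PRICE = 0.001  # hypothetical published per-call price

def cost_per_success(uptime: float, on_success_only: bool) -> float:
    # Expected billed attempts per success: 1/uptime when every attempt
    # is billed, exactly 1 when only the eventual HTTP 200 is billed.
    expected_attempts = 1.0 if on_success_only else 1.0 / uptime
    return PRICE * expected_attempts

for uptime in (0.999, 0.95):
    always = cost_per_success(uptime, on_success_only=False)
    on_ok = cost_per_success(uptime, on_success_only=True)
    print(f"{uptime:.3f} uptime: ${always:.6f} vs ${on_ok:.6f} per success")
```

This average understates the pain: failures cluster in outage windows, so the bill spikes exactly when the upstream is at its worst.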
What changes about how you build agents
1. Retries become free
Under always-charge billing, the right retry budget for a transient-failure handler is roughly 1–2 attempts — beyond that the cost of repeated failures starts to matter. Under only-on-success, the budget is 'as long as the user can wait'. We routinely see agent loops with 5-retry, exponential-backoff, jittered patterns that would be reckless under legacy billing.
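A sketch of that loop (names hypothetical; assumes transient failures surface as an exception):

```python
import random
import time

class TransientError(Exception):
    """Stand-in for a 5xx, timeout, or 429 from the upstream API."""

def call_with_retries(call, max_retries=5, base_delay=0.5, cap=30.0):
    """Retry `call` on transient errors with capped, jittered exponential backoff.

    Under only-on-success billing the failed attempts cost nothing, so a
    generous budget like this is safe; under always-charge, every attempt
    here would be a separate paid call.
    """
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_retries:
                raise
            # full jitter: sleep a random fraction of the capped backoff
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
```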
2. Optimistic prefetching becomes cheap
A pattern that's prohibitively expensive under always-charge: kick off a search call speculatively while the user is still typing, in case the predicted query is right. If the prediction is wrong, cancel the in-flight call and throw it away. Under only-on-success, a cancelled speculative call never returns an HTTP 200, so it's free: you pay only for the predictions you actually use, a small premium for better latency. Under always-charge, every wrong prediction is a paid call, which makes the heuristic too expensive to ship.
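A minimal sketch of the pattern with a thread pool (`search` and the prediction logic are hypothetical stand-ins for the real billed-on-200 call):

```python
from concurrent.futures import ThreadPoolExecutor

def search(query: str) -> str:
    """Hypothetical stand-in for the real (billed-on-200) search call."""
    return f"results for {query!r}"

executor = ThreadPoolExecutor(max_workers=2)

predicted = "weather in berlin"                    # what we guess the user wants
speculative = executor.submit(search, predicted)   # fires immediately

final_query = "weather in berlin"                  # what the user actually submits
if final_query == predicted:
    results = speculative.result()                 # hit: the latency is already paid
else:
    # Miss: cancel the in-flight call. If it never completes with a 200,
    # only-on-success billing charges nothing for the wrong guess.
    speculative.cancel()
    results = search(final_query)

executor.shutdown()
print(results)
```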
3. Redundant calls become a reliability tool
Sometimes the cheapest reliability strategy is to fire the same request at two different APIs, take whichever returns first, and cancel the slower one. Under always-charge that doubles your bill. Under only-on-success, the cancelled call never completes with an HTTP 200, so the discarded request costs nothing. Suddenly hedging strategies that were expensive luxuries become practical.
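A sketch of a hedged lookup (both providers and their latencies are hypothetical):

```python
from concurrent.futures import ThreadPoolExecutor, FIRST_COMPLETED, wait
import time

# Two hypothetical providers for the same lookup; A is usually faster.
def provider_a(query: str) -> str:
    time.sleep(0.05)
    return f"A:{query}"

def provider_b(query: str) -> str:
    time.sleep(0.25)
    return f"B:{query}"

def hedged(query: str) -> str:
    """Fire both providers, keep the first answer, cancel the straggler.

    A cancelled call never completes with an HTTP 200, so under
    only-on-success billing the discarded request is never charged.
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        futures = [pool.submit(provider_a, query), pool.submit(provider_b, query)]
        done, pending = wait(futures, return_when=FIRST_COMPLETED)
        for f in pending:
            f.cancel()   # best-effort: a call that already started still runs
        return next(iter(done)).result()

print(hedged("acme corp"))
```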
What 'success' has to mean
The model only works if 'success' is unambiguous. We define it tightly:
- HTTP 200 with the documented response schema — billable.
- HTTP 200 with an error wrapped in the response body — not billable (this would be the loophole that breaks the whole model).
- HTTP 4xx — not billable. This includes 401 (auth), 402 (insufficient credits), 422 (validation), 404 (not found in some contexts).
- HTTP 5xx — not billable.
- HTTP 429 (rate limit) — not billable.
- Network errors / timeouts before our edge — not billable.
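The rules above reduce to a small predicate (a sketch; the schema validator is a hypothetical stand-in, and a timeout never produces a status at all, so it never reaches this check):

```python
def is_billable(status, body, matches_schema) -> bool:
    """Bill only an HTTP 200 whose body matches the documented schema."""
    if status != 200:
        return False                 # 4xx, 5xx, 429: never billed
    if "error" in body:
        return False                 # error smuggled inside a 200: the loophole
    return matches_schema(body)      # hypothetical schema validator

schema_ok = lambda body: "results" in body

assert is_billable(200, {"results": []}, schema_ok)
assert not is_billable(200, {"error": "upstream died"}, schema_ok)
assert not is_billable(503, {}, schema_ok)
assert not is_billable(429, {}, schema_ok)
```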
For batch endpoints (e.g. URL Extract with 25 URLs in one call), the call as a whole returns one HTTP 200 if any URL succeeded, and is charged at the per-URL rate. The per-URL status field tells the caller which URLs failed. The work was done; it's billable.
Why most providers don't do this
Three reasons, in order of how much each one matters:
- Billing infrastructure. The standard pattern is to meter at the API gateway, before the response is known. The gateway emits a billable event into Kafka, the billing system aggregates it, the bill is sent. Re-wiring metering to fire after the upstream service has answered with a status code requires either (a) coupling the gateway to backend health (creating new failure modes) or (b) writing a reconciliation pipeline that retroactively credits failed calls. Both are real engineering work.
- Habit. The pattern was set by reliable backends — Twilio, AWS, Stripe. It works fine when your upstream is at 99.99% uptime. APIs that wrap LLMs, scrape the web, or call slow data partners are much less reliable, but the billing pattern was inherited from the reliable era and never updated.
- Revenue. Always-charge is just more money. Most providers have not had to compete on this dimension because most customers have not been agent traffic, and synchronous human users tolerate occasional failures better than they tolerate reading a billing model.
The first one is real engineering. The second and third are inertia, and they will give way as agent traffic becomes the majority of API consumption — which the trend lines suggest will happen within 18 months for most public APIs.
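The reconciliation pipeline, option (b) above, can be sketched: meter every call at the gateway, then credit back the non-200s before invoicing. All names and shapes here are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CallRecord:
    call_id: str
    price: float
    status: Optional[int]   # None = timed out before any status was known

def reconcile(records):
    """Net amount to bill after crediting every call that wasn't a 200."""
    gross = sum(r.price for r in records)                      # gateway-metered total
    credits = sum(r.price for r in records if r.status != 200)
    return gross - credits

records = [
    CallRecord("a", 0.001, 200),
    CallRecord("b", 0.001, 503),   # retried...
    CallRecord("c", 0.001, 200),   # ...and eventually succeeded
    CallRecord("d", 0.001, None),  # timed out: credited too
]
print(f"${reconcile(records):.3f}")   # only the two 200s are billed
```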
What to look for as a buyer
If you're evaluating data APIs for an agent workload, three explicit questions:
- 'Do you charge on HTTP 200 only, or on every attempt?' — straight answer required. Don't accept 'usually' or 'depending on plan'.
- 'How is HTTP 207 / partial-success billed?' — flushes out the loophole answer.
- 'What's your published uptime, and what's the SLA refund mechanism if you miss it?' — only-on-success is a strong signal but it's not a substitute for an actual SLA. Both should be present.
The shift this points at
AI agents change the unit economics of every API they touch. APIs designed for the agent era look different — simpler shapes, predictable schemas, machine-friendly auth, transparent pricing — and their billing models look different too. Only-on-success is one of the simpler shifts. The harder one, still ahead of us, is figuring out how multi-step agent workflows price across providers when no single party owns the full path.
For now, demand only-on-success on the providers you can. Build agents that take advantage of it. And when you're shopping for an API, run the math on dollars-per-successful-call, not dollars-per-attempt.
Frequently Asked Questions
Doesn't 'only on success' incentivise providers to misclassify successes as failures?
In theory, but the inverse pressure is stronger: developers compare APIs on real-world dollars-per-useful-call, and that's exactly what only-on-success measures. A provider that quietly reclassified successes as failures would face an angry support queue and a public review trail within days. The market is small enough that this self-corrects.
What if my call partially succeeds — for example, batch extract where 2 of 25 URLs fail?
Two reasonable models. (a) The whole call is HTTP 200 if any URLs succeeded; per-URL failures are reported in the per-URL status. The work happened, so it's billable. (b) HTTP 207 (multi-status) for partial — each URL bills individually. We chose (a) because it makes pricing predictable: agents know that a successful HTTP 200 means the worst case is the announced credit cost. (b) is more 'fair' but harder to budget against.
What counts as a failure?
Our rule: HTTP 4xx that's clearly the caller's fault (malformed body, missing auth, insufficient credits) is not charged. HTTP 5xx is never charged — it's our problem. Network errors and timeouts on our side are never charged. Rate limits (429) are never charged. The only thing charged is HTTP 200 with the announced response shape.
Why do most providers charge on every call?
Three reasons, mostly historical. (1) Legacy billing systems meter at the API gateway, before the response is known. Wiring billing to fire on response status retroactively is non-trivial. (2) The pattern was set by Twilio and AWS in an era of reliable backends; APIs that wrap LLMs or scrape the web are much less reliable, so the assumption stops holding. (3) It's more revenue. The first two are excuses; the third is the truth.
Are there cases where 'always charge' makes sense?
When the call genuinely consumes a fixed cost on every invocation regardless of outcome — for example, an external side effect (an SMS actually sent), a mutation that's atomic at the boundary (a DNS update), or a service whose backend can't distinguish 'we tried and failed' from 'we did it and you didn't notice'. For read-style data APIs and search APIs the model doesn't fit.
Sarah Choy is the CEO of API Pick. She writes about building production-ready APIs for AI agents and LLM workflows.