Fire Hydrant · the router

One local gateway. Two format pools. No cross-format hacks.

The Fire Hydrant, OpenBurnBar's routed client gateway at 127.0.0.1:8317, receives OpenAI-shape requests on /v1/chat/completions from Cursor, Factory, Forge, OpenCode and Codex, and Anthropic-shape requests on /v1/messages from Claude Code. Each request is served only by accounts in the same format pool, ranked by live quota health and least-recently-used. When an account exhausts its quota, gets rate-limited, or fails auth, the router shifts traffic within the same pool before your IDE notices. Routing is deterministic, auditable, and entirely on-device.

Two pools, same-format failover

Two highways, never crossed.

Tool-call schemas, prompt-cache markers, and streaming-event types differ enough between OpenAI Chat Completions and Anthropic Messages that "translating" between them quietly breaks things — citations go missing, tool calls misfire, cache markers drop. The Hydrant refuses that game. A request on one endpoint can only be served by accounts in that pool. Same-format failover. Provable in tests.

OpenAI-family pool

/v1/chat/completions

OpenAI Chat Completions · SSE stream pass-through

  • Clients: Cursor (BYOK tunnel), Factory, Forge, OpenCode, Codex CLI in API-key mode.
  • Upstream accounts: OpenAI, Z.ai, MiniMax, Kimi, Ollama Cloud, Ollama Local.
  • Failover trigger: 429, 401, 402, 403, "quota" / "rate" / "exhaust" in response body.
  • Empty-pool behaviour: 503 with a structured message naming the missing pool.
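The failover trigger listed above reduces to a small predicate: specific status codes, or quota-flavored keywords in the response body. A hedged sketch, with the function name assumed here rather than taken from the codebase:

```python
# Sketch of the documented OpenAI-family failover trigger:
# 429/401/402/403, or "quota" / "rate" / "exhaust" in the body.
# Name and shape are illustrative, not the daemon's actual API.
FAILOVER_STATUSES = {429, 401, 402, 403}
FAILOVER_KEYWORDS = ("quota", "rate", "exhaust")

def is_failover_response(status: int, body: str) -> bool:
    if status in FAILOVER_STATUSES:
        return True
    lowered = body.lower()
    return any(word in lowered for word in FAILOVER_KEYWORDS)
```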
Anthropic-family pool

/v1/messages

Anthropic Messages · sk-ant via x-api-key · Bearer for OAuth

  • Clients: Claude Code with ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN.
  • Upstream accounts: Anthropic Console (sk-ant-… admin key), Anthropic Pro/Team (OAuth bearer).
  • Auth-header dispatch: sk-ant-… → x-api-key; OAuth bearer → Authorization.
  • Anthropic version: pinned to 2023-06-01 on every request.
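The auth-header dispatch above can be sketched as a single header builder. This is an illustration of the documented behavior, not the Swift implementation; the function name is invented here:

```python
# Sketch of the Anthropic-pool header dispatch: sk-ant-… keys go out
# as x-api-key, OAuth tokens as a Bearer header, and anthropic-version
# is pinned on every request. Illustrative names only.
ANTHROPIC_VERSION = "2023-06-01"

def anthropic_headers(credential: str) -> dict[str, str]:
    headers = {"anthropic-version": ANTHROPIC_VERSION}
    if credential.startswith("sk-ant-"):
        headers["x-api-key"] = credential                  # Console admin key
    else:
        headers["Authorization"] = f"Bearer {credential}"  # Pro/Team OAuth
    return headers
```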

Source · OpenBurnBarDaemon/.../OpenBurnBarHTTPGatewayServer.swift · OpenBurnBarAnthropicProviderExecutor.swift

The board

What it looks like in practice.

The diagram below mirrors what the macOS app shows on the Router tab. Active routes glow ember, next-fallback candidates wait in amber, cooled-down accounts recover quietly, and exhausted accounts are visibly out of play until their cooldown elapses.

OpenBurnBar Fire Hydrant routing

Six client targets — Cursor, Factory, Forge, OpenCode, Codex CLI, and Claude Code — point at one local gateway running on 127.0.0.1:8317. The gateway exposes two independent routing pools: OpenAI-compatible served at /v1/chat/completions and Anthropic-compatible served at /v1/messages. A request to one pool can only be served by accounts in that pool — format families never cross. Inside each pool the router ranks healthy accounts by preferred pin, then local credential, then quota health, then a stable sort key, then least-recently-used. Six OpenAI-family provider lanes are shown (Z.ai active, MiniMax next-fallback, Kimi and OpenAI healthy, Ollama Cloud cooling-down, Ollama Local exhausted by auth failure) plus two Anthropic-family lanes (Claude Pro active, Anthropic Console next-fallback).

Legend · OpenAI-family rail · Anthropic-family rail · active route · next-fallback · cooling-down (5 min) · exhausted (auth-failed)

The honest scope

Routed today vs. tracked today.

Some providers are routed through the Hydrant — your request lands at OpenBurnBar's local gateway and gets forwarded. Others are tracked for cost and quota only — we read the dashboard, we don't proxy the inference. Both are useful. Only the routed ones benefit from automatic failover today.

Routed today — 7 upstreams

Inference goes through the Hydrant.

  • OpenAI · API-key routing; OpenAI-family pool. Pay-as-you-go fallback for Codex.
  • Z.ai (GLM) · Coding Plan keys; OpenAI-family pool. Token + MCP quota live.
  • MiniMax · Coding-plan keys (sk-cp-…); OpenAI-family pool. Per-model remaining quota.
  • Kimi (Moonshot) · OpenAI-compatible endpoint; OpenAI-family pool. Weekly token window.
  • Ollama Cloud + Local · Both routed in the OpenAI-family pool. Local cloud-suffix aliases rewritten before upstream.
  • Anthropic Console + Pro/Team · Anthropic-family pool. sk-ant-… via x-api-key, OAuth bearer via Authorization.

Source · catalog.json · ProviderAccountTypes.swift:403-599

Tracked-only today

We read the dashboard, not the wire.

  • GitHub Copilot · Per-seat premium-interaction caps via GitHub's own API.
  • Cursor (plan usage) · USD plan-usage from cursor.com. Inference flows through the BYOK tunnel into the Hydrant.
  • Warp · Request credits + refresh windows from app.warp.dev GraphQL.
  • Aider, Forge cost, agent CLIs · Local analytics files only; no vendor quota to call against.

Source · docs/PROVIDERS.md · AgentLens/Services/ProviderQuota/

Codex CLI in ChatGPT-auth mode stays tracked-only. In OPENAI_BASE_URL / API-key mode it flows through the Hydrant like any OpenAI-family client.

The ranking policy

Six checks. One winner. One runner-up. Auditable.

When a request arrives, the router runs every candidate account through a filter, then ranks the survivors with a stable, deterministic comparator. The winner serves the request; the runner-up is held in reserve for instant failover.

  • 0 · Format pool. Drop every candidate whose formatFamily doesn't match the inbound endpoint: /v1/chat/completions keeps only openai_compat accounts; /v1/messages keeps only anthropic accounts. Empty pool: 503 with a structured "missing pool" error.
  • F · Filter. Drop deleted, disabled, exhausted, rate-limited, cooling-down (until elapsed), auth-failed, model-incompatible, routing-disabled, and missing-credential accounts.
  • 1 · Preferred pin. If you've pinned an account as preferred and it's healthy, it wins. Period. On equal: fall through to #2.
  • 2 · Local credential. An account with a credential available locally outranks one whose secret would have to be fetched. On equal: fall through to #3.
  • 3 · Quota health. healthy › pressure › unknown › cooling-down › rate-limited › exhausted › auth-failed › disabled › deleted. On equal: fall through to #4.
  • 4 · Sort key. The user-controlled sortKey field; drag-to-reorder in the Mac app. On equal: fall through to #5.
  • 5 · Least-recently-used. The account whose last request is oldest wins; round-robins evenly across equally-healthy peers. On equal: fall through to #6.
  • 6 · Stable IDs. Provider ID, then account ID. Guarantees the same ranking from the same inputs, every time.

Implementation · ProviderRoutingPolicy.decide(...) in ProviderAccountTypes.swift:403–599

Account health · the lifecycle

Every signal an account can send. Every state it lands in.

  • healthy

    Vendor quota API reports headroom. Recent requests have succeeded. Eligible to serve.

  • pressure

    Less than 20% of the active quota bucket remains. Still eligible, but ranked below truly-healthy peers.

  • cooling-down

    Recent transient failure, 401, or rate-limit. Parked for five minutes — then automatically retried.

  • rate-limited

    Provider returned 429. Cool-down ticks while traffic shifts to the next healthy peer.

  • exhausted

    Daily / weekly / monthly cap hit on a quota bucket. Held until the window rolls or you swap keys.

  • auth-failed

    Token expired or revoked. Cool-down + UI flag so you re-auth at a moment that's convenient.

  • disabled

    Routing turned off for this account by the user. Quota is still tracked; traffic is not sent.

  • unknown

    Quota probe stale or never run. Eligible, but ranked below known-healthy peers.

Use cases

What the Hydrant does for you, today.

anthropic-family failover

Claude Pro caps out. The second Pro plan answers the next message.

Claude Code hits the 5-hour limit on your primary Pro plan. The Hydrant marks that slot exhausted, parks it for the rolling window, and the next /v1/messages call lands on your second Pro plan, or your Anthropic Console key, all inside the Anthropic pool.

· OpenBurnBarHTTPGatewayServer.swift · /v1/messages handler · testGatewayFailsOverAnthropicAccountOnQuotaExhausted

openai-family failover

Codex CLI in API-key mode. ChatGPT key dry? OpenAI picks up, then Z.ai.

Set OPENAI_BASE_URL to the Hydrant. Pin your ChatGPT API key as preferred for cost. When it runs out, the router shifts to the OpenAI pay-as-you-go account, and if that's also pressured, to your Z.ai plan. Same OpenAI shape end to end.

· ROUTED_CLIENT_GATEWAY.md · preferred-provider + LRU within OpenAI-family pool

round-robin

Three OpenAI-compatible vendors. LRU spreads the load.

Z.ai, MiniMax, and Kimi all speak OpenAI-shape JSON. When all three are healthy on a fungible model, the router round-robins by least-recently-used — fair share across your accounts instead of hammering one until it goes red.
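With equally-healthy peers, the LRU rule alone produces the rotation. A toy demo (account names from the scenario above; function name invented here):

```python
# Toy demo of LRU round-robin across equally-healthy peers: each
# request picks the account whose last use is oldest, then stamps it.
def pick_lru(accounts: list[dict], clock: int) -> str:
    winner = min(accounts, key=lambda a: a["last_used"])
    winner["last_used"] = clock
    return winner["name"]

accounts = [
    {"name": "Z.ai", "last_used": 0},
    {"name": "MiniMax", "last_used": 0},
    {"name": "Kimi", "last_used": 0},
]
# Six requests spread evenly: each vendor serves twice.
served = [pick_lru(accounts, t) for t in range(1, 7)]
```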

· ProviderRoutingPolicy.decide · ranks by quota-health, then sortKey, then LRU

pool isolation

No cross-format hacks. Anthropic never gets an OpenAI prompt.

A Claude Code request hitting /v1/messages can only be served by accounts in the Anthropic pool. If you have no Anthropic accounts configured, the Hydrant returns a structured 503 telling you exactly which pool you're missing, not a corrupted response or a tool-call schema mismatch.

· ProviderAccountTypes.swift · formatFamily filter · testGatewayMessagesReturns503WhenOnlyOpenAICompatProvidersConfigured

Where keys live

Routing happens locally. So do your secrets.

  • Local-only secrets by default. Provider API keys live in the macOS Keychain with kSecAttrAccessibleWhenUnlockedThisDeviceOnly. The Hydrant fetches them on-demand for outbound requests; they never reach OpenBurnBar's cloud unless you opted into hosted quota refresh.
  • BYOK tunnel for Cursor. Cursor's BYOK endpoint speaks to a Cloudflare worker bound to your-handle.openburnbar.dev that forwards back into your Mac. Your keys stay on your Mac; the worker is a thin, authenticated pipe.
  • Bound to localhost. The gateway binds 127.0.0.1:8317 only. No LAN exposure, no public surface, no inbound from outside your Mac.
  • No plaintext in the relay. When Hermes Remote Relay is enabled, frames are end-to-end encrypted — the Cloud Run relay never sees the prompt or the response. The router stays on-device.
  • Every routing decision is logged. The chosen account, the skipped accounts, and the reason for each skip (exhausted, rate-limited, model-incompatible) land in a local ProviderRoutingDecisionEvent stream you can inspect.
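A decision record in that stream might look roughly like this; the shape below is an assumption for illustration, since the real ProviderRoutingDecisionEvent is defined in the Swift source:

```python
# Illustrative shape of one routing-decision record: the chosen
# account, the skipped accounts, and a reason per skip.
# Not the daemon's real schema.
import json

def decision_event(endpoint: str, chosen: str, skipped: dict[str, str]) -> str:
    return json.dumps({
        "endpoint": endpoint,
        "chosen": chosen,
        "skipped": [{"account": a, "reason": r} for a, r in skipped.items()],
    })

event = decision_event(
    "/v1/chat/completions",
    "zai-coding-plan",
    {"minimax": "rate-limited", "ollama-local": "auth-failed"},
)
```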

Source · docs/ROUTED_CLIENT_GATEWAY.md · docs/THREAT_MODEL.md · ProviderRoutingStateBuilder.swift

Quick answers

The questions we get on every demo.

Why not translate between OpenAI and Anthropic formats?
Because the differences aren't superficial. Tool-call schemas, prompt-cache markers, citation blocks, streaming-event shapes, and the way assistant turns are encoded all differ between OpenAI Chat Completions and Anthropic Messages. A translator would have to be perfect on every release of either API — and any drift corrupts your inference in ways that are subtle and hard to debug. Same-format pass-through is a cleaner contract: your Claude request lands on a Claude account, your OpenAI-shape request lands on an OpenAI-shape account. Failover happens inside that contract, never across it.
Is this a load balancer?
No — it's a single-tenant router on your Mac. It does not split a request across providers or aggregate responses. One request, one provider, picked by the policy above.
What happens if every provider is exhausted?
The Hydrant returns a structured 503 to the client naming the cooled-down accounts and the time remaining until the earliest one recovers. Your IDE sees a clear error, not a hang.
Can I disable routing for an account but still track it?
Yes. Toggle routingEnabled off on the account; cost and quota continue to populate the dashboard while the Hydrant skips it for outbound traffic.
How long is the cool-down?
Five minutes for transient failures, rate-limits, and auth failures. Exhausted-bucket accounts stay parked until the bucket rolls (daily, weekly, or monthly, per provider).
Is there a config file?
Routing rules are derived from the accounts you've added in the macOS app and a per-account sortKey + preferred-pin you can drag-to-reorder. No YAML, no hot-reload — change the order in the UI, the new policy takes effect on the next request.

Build with two accounts. Sleep with five.

Add a second key, pin a preferred provider, and let the Fire Hydrant keep the inference flowing.