Fire Hydrant · the router

One local gateway. Two format pools. No cross-format hacks.

The Fire Hydrant, OpenBurnBar's routed client gateway at 127.0.0.1:8317, receives OpenAI-shape requests on /v1/chat/completions from Cursor, Factory, Forge, OpenCode and Codex, and Anthropic-shape requests on /v1/messages from Claude Code. Each request is served only by accounts in the same format pool, ranked by live quota health and least-recently-used. When an account exhausts its quota, gets rate-limited, or fails auth, the router shifts traffic within the same pool before your IDE notices. Routing is deterministic, auditable, and entirely on-device.

Two pools, same-format failover

Two highways, never crossed.

Tool-call schemas, prompt-cache markers, and streaming-event types differ enough between OpenAI Chat Completions and Anthropic Messages that "translating" between them quietly breaks things — citations go missing, tool calls misfire, cache markers drop. The Hydrant refuses that game. A request on one endpoint can only be served by accounts in that pool. Same-format failover. Provable in tests.

OpenAI-family pool

/v1/chat/completions

OpenAI Chat Completions · SSE stream pass-through

  • Clients: Cursor (BYOK tunnel), Factory, Forge, OpenCode, Codex CLI in API-key mode.
  • Upstream accounts: OpenAI, Z.ai, MiniMax, Kimi, Ollama Cloud, Ollama Local.
  • Failover trigger: 429, 401, 402, 403, "quota" / "rate" / "exhaust" in response body.
  • Empty-pool behaviour: 503 with a structured message naming the missing pool.
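The failover trigger listed above reduces to a small predicate: specific status codes, or quota-flavored keywords in the response body. A hedged sketch, with the function name assumed here rather than taken from the codebase:

```python
# Sketch of the documented OpenAI-family failover trigger:
# 429/401/402/403, or "quota" / "rate" / "exhaust" in the body.
# Name and shape are illustrative, not the daemon's actual API.
FAILOVER_STATUSES = {429, 401, 402, 403}
FAILOVER_KEYWORDS = ("quota", "rate", "exhaust")

def is_failover_response(status: int, body: str) -> bool:
    if status in FAILOVER_STATUSES:
        return True
    lowered = body.lower()
    return any(word in lowered for word in FAILOVER_KEYWORDS)
```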
Anthropic-family pool

/v1/messages

Anthropic Messages · sk-ant via x-api-key · Bearer for OAuth

  • Clients: Claude Code with ANTHROPIC_BASE_URL + ANTHROPIC_AUTH_TOKEN.
  • Upstream accounts: Anthropic Console (sk-ant-… admin key), Anthropic Pro/Team (OAuth bearer).
  • Auth-header dispatch: sk-ant-… → x-api-key; OAuth bearer → Authorization.
  • Anthropic version: pinned to 2023-06-01 on every request.
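The auth-header dispatch above can be sketched as a single header builder. This is an illustration of the documented behavior, not the Swift implementation; the function name is invented here:

```python
# Sketch of the Anthropic-pool header dispatch: sk-ant-… keys go out
# as x-api-key, OAuth tokens as a Bearer header, and anthropic-version
# is pinned on every request. Illustrative names only.
ANTHROPIC_VERSION = "2023-06-01"

def anthropic_headers(credential: str) -> dict[str, str]:
    headers = {"anthropic-version": ANTHROPIC_VERSION}
    if credential.startswith("sk-ant-"):
        headers["x-api-key"] = credential                  # Console admin key
    else:
        headers["Authorization"] = f"Bearer {credential}"  # Pro/Team OAuth
    return headers
```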

Source · OpenBurnBarDaemon/.../OpenBurnBarHTTPGatewayServer.swift · OpenBurnBarAnthropicProviderExecutor.swift

The board

What it looks like in practice.

The diagram below mirrors what the macOS app shows on the Router tab. Active routes glow ember, next-fallback candidates wait in amber, cooled-down accounts recover quietly, and exhausted accounts are visibly out of play until their cooldown elapses.

OpenBurnBar Fire Hydrant routing

Six client targets — Cursor, Factory, Forge, OpenCode, Codex CLI, and Claude Code — point at one local gateway running on 127.0.0.1:8317. The gateway exposes two independent routing pools: OpenAI-compatible served at /v1/chat/completions and Anthropic-compatible served at /v1/messages. A request to one pool can only be served by accounts in that pool — format families never cross. Inside each pool the router ranks healthy accounts by preferred pin, then local credential, then quota health, then a stable sort key, then least-recently-used. Six OpenAI-family provider lanes are shown (Z.ai active, MiniMax next-fallback, Kimi and OpenAI healthy, Ollama Cloud cooling-down, Ollama Local exhausted by auth failure) plus two Anthropic-family lanes (Claude Pro active, Anthropic Console next-fallback).

Legend · OpenAI-family rail · Anthropic-family rail · active route · next-fallback · cooling-down (5 min) · exhausted (auth-failed)

The honest scope

Routed today vs. tracked today.

Some providers are routed through the Hydrant — your request lands at OpenBurnBar's local gateway and gets forwarded. Others are tracked for cost and quota only — we read the dashboard, we don't proxy the inference. Both are useful. Only the routed ones benefit from automatic failover today.

Routed today — 7 upstreams

Inference goes through the Hydrant.

  • OpenAI · API-key routing; OpenAI-family pool. Pay-as-you-go fallback for Codex.
  • Z.ai (GLM) · Coding Plan keys; OpenAI-family pool. Token + MCP quota live.
  • MiniMax · Coding-plan keys (sk-cp-…); OpenAI-family pool. Per-model remaining quota.
  • Kimi (Moonshot) · OpenAI-compatible endpoint; OpenAI-family pool. Weekly token window.
  • Ollama Cloud + Local · Both routed in the OpenAI-family pool. Local cloud-suffix aliases rewritten before upstream.
  • Anthropic Console + Pro/Team · Anthropic-family pool. sk-ant-… via x-api-key, OAuth bearer via Authorization.

Source · catalog.json · ProviderAccountTypes.swift:403-599

Tracked-only today

We read the dashboard, not the wire.

  • GitHub Copilot · Per-seat premium-interaction caps via GitHub's own API.
  • Cursor (plan usage) · USD plan-usage from cursor.com. Inference flows through the BYOK tunnel into the Hydrant.
  • Warp · Request credits + refresh windows from app.warp.dev GraphQL.
  • Aider, Forge cost, agent CLIs · Local analytics files only; no vendor quota to call against.

Source · docs/PROVIDERS.md · AgentLens/Services/ProviderQuota/

Codex CLI in ChatGPT-auth mode stays tracked-only. In OPENAI_BASE_URL / API-key mode it flows through the Hydrant like any OpenAI-family client.

The ranking policy

Six checks. One winner. One runner-up. Auditable.

When a request arrives, the router runs every candidate account through a filter, then ranks the survivors with a stable, deterministic comparator. The winner serves the request; the runner-up is held in reserve for instant failover.

  • 0 · Format pool. Drop every candidate whose formatFamily doesn't match the inbound endpoint: /v1/chat/completions keeps only openai_compat accounts; /v1/messages keeps only anthropic accounts. Empty pool: 503 with a structured "missing pool" error.
  • F · Filter. Drop deleted, disabled, exhausted, rate-limited, cooling-down (until elapsed), auth-failed, model-incompatible, routing-disabled, and missing-credential accounts.
  • 1 · Preferred pin. If you've pinned an account as preferred and it's healthy, it wins. Period. On equal: fall through to #2.
  • 2 · Local credential. An account with a credential available locally outranks one whose secret would have to be fetched. On equal: fall through to #3.
  • 3 · Quota health. healthy › pressure › unknown › cooling-down › rate-limited › exhausted › auth-failed › disabled › deleted. On equal: fall through to #4.
  • 4 · Sort key. The user-controlled sortKey field; drag-to-reorder in the Mac app. On equal: fall through to #5.
  • 5 · Least-recently-used. The account whose last request is oldest wins; round-robins evenly across equally-healthy peers. On equal: fall through to #6.
  • 6 · Stable IDs. Provider ID, then account ID. Guarantees the same ranking from the same inputs, every time.

Implementation · ProviderRoutingPolicy.decide(...) in ProviderAccountTypes.swift:403–599

Account health · the lifecycle

Every signal an account can send. Every state it lands in.

  • healthy

    Vendor quota API reports headroom. Recent requests have succeeded. Eligible to serve.

  • pressure

    Less than 20% of the active quota bucket remains. Still eligible, but ranked below truly-healthy peers.

  • cooling-down

    Recent transient failure, 401, or rate-limit. Parked for five minutes — then automatically retried.

  • rate-limited

    Provider returned 429. Cool-down ticks while traffic shifts to the next healthy peer.

  • exhausted

    Daily / weekly / monthly cap hit on a quota bucket. Held until the window rolls or you swap keys.

  • auth-failed

    Token expired or revoked. Cool-down + UI flag so you re-auth at a moment that's convenient.

  • disabled

    Routing turned off for this account by the user. Quota is still tracked; traffic is not sent.

  • unknown

    Quota probe stale or never run. Eligible, but ranked below known-healthy peers.

Use cases

What the Hydrant does for you, today.

anthropic-family failover

Claude Pro caps out. The second Pro plan answers the next message.

Claude Code hits the 5-hour limit on your primary Pro plan. The Hydrant marks that slot exhausted, parks it for the rolling window, and the next /v1/messages call lands on your second Pro plan, or your Anthropic Console key, all inside the Anthropic pool.

· OpenBurnBarHTTPGatewayServer.swift · /v1/messages handler · testGatewayFailsOverAnthropicAccountOnQuotaExhausted

openai-family failover

Codex CLI in API-key mode. ChatGPT key dry? OpenAI picks up, then Z.ai.

Set OPENAI_BASE_URL to the Hydrant. Pin your ChatGPT API key as preferred for cost. When it runs out, the router shifts to the OpenAI pay-as-you-go account, and if that's also pressured, to your Z.ai plan. Same OpenAI shape end to end.

· ROUTED_CLIENT_GATEWAY.md · preferred-provider + LRU within OpenAI-family pool

round-robin

Three OpenAI-compatible vendors. LRU spreads the load.

Z.ai, MiniMax, and Kimi all speak OpenAI-shape JSON. When all three are healthy on a fungible model, the router round-robins by least-recently-used — fair share across your accounts instead of hammering one until it goes red.
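With equally-healthy peers, the LRU rule alone produces the rotation. A toy demo (account names from the scenario above; function name invented here):

```python
# Toy demo of LRU round-robin across equally-healthy peers: each
# request picks the account whose last use is oldest, then stamps it.
def pick_lru(accounts: list[dict], clock: int) -> str:
    winner = min(accounts, key=lambda a: a["last_used"])
    winner["last_used"] = clock
    return winner["name"]

accounts = [
    {"name": "Z.ai", "last_used": 0},
    {"name": "MiniMax", "last_used": 0},
    {"name": "Kimi", "last_used": 0},
]
# Six requests spread evenly: each vendor serves twice.
served = [pick_lru(accounts, t) for t in range(1, 7)]
```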

· ProviderRoutingPolicy.decide · ranks by quota-health, then sortKey, then LRU

pool isolation

No cross-format hacks. Anthropic never gets an OpenAI prompt.

A Claude Code request hitting /v1/messages can only be served by accounts in the Anthropic pool. If you have no Anthropic accounts configured, the Hydrant returns a structured 503 telling you exactly which pool you're missing, not a corrupted response or a tool-call schema mismatch.

· ProviderAccountTypes.swift · formatFamily filter · testGatewayMessagesReturns503WhenOnlyOpenAICompatProvidersConfigured

Where keys live

Routing happens locally. So do your secrets.

  • Local-only secrets by default. Provider API keys live in the macOS Keychain with kSecAttrAccessibleWhenUnlockedThisDeviceOnly. The Hydrant fetches them on-demand for outbound requests; they never reach OpenBurnBar's cloud unless you opted into hosted quota refresh.
  • BYOK tunnel for Cursor. Cursor's BYOK endpoint speaks to a Cloudflare worker bound to your-handle.openburnbar.dev that forwards back into your Mac. Your keys stay on your Mac; the worker is a thin, authenticated pipe.
  • Bound to localhost. The gateway binds 127.0.0.1:8317 only. No LAN exposure, no public surface, no inbound from outside your Mac.
  • No plaintext in the relay. When Hermes Remote Relay is enabled, frames are end-to-end encrypted — the Cloud Run relay never sees the prompt or the response. The router stays on-device.
  • Every routing decision is logged. The chosen account, the skipped accounts, and the reason for each skip (exhausted, rate-limited, model-incompatible) land in a local ProviderRoutingDecisionEvent stream you can inspect.
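A decision record in that stream might look roughly like this; the shape below is an assumption for illustration, since the real ProviderRoutingDecisionEvent is defined in the Swift source:

```python
# Illustrative shape of one routing-decision record: the chosen
# account, the skipped accounts, and a reason per skip.
# Not the daemon's real schema.
import json

def decision_event(endpoint: str, chosen: str, skipped: dict[str, str]) -> str:
    return json.dumps({
        "endpoint": endpoint,
        "chosen": chosen,
        "skipped": [{"account": a, "reason": r} for a, r in skipped.items()],
    })

event = decision_event(
    "/v1/chat/completions",
    "zai-coding-plan",
    {"minimax": "rate-limited", "ollama-local": "auth-failed"},
)
```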

Source · docs/ROUTED_CLIENT_GATEWAY.md · docs/THREAT_MODEL.md · ProviderRoutingStateBuilder.swift

Quick answers

The questions we get on every demo.

Why not translate between OpenAI and Anthropic formats?
Because the differences aren't superficial. Tool-call schemas, prompt-cache markers, citation blocks, streaming-event shapes, and the way assistant turns are encoded all differ between OpenAI Chat Completions and Anthropic Messages. A translator would have to be perfect on every release of either API — and any drift corrupts your inference in ways that are subtle and hard to debug. Same-format pass-through is a cleaner contract: your Claude request lands on a Claude account, your OpenAI-shape request lands on an OpenAI-shape account. Failover happens inside that contract, never across it.
Is this a load balancer?
No — it's a single-tenant router on your Mac. It does not split a request across providers or aggregate responses. One request, one provider, picked by the policy above.
What happens if every provider is exhausted?
The Hydrant returns a structured 503 to the client naming the cooled-down accounts and the time remaining until the earliest one recovers. Your IDE sees a clear error, not a hang.
Can I disable routing for an account but still track it?
Yes. Toggle routingEnabled off on the account; cost and quota continue to populate the dashboard while the Hydrant skips it for outbound traffic.
How long is the cool-down?
Five minutes for transient failures, rate-limits, and auth failures. Exhausted-bucket accounts stay parked until the bucket rolls (daily, weekly, or monthly, per provider).
Is there a config file?
Routing rules are derived from the accounts you've added in the macOS app and a per-account sortKey + preferred-pin you can drag-to-reorder. No YAML, no hot-reload — change the order in the UI, the new policy takes effect on the next request.

Build with two accounts. Sleep with five.

Add a second key, pin a preferred provider, and let the Fire Hydrant keep the inference flowing.