What BurnBar recommended on 2026-05-12.
Frozen snapshot. Same data the router used to score requests that day, ordered by task and explained with source citations. Benchmark signals are advisory — runtime constraints (provider-family mode, pinning, auth, quota, safety, availability) always win.
- generated 12:00 UTC
- 5 task categories
- 5 sources
Rundown · 2026-05-12
Generated Tue, 12 May 2026 12:00:00 GMT · schema v1 · benchmarks advisory · runtime constraints win
- Artificial Analysis · unavailable
- Terminal-Bench (via Hugging Face) · stale · 14h old
- Design Arena · stale · 42h old
- Hugging Face · fresh
- Manual OpenBurnBar fixture · fresh
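For context on the status chips above, here is a minimal sketch of how a freshness label could be derived from a source's fetch age. The 12-hour cutoff is an assumption for illustration; the rundown does not state BurnBar's actual threshold.

```ts
// Hypothetical staleness classifier. FRESH_MAX_HOURS is an assumed cutoff,
// not a documented BurnBar constant.
type SourceStatus = "fresh" | "stale" | "unavailable";

interface SourceReport {
  name: string;
  status: SourceStatus;
  ageHours?: number; // omitted when the source never responded
}

const FRESH_MAX_HOURS = 12; // assumption: older than this reads as "stale"

function classifySource(name: string, fetchedAt: Date | null, now: Date): SourceReport {
  if (fetchedAt === null) return { name, status: "unavailable" };
  const ageHours = (now.getTime() - fetchedAt.getTime()) / 3_600_000;
  return { name, status: ageHours <= FRESH_MAX_HOURS ? "fresh" : "stale", ageHours };
}

// A source fetched 42h ago, like Design Arena above, would classify as
// { status: "stale", ageHours: 42 } under this assumed cutoff.
```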
Benchmark data is advisory only. Provider-family mode, user pinning, account auth, quota state, safety policy, and availability are evaluated at runtime and override any ranking shown here.
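As a rough illustration of that precedence, here is a sketch of how hard runtime constraints could gate the benchmark ranking before a pick is made. All names and shapes here are assumptions for illustration, not BurnBar's actual code.

```ts
// Hypothetical routing gate: benchmark order only decides among candidates
// that survive every runtime constraint.
interface Candidate { model: string; family: string; composite: number }

interface RuntimeState {
  pinnedModel?: string;          // user pinning overrides the ranking outright
  allowedFamily?: string;        // provider-family mode restricts the pool
  authedFamilies: Set<string>;   // account auth
  quotaOk: (model: string) => boolean;
  safetyOk: (model: string) => boolean;
  available: (model: string) => boolean;
}

function route(ranked: Candidate[], rt: RuntimeState): Candidate | undefined {
  // Hard filters first: a high benchmark composite never rescues a model
  // that fails family, auth, quota, safety, or availability checks.
  const eligible = ranked.filter(c =>
    (rt.allowedFamily === undefined || c.family === rt.allowedFamily) &&
    rt.authedFamilies.has(c.family) &&
    rt.quotaOk(c.model) && rt.safetyOk(c.model) && rt.available(c.model));

  // An explicit pin wins over the advisory ordering; otherwise take the
  // highest-ranked survivor ("ranked" is assumed sorted by composite, desc).
  return eligible.find(c => c.model === rt.pinnedModel) ?? eligible[0];
}
```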
- Coding
Refactors, multi-file edits, repo-grounded code generation.
Today's pick: Claude Opus 4.7 — led the benchmark composite at 87/100; evidence is the freshest available, even though older than ideal; a 1M-token context window clears typical large-context work; runner-up Claude Sonnet 4.6 is held in reserve for instant failover.
- #1 · Claude Opus 4.7 · Anthropic (wire family: anthropic)
  Composite 75/100 · evidence 100%
  bench 87 · fresh 55 · rel 88 · latency 46 · cost 18 · ctx 1M · avail common
Why this rank
- Composite benchmark score 87/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped (see the weighting sketch after this card).
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 1M tokens.
- Wire-format family: anthropic.
Source citations
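The "weighted down, not dropped" rule cited in the rationale above could look something like the following sketch; the weight curve and floor are assumptions, not published BurnBar parameters.

```ts
// Hypothetical freshness weighting for the benchmark composite. Stale
// sources lose weight but always keep some; nothing is dropped.
interface SourceScore {
  bench: number;     // 0-100 benchmark score from one source
  freshness: number; // 0-100 freshness rating for that source
}

const WEIGHT_FLOOR = 0.25; // assumed: even very stale evidence keeps 25% weight

function compositeBench(sources: SourceScore[]): number {
  let weighted = 0;
  let totalWeight = 0;
  for (const s of sources) {
    // Linear ramp from the floor at freshness 0 up to full weight at 100.
    const w = WEIGHT_FLOOR + (1 - WEIGHT_FLOOR) * (s.freshness / 100);
    weighted += w * s.bench;
    totalWeight += w;
  }
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}
```

With a single source, as on this card, the weighted mean collapses to that source's bench score (87 here); the low freshness still surfaces in the separate fresh chip and the evidence percentage.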
- #2 · Claude Sonnet 4.6 · Anthropic (wire family: anthropic)
  Composite 67/100 · evidence 86%
  bench 79 · fresh 55 · rel 86 · latency — · cost 42 · ctx 1M · avail common
Why this rank
- Composite benchmark score 79/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- Tier · mid. Ranked behind flagship siblings at an equivalent benchmark score; pin the tier explicitly to invert this (a sorting sketch follows this card's citations).
Source citations
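The tier note on this runner-up describes a tie-break, not a score change. A minimal sketch of that ordering, with tier names and the pin shape as assumptions:

```ts
// Hypothetical tier tie-break: at an equal benchmark composite, a flagship
// sorts ahead of a mid-tier sibling unless the user pins a tier.
type Tier = "flagship" | "mid" | "small";
const TIER_ORDER: Record<Tier, number> = { flagship: 0, mid: 1, small: 2 };

interface Ranked { model: string; tier: Tier; composite: number }

function sortWithTierBias(cands: Ranked[], pinnedTier?: Tier): Ranked[] {
  return [...cands].sort((a, b) => {
    // Benchmark composite decides first.
    if (a.composite !== b.composite) return b.composite - a.composite;
    // On a tie, an explicit pin inverts the default flagship-first bias.
    if (pinnedTier !== undefined) {
      return (a.tier === pinnedTier ? 0 : 1) - (b.tier === pinnedTier ? 0 : 1);
    }
    return TIER_ORDER[a.tier] - TIER_ORDER[b.tier];
  });
}
```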
- Terminal
Shell-loop agents that execute, observe, and self-correct.
Today's pick: Claude Opus 4.7 — led the benchmark composite at 78/100; evidence is fresh; a 1M-token context window clears typical large-context work; runner-up Claude Sonnet 4.6 is held in reserve for instant failover.
- #1 · Claude Opus 4.7 · Anthropic (wire family: anthropic)
  Composite 76/100 · evidence 86%
  bench 78 · fresh 100 · rel 88 · latency — · cost 18 · ctx 1M · avail common
Why this rank
- Composite benchmark score 78/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1M tokens.
- Wire-format family: anthropic.
Source citations
- #2 · Claude Sonnet 4.6 · Anthropic (wire family: anthropic)
  Composite 72/100 · evidence 86%
  bench 73 · fresh 100 · rel 86 · latency — · cost 42 · ctx 1M · avail common
Why this rank
- Composite benchmark score 73/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- Tier · mid. Ranked behind flagship siblings at an equivalent benchmark score; pin the tier explicitly to invert this.
Source citations
- Design
Website / UI / SVG / slide generation evaluated head-to-head.
Today's pick: Claude Opus 4.7 — led the benchmark composite at 83/100; evidence is fresh; a 1M-token context window clears typical large-context work.
- #1 · Claude Opus 4.7 · Anthropic (wire family: anthropic)
  Composite 76/100 · evidence 86%
  bench 83 · fresh 85 · rel 88 · latency — · cost 18 · ctx 1M · avail common
Why this rank
- Composite benchmark score 83/100 across 1 source.
- Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1M tokens.
- Wire-format family: anthropic.
Source citations
- Analysis
Long-context reasoning, summarization, structured extraction.
Today's pick: Claude Opus 4.7 — led the benchmark composite at 89/100; evidence is the freshest available, even though older than ideal; a 1M-token context window clears typical large-context work.
- #1 · Claude Opus 4.7 · Anthropic (wire family: anthropic)
  Composite 74/100 · evidence 86%
  bench 89 · fresh 55 · rel 88 · latency — · cost 18 · ctx 1M · avail common
Why this rank
- Composite benchmark score 89/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1M tokens.
- Wire-format family: anthropic.
Source citations
- General
Mixed-intent chat / one-shot questions / catch-all routing.
Today's pick: Claude Opus 4.7 — led the benchmark composite at 87/100; evidence is the freshest available, even though older than ideal; a 1M-token context window clears typical large-context work.
- #1 · Claude Opus 4.7 · Anthropic (wire family: anthropic)
  Composite 73/100 · evidence 86%
  bench 87 · fresh 55 · rel 88 · latency — · cost 18 · ctx 1M · avail common
Why this rank
- Composite benchmark score 87/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1M tokens.
- Wire-format family: anthropic.
Source citations
Re-run today's routing locally.
Add an account, pick a model, and let the Fire Hydrant do the routing. Provider-family mode by default; intelligent mode opt-in.
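A hedged sketch of what a local configuration could look like; the field names and key shapes below are invented for illustration, and only the two modes come from the text above.

```ts
// Hypothetical local router config. Field names are illustrative, not a
// documented Fire Hydrant API.
interface RouterConfig {
  accounts: { provider: string; apiKeyEnv: string }[];
  mode: "provider-family" | "intelligent"; // provider-family is the default
  pin?: { model?: string; tier?: string }; // optional explicit pinning
}

const config: RouterConfig = {
  accounts: [{ provider: "anthropic", apiKeyEnv: "ANTHROPIC_API_KEY" }],
  mode: "provider-family",           // switch to "intelligent" to opt in
  pin: { model: "claude-opus-4.7" }, // assumed slug; omit to let the router choose
};
```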