What BurnBar recommended on 2026-05-13.
Frozen snapshot. Same data the router used to score requests that day, ordered by task and explained with source citations. Benchmark signals are advisory — runtime constraints (provider-family mode, pinning, auth, quota, safety, availability) always win.
- generated 12:00 UTC
- 5 task categories
- 5 sources
Rundown · 2026-05-13
Generated Wed, 13 May 2026 12:00:00 GMT · schema v1 · benchmarks advisory · runtime constraints win
- Artificial Analysis unavailable
- Terminal-Bench (via Hugging Face) stale · 14h old
- Design Arena stale · 42h old
- Hugging Face fresh
- Manual OpenBurnBar fixture fresh
Benchmark data is advisory only. Provider-family mode, user pinning, account auth, quota state, safety policy, and availability are evaluated at runtime and override any ranking shown here.
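A minimal sketch of that ordering, in TypeScript, assuming hypothetical types and field names rather than BurnBar internals: every runtime gate is a hard filter applied before the advisory composite is consulted, and a user pin bypasses ranking entirely.

```typescript
// A minimal sketch, assuming hypothetical types: hard runtime gates remove
// candidates before the advisory composite is consulted, and an explicit pin
// bypasses ranking entirely. None of these names are BurnBar internals.

interface Candidate {
  model: string;
  family: string;    // wire-format family, e.g. "openai_compat", "anthropic"
  composite: number; // advisory benchmark composite, 0-100
}

interface RuntimeState {
  allowedFamily?: string;      // provider-family mode restriction
  pinnedModel?: string;        // explicit user pin
  authedModels: Set<string>;   // account auth
  quotaExhausted: Set<string>; // quota state
  safetyBlocked: Set<string>;  // safety policy
  available: Set<string>;      // provider availability
}

function route(candidates: Candidate[], rt: RuntimeState): Candidate | undefined {
  // A user pin short-circuits scoring altogether.
  if (rt.pinnedModel !== undefined) {
    return candidates.find((c) => c.model === rt.pinnedModel);
  }
  // Every gate is a hard filter; no benchmark score can buy a model back in.
  const eligible = candidates.filter(
    (c) =>
      (rt.allowedFamily === undefined || c.family === rt.allowedFamily) &&
      rt.authedModels.has(c.model) &&
      !rt.quotaExhausted.has(c.model) &&
      !rt.safetyBlocked.has(c.model) &&
      rt.available.has(c.model)
  );
  // Only among survivors does the advisory ranking decide.
  return eligible.sort((a, b) => b.composite - a.composite)[0];
}
```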
- Coding
Refactors, multi-file edits, repo-grounded code generation.
Today's pick: GPT-5.5 Codex — led the benchmark composite at 89/100; evidence is the freshest available, though older than ideal; context window of 400k clears typical large-context work; runner-up Claude Opus 4.7 is held in reserve for instant failover.
- #1
GPT-5.5 Codex · OpenAI · openai_compat · composite 77/100 · evidence 100%
- bench 89
- fresh 55
- rel 90
- latency 56
- cost 24
- ctx 400k
- avail common
Why this rank
- Composite benchmark score 89/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
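The "weighted down, not dropped" line in the breakdown above can be made concrete. The half-life and weights in this sketch are illustrative assumptions, not BurnBar's published coefficients; the mechanism is what matters: source age shrinks the benchmark term's weight instead of discarding the source.

```typescript
// Illustrative only: the half-life and weights below are assumptions, not
// BurnBar's published coefficients. The point is the mechanism: staleness
// shrinks the benchmark term's contribution instead of discarding the source.

interface Metrics {
  bench: number;   // benchmark composite, 0-100
  rel: number;     // reliability, 0-100
  latency: number; // latency score, 0-100
  cost: number;    // cost score, 0-100 (higher = cheaper)
}

// Map source age to a 0-100 freshness score with an assumed 24h half-life.
function freshness(ageHours: number, halfLifeHours = 24): number {
  return 100 * Math.pow(0.5, ageHours / halfLifeHours);
}

function composite(m: Metrics, fresh: number): number {
  // A stale source is weighted down, not dropped: even at fresh = 0 the
  // benchmark term still contributes at half strength under these weights.
  const benchWeighted = m.bench * (0.5 + 0.5 * (fresh / 100));
  return Math.round(
    0.45 * benchWeighted + 0.2 * m.rel + 0.15 * m.latency + 0.2 * m.cost
  );
}
```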
- #2
Claude Opus 4.7 · Anthropic · anthropic · composite 76/100 · evidence 100%
- bench 88
- fresh 55
- rel 88
- latency 46
- cost 18
- ctx 1M
- avail common
Why this rank
- Composite benchmark score 88/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- #3
GLM 5 · Z.ai · openai_compat · composite 75/100 · evidence 100%
- bench 82
- fresh 55
- rel 84
- latency 70
- cost 66
- ctx 256k
- avail common
Why this rank
- Composite benchmark score 82/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Latency profile is fast (high TPS, low TTFT).
- Context window: 256k tokens.
- Wire-format family: openai_compat.
Why other candidates didn't make the cut · 7 dropped
- GPT-5.5 · OpenAI
Composite score did not clear the leader's margin for this task.
Composite 75/100 vs. leader 77/100.
- MiniMax 2.7 · MiniMax
Composite score did not clear the leader's margin for this task.
Composite 73/100 vs. leader 77/100.
- Kimi 2.6 · Moonshot · Kimi
Composite score did not clear the leader's margin for this task.
Composite 72/100 vs. leader 77/100.
- Claude Sonnet 4.6 · Anthropic
Composite score did not clear the leader's margin for this task.
Composite 70/100 vs. leader 77/100.
- GPT-5.5 mini · OpenAI
Composite score did not clear the leader's margin for this task.
Composite 63/100 vs. leader 77/100.
- Claude Haiku 4.5 · Anthropic
Composite score did not clear the leader's margin for this task.
Composite 60/100 vs. leader 77/100.
- Gemini 3 Pro · Google
Not routable through a connected BurnBar provider account.
Composite 59/100 vs. leader 77/100.
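One detail in the list above is worth a sketch: GLM 5 stayed on the board at composite 75 while GPT-5.5 was dropped at the same 75, which suggests a fixed-size finalist board rather than a pure score threshold. The board size and reason strings below are assumptions.

```typescript
// Sketch of the cut, under two assumptions read off the table above: the
// board keeps a fixed number of finalists (three here), and unroutable
// models are dropped regardless of score. Reason strings are illustrative.

interface Scored {
  model: string;
  composite: number;
  routable: boolean; // false: no connected provider account can reach it
}

function cut(
  ranked: Scored[],
  keep = 3
): { kept: Scored[]; dropped: Array<[Scored, string]> } {
  const sorted = [...ranked].sort((a, b) => b.composite - a.composite);
  const leader = sorted[0];
  const kept: Scored[] = [];
  const dropped: Array<[Scored, string]> = [];
  for (const c of sorted) {
    if (!c.routable) {
      dropped.push([c, "Not routable through a connected provider account."]);
    } else if (kept.length < keep) {
      kept.push(c);
    } else {
      // This also explains ties at the board edge: a model can share the #3
      // composite and still be dropped once the board is full.
      dropped.push([c, `Composite ${c.composite}/100 vs. leader ${leader.composite}/100.`]);
    }
  }
  return { kept, dropped };
}
```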
- Terminal
Shell-loop agents that execute, observe, and self-correct.
Today's pick: GPT-5.5 Codex — led the benchmark composite at 86/100; evidence is fresh; context window of 400k clears typical large-context work; runner-up GPT-5.5 is held in reserve for instant failover.
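"Held in reserve for instant failover" reads as a single pre-computed fallback. A minimal sketch, assuming a hypothetical call function:

```typescript
// A minimal sketch of one-retry failover; `call` is a hypothetical stand-in
// for whatever actually issues the request.

async function routeWithFailover(
  pick: string,
  runnerUp: string,
  call: (model: string) => Promise<string>
): Promise<string> {
  try {
    return await call(pick);
  } catch {
    // No re-ranking on the hot path: the pre-computed runner-up takes over.
    return call(runnerUp);
  }
}
```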
- #1
GPT-5.5 Codex · OpenAI · openai_compat · composite 84/100 · evidence 100%
- bench 86
- fresh 100
- rel 90
- latency 56
- cost 24
- ctx 400k
- avail common
Why this rank
- Composite benchmark score 86/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
- #2
GPT-5.5 · OpenAI · openai_compat · composite 81/100 · evidence 100%
- bench 82
- fresh 100
- rel 90
- latency 58
- cost 22
- ctx 400k
- avail common
Why this rank
- Composite benchmark score 82/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
- #3
GLM 5 · Z.ai · openai_compat · composite 81/100 · evidence 100%
- bench 77
- fresh 100
- rel 84
- latency 70
- cost 66
- ctx 256k
- avail common
Why this rank
- Composite benchmark score 77/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Latency profile is fast (high TPS, low TTFT).
- Context window: 256k tokens.
- Wire-format family: openai_compat.
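The two latency labels that recur in these breakdowns, "fast (high TPS, low TTFT)" and "acceptable for non-interactive work", suggest a simple threshold classifier. A sketch with assumed cutoffs:

```typescript
// The thresholds are assumptions for illustration; only the two labels are
// taken from the report itself.

function latencyLabel(tokensPerSec: number, ttftMs: number): string {
  if (tokensPerSec >= 80 && ttftMs <= 500) {
    return "fast (high TPS, low TTFT)";
  }
  return "acceptable for non-interactive work";
}
```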
Why other candidates didn't make the cut · 6 dropped
- MiniMax 2.7 · MiniMax
Composite score did not clear the leader's margin for this task.
Composite 79/100 vs. leader 84/100.
- Claude Opus 4.7 · Anthropic
Per-token cost is materially higher than the leader at comparable score.
Composite 79/100 vs. leader 84/100.
- Kimi 2.6 · Moonshot · Kimi
Composite score did not clear the leader's margin for this task.
Composite 77/100 vs. leader 84/100.
- Claude Sonnet 4.6 · Anthropic
Composite score did not clear the leader's margin for this task.
Composite 74/100 vs. leader 84/100.
- GPT-5.5 mini · OpenAI
Composite score did not clear the leader's margin for this task.
Composite 67/100 vs. leader 84/100.
- Claude Haiku 4.5 · Anthropic
Composite score did not clear the leader's margin for this task.
Composite 64/100 vs. leader 84/100.
- Design
Website / UI / SVG / slide generation evaluated head-to-head.
Today's pick: Claude Opus 4.7 — led the benchmark composite at 84/100; evidence is fresh; context window of 1M clears typical large-context work; runner-up GPT-5.5 is held in reserve for instant failover.
- #1
Claude Opus 4.7 · Anthropic · anthropic · composite 78/100 · evidence 100%
- bench 84
- fresh 85
- rel 84
- latency 48
- cost 18
- ctx 1M
- avail common
Why this rank
- Composite benchmark score 84/100 across 1 source.
- Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- #2
GPT-5.5 · OpenAI · openai_compat · composite 77/100 · evidence 100%
- bench 80
- fresh 85
- rel 88
- latency 58
- cost 22
- ctx 400k
- avail common
Why this rank
- Composite benchmark score 80/100 across 1 source.
- Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
- #3
GLM 5 · Z.ai · openai_compat · composite 77/100 · evidence 100%
- bench 75
- fresh 85
- rel 84
- latency 70
- cost 66
- ctx 256k
- avail common
Why this rank
- Composite benchmark score 75/100 across 1 source.
- Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Latency profile is fast (high TPS, low TTFT).
- Context window: 256k tokens.
- Wire-format family: openai_compat.
Why other candidates didn't make the cut · 3 dropped
- Kimi 2.6 · Moonshot · Kimi
Composite score did not clear the leader's margin for this task.
Composite 75/100 vs. leader 78/100.
- Claude Sonnet 4.6 · Anthropic
Composite score did not clear the leader's margin for this task.
Composite 74/100 vs. leader 78/100.
- Gemini 3 Pro · Google
Not routable through a connected BurnBar provider account.
Composite 63/100 vs. leader 78/100.
- Analysis
Long-context reasoning, summarization, structured extraction.
Today's pick: Claude Opus 4.7 — led the benchmark composite at 90/100; evidence is the freshest available, though older than ideal; context window of 1M clears typical large-context work; runner-up GPT-5.5 is held in reserve for instant failover.
- #1
Claude Opus 4.7 · Anthropic · anthropic · composite 77/100 · evidence 100%
- bench 90
- fresh 55
- rel 86
- latency 48
- cost 18
- ctx 1M
- avail common
Why this rank
- Composite benchmark score 90/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- #2
GPT-5.5 · OpenAI · openai_compat · composite 76/100 · evidence 100%
- bench 88
- fresh 55
- rel 88
- latency 58
- cost 22
- ctx 400k
- avail common
Why this rank
- Composite benchmark score 88/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
- #3
Claude Sonnet 4.6 · Anthropic · anthropic · composite 72/100 · evidence 100%
- bench 83
- fresh 55
- rel 86
- latency 60
- cost 42
- ctx 1M
- avail common
Why this rank
- Composite benchmark score 83/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- Tier · mid. Ranked behind flagship siblings at equal benchmark scores; pin the tier explicitly to invert this (see the sketch after this card).
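The tier tie-break flagged in the card above, sketched with assumed tier names and an assumed pin parameter:

```typescript
// Tier names, ranks, and the pin parameter are assumptions; the rule shown
// is the one stated in the card: flagship wins ties unless a tier is pinned.

type Tier = "flagship" | "mid" | "small";
const tierRank: Record<Tier, number> = { flagship: 0, mid: 1, small: 2 };

interface Ranked {
  composite: number;
  tier: Tier;
}

function compare(a: Ranked, b: Ranked, pinnedTier?: Tier): number {
  if (a.composite !== b.composite) return b.composite - a.composite;
  // An explicit tier pin inverts the default preference at a tie.
  const aPinned = pinnedTier !== undefined && a.tier === pinnedTier ? 0 : 1;
  const bPinned = pinnedTier !== undefined && b.tier === pinnedTier ? 0 : 1;
  if (aPinned !== bPinned) return aPinned - bPinned;
  return tierRank[a.tier] - tierRank[b.tier]; // flagship first by default
}
```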
Why other candidates didn't make the cut · 1 dropped
- Gemini 3 Pro · Google
Not routable through a connected BurnBar provider account.
Composite 61/100 vs. leader 77/100.
- General
Mixed-intent chat / one-shot questions / catch-all routing.
Today's pick: Claude Opus 4.7 — led the benchmark composite at 88/100; evidence is the freshest available, though older than ideal; context window of 1M clears typical large-context work; runner-up GPT-5.5 is held in reserve for instant failover.
- #1
Claude Opus 4.7 · Anthropic · anthropic · composite 76/100 · evidence 100%
- bench 88
- fresh 55
- rel 88
- latency 48
- cost 18
- ctx 1M
- avail common
Why this rank
- Composite benchmark score 88/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- #2
GPT-5.5 · OpenAI · openai_compat · composite 76/100 · evidence 100%
- bench 87
- fresh 55
- rel 88
- latency 58
- cost 22
- ctx 400k
- avail common
Why this rank
- Composite benchmark score 87/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
- #3
MiniMax 2.7 · MiniMax · openai_compat · composite 74/100 · evidence 100%
- bench 81
- fresh 55
- rel 82
- latency 68
- cost 62
- ctx 320k
- avail common
Why this rank
- Composite benchmark score 81/100 across 1 source.
- Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Latency profile is fast (high TPS, low TTFT).
- Context window: 320k tokens.
- Wire-format family: openai_compat.
Why other candidates didn't make the cut · 2 dropped
- GPT-5.5 mini · OpenAI
Composite score did not clear the leader's margin for this task.
Composite 64/100 vs. leader 76/100.
- Gemini 3 Pro · Google
Not routable through a connected BurnBar provider account.
Composite 61/100 vs. leader 76/100.
Re-run today's routing locally.
Add an account, pick a model, and let the Fire Hydrant do the routing. Provider-family mode by default; intelligent mode opt-in.
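For readers wiring this up, a hypothetical configuration shape matching those defaults; field names are illustrative, not BurnBar's actual schema:

```typescript
// Hypothetical local configuration matching the defaults described above.
// Field names and the shape are illustrative, not BurnBar's actual schema.

const config = {
  accounts: [
    { provider: "openai", apiKeyEnv: "OPENAI_API_KEY" },
    { provider: "anthropic", apiKeyEnv: "ANTHROPIC_API_KEY" },
  ],
  // "provider-family" (default) keeps routing inside one wire-format family;
  // "intelligent" (opt-in) lets the router pick across families.
  mode: "provider-family" as "provider-family" | "intelligent",
  // An explicit pin bypasses ranking, as noted in the snapshot header.
  pin: undefined as string | undefined,
};
```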