What the board recommended on 2026-06-03.
Frozen snapshot. A board of language models ran research and analysis tasks over the same daily data, then BurnBar reduced the result to deterministic selections with source citations. Benchmark signals are advisory — runtime constraints (provider-family mode, Exact Model Failover's canonical-ID gate, pinning, auth, quota, safety, availability) always win.
- generated 06:57 UTC
- 3 task categories
- 3 sources
Rundown · 2026-06-03
Generated Wed, 03 Jun 2026 06:57:32 GMT · schema v1 · model board · runtime constraints win
- Artificial Analysis: fresh
- Terminal-Bench (via Hugging Face): fresh
- Design Arena: unavailable
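Because benchmark signals are advisory and runtime constraints always win, the board's ranking behaves like a preference list that hard gates can override. The sketch below is a minimal illustration under stated assumptions. It is not BurnBar's implementation. The `Candidate` fields, the `route` function, and the model IDs are all hypothetical, and a pin is modeled as replacing the ranking with no fallback to other models.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    model_id: str           # hypothetical canonical model ID
    has_auth: bool = True   # credentials are configured for the provider
    quota_ok: bool = True   # quota remains for this request
    safety_ok: bool = True  # request passes safety policy
    available: bool = True  # provider currently serves the model


def route(ranked: List[Candidate], pinned: Optional[str] = None) -> Optional[str]:
    """Pick the first board candidate that passes every runtime gate (sketch)."""
    if pinned is not None:
        # Assumption: a user pin replaces the board ranking; no fallback to other models.
        ranked = [c for c in ranked if c.model_id == pinned]
    for c in ranked:
        # Runtime constraints are hard gates: failing any one skips the model.
        if c.has_auth and c.quota_ok and c.safety_ok and c.available:
            return c.model_id
    return None  # nothing eligible; the caller decides how to report it
```

For example, if the top board pick is marked unavailable, `route` falls through to the next eligible candidate regardless of its benchmark score.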
- Coding
Refactors, multi-file edits, repo-grounded code generation.
Today's pick: GPT-5.5 xhigh — stable favorite rank #1 under 2026-05-13.stable-favorites; preferred reasoning effort xhigh; led the benchmark composite at 55/100; evidence is fresh; context window of 400k clears typical large-context work; runner-up Claude Opus 4.7 is held in reserve for instant failover.
- #1 GPT-5.5 xhigh · OpenAI · openai_compat
Selection 99/100 · evidence 59 · coverage 86%
bench 55 · fresh 100 · rel n/a · latency 44 · cost 22 · ctx 400k · avail unknown · board favorite #1
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12-point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 55/100 across 6 sources.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
- #2 Claude Opus 4.7 · Anthropic · anthropic
Selection 96/100 · evidence 58 · coverage 86%
bench 53 · fresh 100 · rel n/a · latency 44 · cost 18 · ctx 1M · avail unknown · board favorite #2
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8-point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 53/100 across 2 sources.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- #3 GLM 5.1 · Z.ai · openai_compat
Selection 93/100 · evidence 53 · coverage 86%
bench 40 · fresh 100 · rel n/a · latency 59 · cost 66 · ctx 256k · avail unknown · board favorite #3
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5-point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 40/100 across 2 sources.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 256k tokens.
- Wire-format family: openai_compat.
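The stable favorite policy gives favorites a fixed prior (12, 8, and 5 points for ranks #1 to #3). Below is a minimal sketch of how such a prior could reorder candidates. It assumes the prior is simply added to the evidence score. The board does not publish its formula, but this additive form reproduces the orderings shown on this page: in Coding, GLM 5.1 (evidence 53 + 5 = 58) places above GPT-5.3 Codex (evidence 57, no prior). The names `FAVORITE_PRIOR` and `order_with_prior` are hypothetical.

```python
from typing import Dict, List

# Points added per favorite rank, as stated in the board verdicts.
FAVORITE_PRIOR = {1: 12, 2: 8, 3: 5}


def order_with_prior(evidence: Dict[str, int], favorites: Dict[str, int]) -> List[str]:
    """Order models by evidence plus favorite prior (additive form assumed)."""
    def effective(model: str) -> int:
        # Non-favorites have no rank, so .get(None, 0) yields no prior.
        return evidence[model] + FAVORITE_PRIOR.get(favorites.get(model), 0)

    # sorted() is stable, so equal effective scores keep their input order.
    return sorted(evidence, key=lambda m: -effective(m))
```

Feeding in the Coding evidence scores (59, 58, 57, 53) with the three favorite ranks yields GPT-5.5 xhigh, Claude Opus 4.7, GLM 5.1, then GPT-5.3 Codex.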
Why other candidates didn't make the board pick (12 dropped)
- GPT-5.3 Codex · OpenAI
Selection policy did not clear the leader's margin for this task.
Evidence 57/100; selection 90/100 vs. leader 99/100.
- GLM 5 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 57/100; selection 87/100 vs. leader 99/100.
- DeepSeek V4 Pro · DeepSeek
Selection policy did not clear the leader's margin for this task.
Evidence 56/100; selection 84/100 vs. leader 99/100.
- Gemini 3.1 Pro (preview) · Google
Selection policy did not clear the leader's margin for this task.
Evidence 55/100; selection 81/100 vs. leader 99/100.
- Kimi K2.6 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 55/100; selection 78/100 vs. leader 99/100.
- MiniMax M2.7 · MiniMax
Selection policy did not clear the leader's margin for this task.
Evidence 55/100; selection 75/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
Selection policy did not clear the leader's margin for this task.
Evidence 53/100; selection 72/100 vs. leader 99/100.
- GLM 4.7 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 52/100; selection 69/100 vs. leader 99/100.
- DeepSeek V4 Flash · DeepSeek
Selection policy did not clear the leader's margin for this task.
Evidence 52/100; selection 66/100 vs. leader 99/100.
- Gemini 3 Flash · Google
Selection policy did not clear the leader's margin for this task.
Evidence 50/100; selection 63/100 vs. leader 99/100.
- Kimi K2.5 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 49/100; selection 60/100 vs. leader 99/100.
- GPT-5.4 mini · OpenAI
Selection policy did not clear the leader's margin for this task.
Evidence 47/100; selection 57/100 vs. leader 99/100.
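The public selection scores step down by exactly 3 points per rank (99, 96, 93 for the picks, then 90, 87, and so on down to 57), consistent with the note that scores are calibrated after policy ordering. The sketch below reproduces that pattern. The linear rule, its `top` and `step` defaults, and the function name are assumptions inferred from the numbers on this page, not a published formula.

```python
from typing import Dict, List


def calibrated_scores(ordered: List[str], top: int = 99, step: int = 3) -> Dict[str, int]:
    """Map a final ordering to public selection scores (linear rule assumed)."""
    # Rank 1 gets `top`; each later rank loses `step` points, never below 0.
    return {model: max(top - step * i, 0) for i, model in enumerate(ordered)}
```

With the 15 Coding candidates (3 picks plus 12 dropped), rank 15 lands at 99 - 3 × 14 = 57, matching GPT-5.4 mini. A calibrated score therefore encodes rank, not the size of the evidence gap.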
- Terminal
Shell-loop agents that execute, observe, and self-correct.
Today's pick: GLM 5.1. Stable favorite rank #3 under 2026-05-13.stable-favorites; benchmark composite 66/100, behind DeepSeek V4 Pro (68) and Kimi K2.6 (67), neither of which cleared the dethroning margins; evidence is fresh; cost is competitive; the 256k context window clears typical large-context work; runner-up DeepSeek V4 Pro is held in reserve for instant failover.
- #1 GLM 5.1 · Z.ai · openai_compat
Selection 99/100 · evidence 68 · coverage 86%
bench 66 · fresh 100 · rel 65 · latency n/a · cost 66 · ctx 256k · avail unknown · board favorite #3
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5-point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 66/100 across 2 sources.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 256k tokens.
- Wire-format family: openai_compat.
- #2 DeepSeek V4 Pro · DeepSeek · openai_compat
Selection 96/100 · evidence 70 · coverage 86%
bench 68 · fresh 100 · rel 65 · latency n/a · cost 72 · ctx 128k · avail unknown
Board verdict
- No favorite prior applied (evidence score only). To outrank a protected favorite, a challenger must clear both the evidence and the benchmark dethroning margins on consecutive rundowns.
- Composite benchmark score 68/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Cost-efficient at typical blended pricing.
- Context window: 128k tokens.
- Wire-format family: openai_compat.
- #3 Kimi K2.6 · Moonshot · Kimi · openai_compat
Selection 93/100 · evidence 68 · coverage 86%
bench 67 · fresh 100 · rel 65 · latency n/a · cost 60 · ctx 262k · avail unknown
Board verdict
- No favorite prior applied (evidence score only). To outrank a protected favorite, a challenger must clear both the evidence and the benchmark dethroning margins on consecutive rundowns.
- Composite benchmark score 67/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 262k tokens.
- Wire-format family: openai_compat.
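In Terminal, DeepSeek V4 Pro leads the protected favorite GLM 5.1 on both evidence (70 vs. 68) and benchmark (68 vs. 66), yet GLM 5.1 keeps the pick. The dethroning rule can be sketched as follows, under stated assumptions. The actual margins are not published, so `evidence_margin` and `bench_margin` are hypothetical parameters. "Clears" is read as "meets or exceeds", and "consecutive rundowns" is modeled as the last `streak` rundowns (default 2). `Snapshot` and `dethrones` are illustrative names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snapshot:
    evidence: int  # evidence score on one rundown
    bench: int     # benchmark composite on one rundown


def dethrones(challenger: List[Snapshot], favorite: List[Snapshot],
              evidence_margin: int, bench_margin: int, streak: int = 2) -> bool:
    """True if the challenger beat the favorite by both margins on the last `streak` rundowns."""
    if len(challenger) < streak or len(favorite) < streak:
        return False  # not enough history to judge consecutive rundowns
    recent = zip(challenger[-streak:], favorite[-streak:])
    # Both margins must hold on every one of the recent rundowns.
    return all(c.evidence - f.evidence >= evidence_margin and
               c.bench - f.bench >= bench_margin
               for c, f in recent)
```

Requiring both margins over several rundowns damps day-to-day benchmark noise, so a challenger's 2-point lead on a single rundown is not enough to change the pick.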
Why other candidates didn't make the board pick (5 dropped)
- MiniMax M2.7 · MiniMax
Selection policy did not clear the leader's margin for this task.
Evidence 63/100; selection 90/100 vs. leader 99/100.
- DeepSeek V4 Flash · DeepSeek
Selection policy did not clear the leader's margin for this task.
Evidence 62/100; selection 87/100 vs. leader 99/100.
- GLM 5 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 61/100; selection 84/100 vs. leader 99/100.
- Kimi K2.5 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 56/100; selection 81/100 vs. leader 99/100.
- GLM 4.7 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 51/100; selection 78/100 vs. leader 99/100.
- General
Mixed-intent chat / one-shot questions / catch-all routing.
Today's pick: GPT-5.5 xhigh. Stable favorite rank #1 under 2026-05-13.stable-favorites; preferred reasoning effort xhigh; benchmark composite 53/100, just behind Claude Opus 4.7 (55, favorite rank #2); evidence is fresh; the 400k context window clears typical large-context work; Claude Opus 4.7 is the runner-up, held in reserve for instant failover.
- #1 GPT-5.5 xhigh · OpenAI · openai_compat
Selection 99/100 · evidence 58 · coverage 86%
bench 53 · fresh 100 · rel n/a · latency 44 · cost 22 · ctx 400k · avail unknown · board favorite #1
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12-point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 53/100 across 6 sources.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
- #2 Claude Opus 4.7 · Anthropic · anthropic
Selection 96/100 · evidence 59 · coverage 86%
bench 55 · fresh 100 · rel n/a · latency 44 · cost 18 · ctx 1M · avail unknown · board favorite #2
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8-point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 55/100 across 2 sources.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 1M tokens.
- Wire-format family: anthropic.
- #3 GLM 5.1 · Z.ai · openai_compat
Selection 93/100 · evidence 58 · coverage 86%
bench 48 · fresh 100 · rel n/a · latency 59 · cost 66 · ctx 256k · avail unknown · board favorite #3
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5-point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 48/100 across 2 sources.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Latency is acceptable for non-interactive work.
- Context window: 256k tokens.
- Wire-format family: openai_compat.
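Each card notes that older sources are weighted down rather than dropped. One common way to do this is exponential decay by age. The sketch below computes a freshness-weighted mean of per-source scores, with a weight that halves every `half_life_days`. Both the decay form and the 7-day default are assumptions for illustration. The board does not state how its composite is computed, and `composite` is a hypothetical name.

```python
from typing import List, Tuple


def composite(scores: List[Tuple[float, float]], half_life_days: float = 7.0) -> float:
    """Freshness-weighted mean of (score, age_in_days) pairs (decay form assumed)."""
    if not scores:
        raise ValueError("no benchmark sources")
    # Weight halves every `half_life_days` but never reaches zero, so old data still counts.
    weights = [0.5 ** (age / half_life_days) for _, age in scores]
    total = sum(w * s for w, (s, _) in zip(weights, scores))
    return total / sum(weights)
```

For example, a fresh source at 60 and a week-old source at 40 combine to (60 + 0.5 × 40) / 1.5 ≈ 53.3. The stale source still pulls the composite down, just half as hard.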
Why other candidates didn't make the board pick (12 dropped)
- GLM 5 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 60/100; selection 90/100 vs. leader 99/100.
- MiniMax M2.7 · MiniMax
Selection policy did not clear the leader's margin for this task.
Evidence 59/100; selection 87/100 vs. leader 99/100.
- Kimi K2.6 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 58/100; selection 84/100 vs. leader 99/100.
- DeepSeek V4 Pro · DeepSeek
Selection policy did not clear the leader's margin for this task.
Evidence 58/100; selection 81/100 vs. leader 99/100.
- GPT-5.3 Codex · OpenAI
Selection policy did not clear the leader's margin for this task.
Evidence 58/100; selection 78/100 vs. leader 99/100.
- Gemini 3.1 Pro (preview) · Google
Selection policy did not clear the leader's margin for this task.
Evidence 56/100; selection 75/100 vs. leader 99/100.
- GLM 4.7 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 56/100; selection 72/100 vs. leader 99/100.
- DeepSeek V4 Flash · DeepSeek
Selection policy did not clear the leader's margin for this task.
Evidence 55/100; selection 69/100 vs. leader 99/100.
- Kimi K2.5 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 54/100; selection 66/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
Selection policy did not clear the leader's margin for this task.
Evidence 53/100; selection 63/100 vs. leader 99/100.
- Gemini 3 Flash · Google
Selection policy did not clear the leader's margin for this task.
Evidence 51/100; selection 60/100 vs. leader 99/100.
- GPT-5.4 mini · OpenAI
Selection policy did not clear the leader's margin for this task.
Evidence 46/100; selection 57/100 vs. leader 99/100.
Re-run today's routing locally.
Add an account, pick a model, and let the Fire Hydrant do the routing. Provider-family mode by default; Exact Model Failover when you want cross-provider recovery without changing the canonical model.
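Exact Model Failover recovers across providers without changing the canonical model, which the opening note describes as a canonical-ID gate. The sketch below shows only that gate. The `Route` type, the provider names, and the model IDs are hypothetical, and the default provider-family mode is not modeled here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    provider: str      # hypothetical provider name
    canonical_id: str  # canonical model ID this route serves


def exact_failover(primary: Route, routes: List[Route]) -> List[Route]:
    """Backup routes allowed under the canonical-ID gate (sketch)."""
    # Any other provider qualifies, but only if it serves the same canonical model.
    return [r for r in routes
            if r != primary and r.canonical_id == primary.canonical_id]
```

A route that serves a different model, even a strong runner-up, never passes this gate. Swapping models is a routing decision, not a failover.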