What the board recommended on 2026-05-13.
Frozen snapshot. A board of language models ran research and analysis tasks over the same daily data, then BurnBar reduced the result to deterministic selections with source citations. Benchmark signals are advisory — runtime constraints (provider-family mode, Exact Model Failover's canonical-ID gate, pinning, auth, quota, safety, availability) always win.
- generated 15:43 UTC
- 5 task categories
- 4 sources
Rundown · 2026-05-13
Generated Wed, 13 May 2026 15:43:14 GMT · schema v1 · model board · runtime constraints win
- Artificial Analysis: error
- Terminal-Bench (via Hugging Face): fresh
- Design Arena: unavailable
- Manual OpenBurnBar fixture: fresh
A daily board of language models runs research and analysis tasks across the benchmark feed, then BurnBar reduces their findings to this deterministic recommendation. Benchmark data is advisory only; user pins, auth, quota, safety, availability, and exact-model failover rules still win at runtime.
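The precedence described above (advisory board order, overridden by runtime constraints) can be sketched as a small resolver. This is a minimal illustration, not BurnBar's implementation: the `Candidate` fields, the `resolve` function, and the rule that a pin is returned unconditionally are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """Hypothetical runtime view of one board candidate."""
    model: str
    authed: bool = True
    within_quota: bool = True
    available: bool = True


def resolve(board_order: Sequence[Candidate], pinned: Optional[str] = None) -> Optional[str]:
    """Pick a model: a user pin wins outright, otherwise the first
    board candidate that passes every runtime check."""
    if pinned is not None:
        # Assumption: a pin overrides the board regardless of benchmark rank.
        return pinned
    for candidate in board_order:
        if candidate.authed and candidate.within_quota and candidate.available:
            return candidate.model
    return None  # nothing on the board is usable right now


board = [
    Candidate("GPT-5.5 xhigh", within_quota=False),  # top pick, but quota exhausted
    Candidate("Claude Opus 4.7"),
    Candidate("GLM 5.1"),
]
print(resolve(board))                    # falls through to the runner-up
print(resolve(board, pinned="GLM 5.1"))  # the pin wins over the board order
```

The point of the sketch is only the ordering of checks: the benchmark ranking supplies the iteration order, and any runtime failure skips a candidate without re-scoring anything.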
- Coding
Refactors, multi-file edits, repo-grounded code generation.
Today's pick: GPT-5.5 xhigh, stable favorite rank #1 under 2026-05-13.stable-favorites; preferred reasoning effort xhigh; benchmark composite 86/100 (runner-up Claude Opus 4.7 scored 88/100); evidence is fresh; the 400k context window clears typical large-context work; Claude Opus 4.7 is held in reserve for instant failover.
- #1
GPT-5.5 xhigh · OpenAI · openai_compat
selection 99/100 · evidence 75/100 · coverage 71%
- bench 86
- fresh 100
- rel n/a
- latency n/a
- cost 22
- ctx 400k
- avail unknown
- board favorite #1
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 86/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
Source citations
- #2
Claude Opus 4.7 · Anthropic · anthropic
selection 96/100 · evidence 76/100 · coverage 71%
- bench 88
- fresh 100
- rel n/a
- latency n/a
- cost 18
- ctx 1M
- avail unknown
- board favorite #2
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 88/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1000k tokens.
- Wire-format family: anthropic.
Source citations
- #3
GLM 5.1 · Z.ai · openai_compat
selection 93/100 · evidence 78/100 · coverage 71%
- bench 83
- fresh 100
- rel n/a
- latency n/a
- cost 66
- ctx 256k
- avail unknown
- board favorite #3
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 83/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 256k tokens.
- Wire-format family: openai_compat.
Source citations
Why other candidates didn't make the board pick (5 dropped)
- GLM 5 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 76/100; selection 90/100 vs. leader 99/100.
- Gemini 3.1 Pro (preview) · Google
Selection policy did not clear the leader's margin for this task.
Evidence 75/100; selection 87/100 vs. leader 99/100.
- MiniMax M2.7 · MiniMax
Selection policy did not clear the leader's margin for this task.
Evidence 74/100; selection 84/100 vs. leader 99/100.
- Kimi K2.6 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 72/100; selection 81/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
Selection policy did not clear the leader's margin for this task.
Evidence 70/100; selection 78/100 vs. leader 99/100.
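The stable favorite policy quoted in the board verdicts above reads like a hysteresis rule: an incumbent keeps its prior until a challenger beats it by two margins on consecutive rundowns. A minimal sketch under stated assumptions follows. The prior values 12, 8, and 5 come from the verdicts; the pair of gap metrics, the threshold values, and the streak length of two rundowns are hypothetical, because the page does not state them.

```python
from typing import Sequence, Tuple

# Prior points per favorite rank, as quoted in the board verdicts.
PRIOR_POINTS = {1: 12, 2: 8, 3: 5}

Gaps = Tuple[float, float]  # (gap on metric A, gap on metric B), challenger minus incumbent


def clears_both(gaps: Gaps, margins: Gaps) -> bool:
    """True when the challenger meets or exceeds both dethroning margins."""
    return gaps[0] >= margins[0] and gaps[1] >= margins[1]


def should_dethrone(gap_history: Sequence[Gaps], margins: Gaps, streak: int = 2) -> bool:
    """gap_history is ordered oldest first, one entry per rundown.
    The incumbent falls only if the most recent `streak` rundowns all clear both margins."""
    if streak < 1 or len(gap_history) < streak:
        return False
    return all(clears_both(g, margins) for g in gap_history[-streak:])


margins = (3.0, 2.0)  # hypothetical thresholds
print(should_dethrone([(4.0, 2.5)], margins))              # one good rundown is not enough
print(should_dethrone([(4.0, 2.5), (3.5, 1.0)], margins))  # second rundown misses margin B
print(should_dethrone([(4.0, 2.5), (3.0, 2.0)], margins))  # two in a row clear both
```

Requiring a streak rather than a single win is what keeps the daily pick from flapping when two candidates trade small leads.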
- Terminal
Shell-loop agents that execute, observe, and self-correct.
Today's pick: GPT-5.5 xhigh — stable favorite rank #1 under 2026-05-13.stable-favorites; preferred reasoning effort xhigh; led the benchmark composite at 82/100; evidence is fresh; context window of 400k clears typical large-context work; runner-up Claude Opus 4.7 is held in reserve for instant failover.
- #1
GPT-5.5 xhigh · OpenAI · openai_compat
selection 99/100 · evidence 75/100 · coverage 71%
- bench 82
- fresh 100
- rel n/a
- latency n/a
- cost 22
- ctx 400k
- avail unknown
- board favorite #1
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 82/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
Source citations
- #2
Claude Opus 4.7 · Anthropic · anthropic
selection 96/100 · evidence 73/100 · coverage 71%
- bench 79
- fresh 100
- rel n/a
- latency n/a
- cost 18
- ctx 1M
- avail unknown
- board favorite #2
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 79/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1000k tokens.
- Wire-format family: anthropic.
Source citations
- #3
GLM 5.1 · Z.ai · openai_compat
selection 93/100 · evidence 71/100 · coverage 86%
- bench 70
- fresh 100
- rel 65
- latency n/a
- cost 66
- ctx 256k
- avail unknown
- board favorite #3
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 70/100 across 3 sources.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 256k tokens.
- Wire-format family: openai_compat.
Why other candidates didn't make the board pick (8 dropped)
- Kimi K2.6 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 70/100; selection 90/100 vs. leader 99/100.
- DeepSeek V4 Pro · DeepSeek
Selection policy did not clear the leader's margin for this task.
Evidence 70/100; selection 87/100 vs. leader 99/100.
- MiniMax M2.7 · MiniMax
Selection policy did not clear the leader's margin for this task.
Evidence 69/100; selection 84/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
Selection policy did not clear the leader's margin for this task.
Evidence 68/100; selection 81/100 vs. leader 99/100.
- GLM 5 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 68/100; selection 78/100 vs. leader 99/100.
- DeepSeek V4 Flash · DeepSeek
Selection policy did not clear the leader's margin for this task.
Evidence 62/100; selection 75/100 vs. leader 99/100.
- Kimi K2.5 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 56/100; selection 72/100 vs. leader 99/100.
- GLM 4.7 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 51/100; selection 69/100 vs. leader 99/100.
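The verdicts above say older sources are "weighted down, not dropped", and one Terminal composite spans 3 sources. One way to picture that is a freshness-weighted average with a small weight floor, so a stale source still counts a little. This is an illustrative sketch only: the formula, the `MIN_WEIGHT` floor, and the function name are assumptions, not the scoring BurnBar actually uses.

```python
from typing import Sequence, Tuple

MIN_WEIGHT = 0.05  # hypothetical floor so a stale source is never fully dropped


def composite(scores: Sequence[Tuple[float, float]]) -> float:
    """Freshness-weighted mean of benchmark scores.

    Each entry is (bench_score, freshness), both on a 0..100 scale.
    Freshness is mapped to a weight in [MIN_WEIGHT, 1.0].
    """
    if not scores:
        raise ValueError("at least one source is required")
    weights = [max(freshness / 100.0, MIN_WEIGHT) for _, freshness in scores]
    weighted_sum = sum(score * w for (score, _), w in zip(scores, weights))
    return weighted_sum / sum(weights)


print(composite([(80, 100)]))            # a single fresh source passes through unchanged
print(composite([(80, 100), (60, 50)]))  # the half-fresh source pulls the mean down, at half weight
print(composite([(80, 100), (0, 0)]))    # a fully stale source still has a small effect
```

With a single fully fresh source, as in most cards on this page, the composite is just that source's score.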
- Design
Website / UI / SVG / slide generation evaluated head-to-head.
Today's pick: GPT-5.5 xhigh, stable favorite rank #1 under 2026-05-13.stable-favorites; preferred reasoning effort xhigh; benchmark composite 80/100 (runner-up Claude Opus 4.7 scored 84/100); evidence is fresh; the 400k context window clears typical large-context work; Claude Opus 4.7 is held in reserve for instant failover.
- #1
GPT-5.5 xhigh · OpenAI · openai_compat
selection 99/100 · evidence 73/100 · coverage 71%
- bench 80
- fresh 100
- rel n/a
- latency n/a
- cost 22
- ctx 400k
- avail unknown
- board favorite #1
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 80/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
Source citations
- #2
Claude Opus 4.7 · Anthropic · anthropic
selection 96/100 · evidence 75/100 · coverage 71%
- bench 84
- fresh 100
- rel n/a
- latency n/a
- cost 18
- ctx 1M
- avail unknown
- board favorite #2
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 84/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1000k tokens.
- Wire-format family: anthropic.
Source citations
- #3
GLM 5.1 · Z.ai · openai_compat
selection 93/100 · evidence 73/100 · coverage 71%
- bench 76
- fresh 100
- rel n/a
- latency n/a
- cost 66
- ctx 256k
- avail unknown
- board favorite #3
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 76/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 256k tokens.
- Wire-format family: openai_compat.
Source citations
Why other candidates didn't make the board pick (4 dropped)
- Gemini 3.1 Pro (preview) · Google
Selection policy did not clear the leader's margin for this task.
Evidence 75/100; selection 90/100 vs. leader 99/100.
- GLM 5 · Z.ai
Selection policy did not clear the leader's margin for this task.
Evidence 73/100; selection 87/100 vs. leader 99/100.
- Kimi K2.6 · Moonshot · Kimi
Selection policy did not clear the leader's margin for this task.
Evidence 71/100; selection 84/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
Selection policy did not clear the leader's margin for this task.
Evidence 70/100; selection 81/100 vs. leader 99/100.
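The verdicts state that the final selection score is "calibrated after policy ordering so the public number matches the chosen rank". In this snapshot the published selection scores step down by three points per rank (99, 96, 93, 90, 87, ...), in every category. The sketch below reproduces that observed pattern; the function name, the `top` and `step` defaults, and the clamp at zero are inferred from this page's numbers and are not a documented formula.

```python
def calibrated_selection(rank: int, top: int = 99, step: int = 3) -> int:
    """Map a 1-based policy rank to the public selection score.

    Inferred from this snapshot: rank 1 shows 99 and each later rank shows 3 fewer.
    """
    if rank < 1:
        raise ValueError("rank is 1-based")
    return max(top - step * (rank - 1), 0)


# Design category: three board picks followed by four dropped candidates.
print([calibrated_selection(r) for r in range(1, 8)])
```

Because the score is a function of rank alone, it says nothing about how close two candidates were; the evidence figure next to it is the number that varies with the underlying data.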
- Analysis
Long-context reasoning, summarization, structured extraction.
Today's pick: GPT-5.5 xhigh, stable favorite rank #1 under 2026-05-13.stable-favorites; preferred reasoning effort xhigh; benchmark composite 88/100 (runner-up Claude Opus 4.7 scored 90/100); evidence is fresh; the 400k context window clears typical large-context work; Claude Opus 4.7 is held in reserve for instant failover.
- #1
GPT-5.5 xhigh · OpenAI · openai_compat
selection 99/100 · evidence 77/100 · coverage 71%
- bench 88
- fresh 100
- rel n/a
- latency n/a
- cost 22
- ctx 400k
- avail unknown
- board favorite #1
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 88/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
Source citations
- #2
Claude Opus 4.7 · Anthropic · anthropic
selection 96/100 · evidence 78/100 · coverage 71%
- bench 90
- fresh 100
- rel n/a
- latency n/a
- cost 18
- ctx 1M
- avail unknown
- board favorite #2
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 90/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1000k tokens.
- Wire-format family: anthropic.
Source citations
- #3
GLM 5.1 · Z.ai · openai_compat
selection 93/100 · evidence 79/100 · coverage 71%
- bench 86
- fresh 100
- rel n/a
- latency n/a
- cost 66
- ctx 256k
- avail unknown
- board favorite #3
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 86/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 256k tokens.
- Wire-format family: openai_compat.
Source citations
Why other candidates didn't make the board pick (2 dropped)
- Gemini 3.1 Pro (preview) · Google
Selection policy did not clear the leader's margin for this task.
Evidence 78/100; selection 90/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
Selection policy did not clear the leader's margin for this task.
Evidence 72/100; selection 87/100 vs. leader 99/100.
- General
Mixed-intent chat / one-shot questions / catch-all routing.
Today's pick: GPT-5.5 xhigh, stable favorite rank #1 under 2026-05-13.stable-favorites; preferred reasoning effort xhigh; benchmark composite 87/100 (runner-up Claude Opus 4.7 scored 88/100); evidence is fresh; the 400k context window clears typical large-context work; Claude Opus 4.7 is held in reserve for instant failover.
- #1
GPT-5.5 xhigh · OpenAI · openai_compat
selection 99/100 · evidence 76/100 · coverage 71%
- bench 87
- fresh 100
- rel n/a
- latency n/a
- cost 22
- ctx 400k
- avail unknown
- board favorite #1
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 87/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 400k tokens.
- Wire-format family: openai_compat.
Source citations
- #2
Claude Opus 4.7 · Anthropic · anthropic
selection 96/100 · evidence 76/100 · coverage 71%
- bench 88
- fresh 100
- rel n/a
- latency n/a
- cost 18
- ctx 1M
- avail unknown
- board favorite #2
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 88/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Premium-tier per-token cost.
- Context window: 1000k tokens.
- Wire-format family: anthropic.
Source citations
- #3
GLM 5.1 · Z.ai · openai_compat
selection 93/100 · evidence 78/100 · coverage 71%
- bench 84
- fresh 100
- rel n/a
- latency n/a
- cost 66
- ctx 256k
- avail unknown
- board favorite #3
Board verdict
- Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
- Composite benchmark score 84/100 across 1 source.
- Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
- Mid-tier per-token cost.
- Context window: 256k tokens.
- Wire-format family: openai_compat.
Source citations
Why other candidates didn't make the board pick (2 dropped)
- Gemini 3.1 Pro (preview) · Google
Selection policy did not clear the leader's margin for this task.
Evidence 77/100; selection 90/100 vs. leader 99/100.
- MiniMax M2.7 · MiniMax
Selection policy did not clear the leader's margin for this task.
Evidence 75/100; selection 87/100 vs. leader 99/100.
Re-run today's routing locally.
Add an account, pick a model, and let the Fire Hydrant do the routing. Provider-family mode by default; Exact Model Failover when you want cross-provider recovery without changing the canonical model.
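Exact Model Failover is described here as cross-provider recovery that never changes the canonical model. A minimal sketch of that canonical-ID gate follows, assuming a flat list of routes. The `Route` type, its fields, the provider names, and the `exact_failover` function are hypothetical and only illustrate the gate; they are not BurnBar's API.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Route:
    """Hypothetical route: one provider serving one canonical model."""
    provider: str
    canonical_id: str
    healthy: bool = True


def exact_failover(failed: Route, routes: Sequence[Route]) -> Optional[Route]:
    """Return the first healthy route on a different provider that serves
    the same canonical model, or None when the gate admits nothing."""
    for route in routes:
        same_model = route.canonical_id == failed.canonical_id
        other_provider = route.provider != failed.provider
        if same_model and other_provider and route.healthy:
            return route
    return None


primary = Route("provider-a", "example-model")
routes = [
    primary,
    Route("provider-b", "other-model"),                   # rejected: different canonical ID
    Route("provider-c", "example-model", healthy=False),  # rejected: unhealthy
    Route("provider-d", "example-model"),                 # accepted
]
recovered = exact_failover(primary, routes)
print(recovered.provider if recovered else "no exact match")
```

Returning `None` instead of falling back to a different model is the design choice that matters: under this gate a request either runs on the model the user asked for or does not run.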