Model board · daily advisory rundown

What the board recommended on 2026-05-13.

Frozen snapshot. A board of language models ran research and analysis tasks over the same daily data, then BurnBar reduced the result to deterministic selections with source citations. Benchmark signals are advisory — runtime constraints (provider-family mode, Exact Model Failover's canonical-ID gate, pinning, auth, quota, safety, availability) always win.

generated 15:43 UTC
5 task categories
4 sources

Daily Model Board

Rundown · 2026-05-13

Generated Wed, 13 May 2026 15:43:14 GMT · schema v1 · model board · runtime constraints win

Artificial Analysis: error
Terminal-Bench (via Hugging Face): fresh
Design Arena: unavailable
Manual OpenBurnBar fixture: fresh

A daily board of language models runs research and analysis tasks across the benchmark feed, then BurnBar reduces their findings to this deterministic recommendation. Benchmark data is advisory only; user pins, auth, quota, safety, availability, and exact-model failover rules still win at runtime.
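The precedence described above (runtime constraints override the advisory ranking) can be sketched as follows. This is a minimal illustration, not BurnBar's implementation: `RuntimeState`, `resolve`, and the single `blocked` set standing in for auth, quota, safety, and availability checks are all hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class RuntimeState:
    """Hypothetical runtime view; every field name here is an assumption."""
    pinned_model: Optional[str] = None
    # Models currently failing auth, quota, safety, or availability checks.
    blocked: set = field(default_factory=set)

def resolve(advisory_ranking, state):
    """Return the model to route to, letting runtime constraints win."""
    # A user pin wins outright when the pinned model is usable.
    if state.pinned_model and state.pinned_model not in state.blocked:
        return state.pinned_model
    # Otherwise take the first advisory pick that passes the runtime gates.
    for model in advisory_ranking:
        if model not in state.blocked:
            return model
    return None

ranking = ["GPT-5.5 xhigh", "Claude Opus 4.7", "GLM 5.1"]
```

With no constraints the board's first pick is used; blocking it falls through to the runner-up, and a usable pin bypasses the ranking entirely.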

Coding
Refactors, multi-file edits, repo-grounded code generation.

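Today's pick: GPT-5.5 xhigh. It holds stable favorite rank #1 under policy 2026-05-13.stable-favorites, with preferred reasoning effort xhigh. Its benchmark composite is 86/100, which is rank 2 in the cited fixture behind Claude Opus 4.7 at 88/100, so the pick reflects the stable-favorite policy ordering rather than a benchmark lead. Evidence is fresh, and the 400k context window clears typical large-context work. Runner-up Claude Opus 4.7 is held in reserve for instant failover.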
1. GPT-5.5 xhigh · OpenAI · openai_compat
  selection 99/100 · evidence 75/100 · coverage 71%
  - bench 86
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 22
  - ctx 400k
  - avail unknown
  - board favorite #1
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 86/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 400k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 2 score 86 1m old manual
2. Claude Opus 4.7 · Anthropic · anthropic
  selection 96/100 · evidence 76/100 · coverage 71%
  - bench 88
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 18
  - ctx 1M
  - avail unknown
  - board favorite #2
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 88/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 1000k tokens.
  - Wire-format family: anthropic.
  Source citations
  - Manual OpenBurnBar fixture
    rank 1 score 88 1m old manual
3. GLM 5.1 · Z.ai · openai_compat
  selection 93/100 · evidence 78/100 · coverage 71%
  - bench 83
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 66
  - ctx 256k
  - avail unknown
  - board favorite #3
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 83/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Mid-tier per-token cost.
  - Context window: 256k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 3 score 83 1m old manual
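The stable favorite policy quoted in the verdicts above can be sketched roughly as follows. The priors (12, 8, 5) come from the verdict text. The rundown does not say what the two dethroning margins measure or how large they are, so `margin_a`, `margin_b`, their default values, and the `streak` of two consecutive rundowns are placeholders chosen for illustration.

```python
# Priors quoted in the board verdicts: favorite rank -> points.
FAVORITE_PRIOR = {1: 12, 2: 8, 3: 5}

def prior_for(favorite_rank):
    """Points added for a stable favorite; non-favorites get 0 (assumption)."""
    return FAVORITE_PRIOR.get(favorite_rank, 0)

def is_dethroned(gaps, margin_a=5.0, margin_b=3.0, streak=2):
    """Decide whether a challenger displaces the favorite.

    gaps: one (gap_a, gap_b) tuple per rundown, oldest first, where each gap
    is the challenger's lead over the favorite on one of the two margins.
    The margin values and the streak length are hypothetical.
    """
    if len(gaps) < streak:
        return False
    # Both margins must be cleared on each of the most recent rundowns.
    return all(a >= margin_a and b >= margin_b for a, b in gaps[-streak:])
```

Under these assumptions a challenger that clears both margins once, then slips on the next rundown, does not dethrone the favorite.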
Why other candidates didn't make the board pick (5 dropped)
- GLM 5 · Z.ai
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 76/100; selection 90/100 vs. leader 99/100.
- Gemini 3.1 Pro (preview) · Google
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 75/100; selection 87/100 vs. leader 99/100.
- MiniMax M2.7 · MiniMax
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 74/100; selection 84/100 vs. leader 99/100.
- Kimi K2.6 · Moonshot · Kimi
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 72/100; selection 81/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 70/100; selection 78/100 vs. leader 99/100.
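The selection scores published in this section (99, 96, 93 for the picks, then 90 down to 78 for the dropped candidates) are consistent with a fixed 3-point step per position after policy ordering. The actual calibration rule is not stated in the rundown, so the function below is only a sketch that reproduces the observed numbers; `calibrated_selection` and its parameters are hypothetical.

```python
def calibrated_selection(position, top=99, step=3):
    """Selection score for a 1-based position in the policy ordering.

    top and step are inferred from the published numbers, not documented.
    """
    if position < 1:
        raise ValueError("position is 1-based")
    return max(0, top - step * (position - 1))

coding_scores = [calibrated_selection(p) for p in range(1, 9)]
```

Because the score is derived from the position, it says nothing about how close two candidates were on evidence; the separate evidence figure carries that information.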
Terminal
Shell-loop agents that execute, observe, and self-correct.

Today's pick: GPT-5.5 xhigh. It holds stable favorite rank #1 under policy 2026-05-13.stable-favorites, with preferred reasoning effort xhigh. Its benchmark composite is 82/100, the highest among the three board picks, though the cited fixture lists it at rank 2. Evidence is fresh, and the 400k context window clears typical large-context work. Runner-up Claude Opus 4.7 is held in reserve for instant failover.
1. GPT-5.5 xhigh · OpenAI · openai_compat
  selection 99/100 · evidence 75/100 · coverage 71%
  - bench 82
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 22
  - ctx 400k
  - avail unknown
  - board favorite #1
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 82/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 400k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 2 score 82 1m old manual
2. Claude Opus 4.7 · Anthropic · anthropic
  selection 96/100 · evidence 73/100 · coverage 71%
  - bench 79
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 18
  - ctx 1M
  - avail unknown
  - board favorite #2
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 79/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 1000k tokens.
  - Wire-format family: anthropic.
  Source citations
  - Manual OpenBurnBar fixture
    rank 2 score 79 1m old manual
3. GLM 5.1 · Z.ai · openai_compat
  selection 93/100 · evidence 71/100 · coverage 86%
  - bench 70
  - fresh 100
  - rel 65
  - latency n/a
  - cost 66
  - ctx 256k
  - avail unknown
  - board favorite #3
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 70/100 across 3 sources.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Mid-tier per-token cost.
  - Context window: 256k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Terminal-Bench (via Hugging Face)
    rank 1 score 69 1m old fresh
  - Terminal-Bench (via Hugging Face)
    rank 6 score 64 1m old fresh
  - Manual OpenBurnBar fixture
    rank 3 score 78 1m old manual
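The three citations above (69, 64, 78) average to about 70.3, which matches the reported composite of 70/100 across 3 sources. The verdict also says older sources are weighted down rather than dropped. One way to get both behaviors is a freshness-weighted mean, sketched below; the weighting scheme and the function name are assumptions, since the rundown does not give the formula.

```python
def weighted_composite(sources):
    """Freshness-weighted mean of benchmark scores, rounded to an integer.

    sources: list of (score, freshness) pairs, both on a 0-100 scale.
    Using freshness directly as the weight is an assumption.
    """
    total_weight = sum(freshness for _, freshness in sources)
    if total_weight == 0:
        return None  # no usable evidence
    weighted_sum = sum(score * freshness for score, freshness in sources)
    return round(weighted_sum / total_weight)

# All three GLM 5.1 citations are equally fresh, so this is a plain mean.
glm_terminal = weighted_composite([(69, 100), (64, 100), (78, 100)])
# A stale source still counts, but pulls the result less.
mixed = weighted_composite([(80, 100), (60, 50)])
```

With equal freshness the weights cancel, which is why the example reduces to the simple average seen in this entry.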
Why other candidates didn't make the board pick (8 dropped)
- Kimi K2.6 · Moonshot · Kimi
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 70/100; selection 90/100 vs. leader 99/100.
- DeepSeek V4 Pro · DeepSeek
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 70/100; selection 87/100 vs. leader 99/100.
- MiniMax M2.7 · MiniMax
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 69/100; selection 84/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 68/100; selection 81/100 vs. leader 99/100.
- GLM 5 · Z.ai
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 68/100; selection 78/100 vs. leader 99/100.
- DeepSeek V4 Flash · DeepSeek
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 62/100; selection 75/100 vs. leader 99/100.
- Kimi K2.5 · Moonshot · Kimi
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 56/100; selection 72/100 vs. leader 99/100.
- GLM 4.7 · Z.ai
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 51/100; selection 69/100 vs. leader 99/100.
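The coverage figures in this section fit a simple reading: coverage is the share of the seven per-model signals (bench, fresh, rel, latency, cost, ctx, avail) that carry a value. Entries missing rel and latency show 71% (5 of 7), and the GLM 5.1 entry that reports rel 65 shows 86% (6 of 7). The sketch below encodes that reading; the signal list, the treatment of avail "unknown" as populated, and the function name are assumptions.

```python
SIGNALS = ("bench", "fresh", "rel", "latency", "cost", "ctx", "avail")

def coverage_percent(entry):
    """Share of signals with a value, as a rounded percentage.

    entry: dict mapping signal name to a value, with None for missing data.
    """
    populated = sum(1 for name in SIGNALS if entry.get(name) is not None)
    return round(100 * populated / len(SIGNALS))

gpt_terminal = {"bench": 82, "fresh": 100, "rel": None, "latency": None,
                "cost": 22, "ctx": "400k", "avail": "unknown"}
glm_terminal_entry = {"bench": 70, "fresh": 100, "rel": 65, "latency": None,
                      "cost": 66, "ctx": "256k", "avail": "unknown"}
```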
Design
Website / UI / SVG / slide generation evaluated head-to-head.

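Today's pick: GPT-5.5 xhigh. It holds stable favorite rank #1 under policy 2026-05-13.stable-favorites, with preferred reasoning effort xhigh. Its benchmark composite is 80/100, which is rank 3 in the cited fixture and below Claude Opus 4.7 at 84/100, so the pick reflects the stable-favorite policy ordering rather than a benchmark lead. Evidence is fresh, and the 400k context window clears typical large-context work. Runner-up Claude Opus 4.7 is held in reserve for instant failover.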
1. GPT-5.5 xhigh · OpenAI · openai_compat
  selection 99/100 · evidence 73/100 · coverage 71%
  - bench 80
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 22
  - ctx 400k
  - avail unknown
  - board favorite #1
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 80/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 400k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 3 score 80 1m old manual
2. Claude Opus 4.7 · Anthropic · anthropic
  selection 96/100 · evidence 75/100 · coverage 71%
  - bench 84
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 18
  - ctx 1M
  - avail unknown
  - board favorite #2
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 84/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 1000k tokens.
  - Wire-format family: anthropic.
  Source citations
  - Manual OpenBurnBar fixture
    rank 1 score 84 1m old manual
3. GLM 5.1 · Z.ai · openai_compat
  selection 93/100 · evidence 73/100 · coverage 71%
  - bench 76
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 66
  - ctx 256k
  - avail unknown
  - board favorite #3
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 76/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Mid-tier per-token cost.
  - Context window: 256k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 4 score 76 1m old manual
Why other candidates didn't make the board pick (4 dropped)
- Gemini 3.1 Pro (preview) · Google
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 75/100; selection 90/100 vs. leader 99/100.
- GLM 5 · Z.ai
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 73/100; selection 87/100 vs. leader 99/100.
- Kimi K2.6 · Moonshot · Kimi
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 71/100; selection 84/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 70/100; selection 81/100 vs. leader 99/100.
Analysis
Long-context reasoning, summarization, structured extraction.

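Today's pick: GPT-5.5 xhigh. It holds stable favorite rank #1 under policy 2026-05-13.stable-favorites, with preferred reasoning effort xhigh. Its benchmark composite is 88/100, which is rank 2 in the cited fixture behind Claude Opus 4.7 at 90/100, so the pick reflects the stable-favorite policy ordering rather than a benchmark lead. Evidence is fresh, and the 400k context window clears typical large-context work. Runner-up Claude Opus 4.7 is held in reserve for instant failover.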
1. GPT-5.5 xhigh · OpenAI · openai_compat
  selection 99/100 · evidence 77/100 · coverage 71%
  - bench 88
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 22
  - ctx 400k
  - avail unknown
  - board favorite #1
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 88/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 400k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 2 score 88 1m old manual
2. Claude Opus 4.7 · Anthropic · anthropic
  selection 96/100 · evidence 78/100 · coverage 71%
  - bench 90
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 18
  - ctx 1M
  - avail unknown
  - board favorite #2
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 90/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 1000k tokens.
  - Wire-format family: anthropic.
  Source citations
  - Manual OpenBurnBar fixture
    rank 1 score 90 1m old manual
3. GLM 5.1 · Z.ai · openai_compat
  selection 93/100 · evidence 79/100 · coverage 71%
  - bench 86
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 66
  - ctx 256k
  - avail unknown
  - board favorite #3
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 86/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Mid-tier per-token cost.
  - Context window: 256k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 3 score 86 1m old manual
Why other candidates didn't make the board pick (2 dropped)
- Gemini 3.1 Pro (preview) · Google
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 78/100; selection 90/100 vs. leader 99/100.
- Claude Sonnet 4.6 · Anthropic
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 72/100; selection 87/100 vs. leader 99/100.
General
Mixed-intent chat / one-shot questions / catch-all routing.

Today's pick: GPT-5.5 xhigh. It holds stable favorite rank #1 under policy 2026-05-13.stable-favorites, with preferred reasoning effort xhigh. Its benchmark composite is 87/100; the cited fixture lists it at rank 1, although Claude Opus 4.7 is also listed at rank 1 with 88/100. Evidence is fresh, and the 400k context window clears typical large-context work. Runner-up Claude Opus 4.7 is held in reserve for instant failover.
1. GPT-5.5 xhigh · OpenAI · openai_compat
  selection 99/100 · evidence 76/100 · coverage 71%
  - bench 87
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 22
  - ctx 400k
  - avail unknown
  - board favorite #1
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #1 receives a deterministic 12 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 87/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 400k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 1 score 87 1m old manual
2. Claude Opus 4.7 · Anthropic · anthropic
  selection 96/100 · evidence 76/100 · coverage 71%
  - bench 88
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 18
  - ctx 1M
  - avail unknown
  - board favorite #2
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #2 receives a deterministic 8 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 88/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Premium-tier per-token cost.
  - Context window: 1000k tokens.
  - Wire-format family: anthropic.
  Source citations
  - Manual OpenBurnBar fixture
    rank 1 score 88 1m old manual
3. GLM 5.1 · Z.ai · openai_compat
  selection 93/100 · evidence 78/100 · coverage 71%
  - bench 84
  - fresh 100
  - rel n/a
  - latency n/a
  - cost 66
  - ctx 256k
  - avail unknown
  - board favorite #3
  Board verdict
  - Stable favorite policy 2026-05-13.stable-favorites: favorite rank #3 receives a deterministic 5 point prior until a challenger clears both dethroning margins on consecutive rundowns; the final selection score is calibrated after policy ordering so the public number matches the chosen rank.
  - Composite benchmark score 84/100 across 1 source.
  - Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
  - Mid-tier per-token cost.
  - Context window: 256k tokens.
  - Wire-format family: openai_compat.
  Source citations
  - Manual OpenBurnBar fixture
    rank 3 score 84 1m old manual
Why other candidates didn't make the board pick (2 dropped)
- Gemini 3.1 Pro (preview) · Google
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 77/100; selection 90/100 vs. leader 99/100.
- MiniMax M2.7 · MiniMax
  Selection policy did not clear the leader's margin for this task.
  
  Evidence 75/100; selection 87/100 vs. leader 99/100.

Re-run today's routing locally.

Add an account, pick a model, and let the Fire Hydrant do the routing. Provider-family mode by default; Exact Model Failover when you want cross-provider recovery without changing the canonical model.
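The difference between the two modes can be illustrated with a small filter over failover candidates. This is a sketch under stated assumptions, not the gateway's logic: it assumes provider-family mode keeps recovery inside the current provider family, and that Exact Model Failover admits any endpoint whose canonical model ID matches. The `Endpoint` type, its fields, and the example values are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    provider: str         # account or provider serving the request
    provider_family: str  # family used by provider-family mode (assumed)
    canonical_id: str     # canonical model identifier (assumed format)

def failover_candidates(current, pool, exact_model):
    """Endpoints that may take over from `current` in the chosen mode."""
    others = [e for e in pool if e != current]
    if exact_model:
        # Canonical-ID gate: cross-provider recovery, same canonical model.
        return [e for e in others if e.canonical_id == current.canonical_id]
    # Default: stay within the current provider family.
    return [e for e in others if e.provider_family == current.provider_family]

current = Endpoint("provider-a", "family-x", "model-1")
pool = [
    current,
    Endpoint("provider-a", "family-x", "model-2"),
    Endpoint("provider-b", "family-y", "model-1"),
]
```

In this toy pool, the default mode offers the sibling model from the same family, while exact-model mode offers the same canonical model hosted by a different provider.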

