Operator log · daily intelligent router rundown

What BurnBar recommended on 2026-05-13.

A frozen snapshot: the same data the router used to score requests that day, ordered by task and explained with source citations. Benchmark signals are advisory — runtime constraints (provider-family mode, pinning, auth, quota, safety, availability) always win.

  • generated 12:00 UTC
  • 5 task categories
  • 5 sources

Rundown · 2026-05-13

Generated Wed, 13 May 2026 12:00:00 GMT · schema v1 · benchmarks advisory · runtime constraints win

  • Artificial Analysis unavailable
  • Terminal-Bench (via Hugging Face) stale · 14h old
  • Design Arena stale · 42h old
  • Hugging Face fresh
  • Manual OpenBurnBar fixture fresh
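
The header and source list above suggest a snapshot row shaped roughly like the sketch below. Only the visible fields (generation time, the `schema v1` tag, and per-source status and age) are grounded in this rundown; every field name is a guess.

```ts
// Hypothetical shape of one daily snapshot row. Only the fields visible
// in this rundown are grounded; names and types are assumptions.
interface SourceStatus {
  name: string;                              // e.g. "Design Arena"
  status: "fresh" | "stale" | "unavailable"; // as labeled above
  ageHours?: number;                         // absent when unavailable
}

interface DailySnapshot {
  generatedAt: string; // "2026-05-13T12:00:00Z"
  schemaVersion: 1;    // "schema v1" in the header
  sources: SourceStatus[];
}
```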

Benchmark data is advisory only. Provider-family mode, user pinning, account auth, quota state, safety policy, and availability are evaluated at runtime and override any ranking shown here.
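
As a rough illustration of that precedence, the sketch below gates candidates on runtime constraints before the advisory composite is ever consulted. Every name in it (`Candidate`, `RuntimeState`, `route`, the predicate fields) is a hypothetical stand-in, not BurnBar's actual code.

```ts
// Hypothetical shapes; BurnBar's real types are not published here.
interface Candidate {
  model: string;
  family: "openai_compat" | "anthropic";
  composite: number; // advisory benchmark composite, 0-100
}

interface RuntimeState {
  providerFamilyMode?: Candidate["family"]; // family lock, if enabled
  pinnedModel?: string;                     // explicit user pin
  authedFamilies: Set<Candidate["family"]>; // families with connected accounts
  quotaExhausted: Set<string>;              // models currently out of quota
  safetyBlocked: Set<string>;               // models blocked by safety policy
  unavailable: Set<string>;                 // models down right now
}

function route(candidates: Candidate[], rt: RuntimeState): Candidate | undefined {
  // A user pin wins outright, regardless of any score.
  const pinned = candidates.find((c) => c.model === rt.pinnedModel);
  if (pinned) return pinned;

  // Runtime constraints filter first; rankings never resurrect a model
  // that fails any of these gates.
  const eligible = candidates.filter(
    (c) =>
      (!rt.providerFamilyMode || c.family === rt.providerFamilyMode) &&
      rt.authedFamilies.has(c.family) &&
      !rt.quotaExhausted.has(c.model) &&
      !rt.safetyBlocked.has(c.model) &&
      !rt.unavailable.has(c.model),
  );

  // Only among survivors does the advisory composite pick the winner.
  return eligible.sort((a, b) => b.composite - a.composite)[0];
}
```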

  1. Coding

    Refactors, multi-file edits, repo-grounded code generation.

    Today's pick: GPT-5.5 Codex — led the benchmark composite at 89/100; evidence is the freshest available, even though older than ideal; context window of 400k clears typical large-context work; runner-up Claude Opus 4.7 is held in reserve for instant failover.
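
The "held in reserve for instant failover" behavior can be pictured as a simple two-slot fallback, as in the sketch below. The function names are invented for illustration; the primary and reserve slots correspond to today's pick (GPT-5.5 Codex) and the runner-up (Claude Opus 4.7).

```ts
// Minimal failover sketch: the pre-scored runner-up takes over on any
// provider error, with no re-ranking at request time. Names are hypothetical.
type Completion = (prompt: string) => Promise<string>;

async function completeWithReserve(
  prompt: string,
  primary: Completion, // today's pick
  reserve: Completion, // the runner-up, held in reserve
): Promise<string> {
  try {
    return await primary(prompt);
  } catch {
    return reserve(prompt); // instant failover, no re-scoring
  }
}
```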

    1. #1
      GPT-5.5 Codex · OpenAI · openai_compat
      Composite 77/100 · evidence 100%
      bench 89 · fresh 55 · rel 90 · latency 56 · cost 24 · ctx 400k · avail common

      Why this rank

      • Composite benchmark score 89/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped (see the scoring sketch after this list).
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 400k tokens.
      • Wire-format family: openai_compat.

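A minimal sketch of how bullets like these could combine, assuming an exponential freshness decay and guessed blend weights. The rundown names the inputs (bench, fresh, rel, latency, cost) but not the formula, so treat every constant below as an assumption.

```ts
// Per-source freshness weight: stale sources are down-weighted, never
// dropped. The 24h half-life is an assumption, not a documented value.
function freshnessWeight(ageHours: number, halfLifeHours = 24): number {
  return Math.pow(0.5, ageHours / halfLifeHours);
}

// Source-level benchmark scores fold into the single "bench" chip.
function benchComposite(obs: { score: number; ageHours: number }[]): number {
  const weights = obs.map((o) => freshnessWeight(o.ageHours));
  const total = weights.reduce((a, b) => a + b, 0);
  return obs.reduce((sum, o, i) => sum + weights[i] * o.score, 0) / total;
}

// The chips then blend into the composite shown next to each rank.
// These weights are guesses tuned to land near the published numbers.
const WEIGHTS = { bench: 0.5, fresh: 0.1, rel: 0.2, latency: 0.1, cost: 0.1 };

function composite(d: { bench: number; fresh: number; rel: number; latency: number; cost: number }): number {
  return (
    WEIGHTS.bench * d.bench +
    WEIGHTS.fresh * d.fresh +
    WEIGHTS.rel * d.rel +
    WEIGHTS.latency * d.latency +
    WEIGHTS.cost * d.cost
  );
}

// #1 Coding chips (bench 89, fresh 55, rel 90, latency 56, cost 24)
// give 76.0 with these guessed weights, close to the 77 shown above.
```
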
    2. #2
      Claude Opus 4.7 · Anthropic · anthropic
      Composite 76/100 · evidence 100%
      bench 88 · fresh 55 · rel 88 · latency 46 · cost 18 · ctx 1M · avail common

      Why this rank

      • Composite benchmark score 88/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.

    3. #3
      GLM 5 · Z.ai · openai_compat
      Composite 75/100 · evidence 100%
      bench 82 · fresh 55 · rel 84 · latency 70 · cost 66 · ctx 256k · avail common

      Why this rank

      • Composite benchmark score 82/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Mid-tier per-token cost.
      • Latency profile is fast (high TPS, low TTFT).
      • Context window: 256k tokens.
      • Wire-format family: openai_compat.

    Why other candidates didn't make the cut (7 dropped; the drop-reason taxonomy is sketched after this list)
    • GPT-5.5 · OpenAI

      Composite score did not clear the leader's margin for this task.

      Composite 75/100 vs. leader 77/100.

    • MiniMax 2.7 · MiniMax

      Composite score did not clear the leader's margin for this task.

      Composite 73/100 vs. leader 77/100.

    • Kimi 2.6 · Moonshot

      Composite score did not clear the leader's margin for this task.

      Composite 72/100 vs. leader 77/100.

    • Claude Sonnet 4.6 · Anthropic

      Composite score did not clear the leader's margin for this task.

      Composite 70/100 vs. leader 77/100.

    • GPT-5.5 mini · OpenAI

      Composite score did not clear the leader's margin for this task.

      Composite 63/100 vs. leader 77/100.

    • Claude Haiku 4.5 · Anthropic

      Composite score did not clear the leader's margin for this task.

      Composite 60/100 vs. leader 77/100.

    • Gemini 3 Pro · Google

      Not routable through a connected BurnBar provider account.

      Composite 59/100 vs. leader 77/100.
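
The drop notes in lists like the one above reuse a small, closed set of reasons across the whole rundown: below the leader's margin, materially pricier at a comparable score, or not routable. A hypothetical type for that taxonomy; the shape and names are assumptions, only the reason strings are taken from this rundown.

```ts
// Hypothetical taxonomy of the drop reasons that appear in this rundown.
type DropReason =
  | { kind: "margin"; composite: number; leader: number } // below the leader's margin
  | { kind: "cost" }          // materially pricier at a comparable score
  | { kind: "not_routable" }; // no connected BurnBar provider account

function explain(reason: DropReason): string {
  switch (reason.kind) {
    case "margin":
      return `Composite ${reason.composite}/100 vs. leader ${reason.leader}/100.`;
    case "cost":
      return "Per-token cost is materially higher than the leader's at a comparable score.";
    case "not_routable":
      return "Not routable through a connected BurnBar provider account.";
  }
}
```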

  2. Terminal

    Shell-loop agents that execute, observe, and self-correct.

    Today's pick: GPT-5.5 Codex — led the benchmark composite at 86/100; evidence is fresh; context window of 400k clears typical large-context work; runner-up GPT-5.5 is held in reserve for instant failover.

    1. #1
      GPT-5.5 Codex · OpenAI · openai_compat
      Composite 84/100 · evidence 100%
      bench 86 · fresh 100 · rel 90 · latency 56 · cost 24 · ctx 400k · avail common

      Why this rank

      • Composite benchmark score 86/100 across 1 source.
      • Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 400k tokens.
      • Wire-format family: openai_compat.

    2. #2
      GPT-5.5 · OpenAI · openai_compat
      Composite 81/100 · evidence 100%
      bench 82 · fresh 100 · rel 90 · latency 58 · cost 22 · ctx 400k · avail common

      Why this rank

      • Composite benchmark score 82/100 across 1 source.
      • Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 400k tokens.
      • Wire-format family: openai_compat.

    3. #3
      GLM 5 · Z.ai · openai_compat
      Composite 81/100 · evidence 100%
      bench 77 · fresh 100 · rel 84 · latency 70 · cost 66 · ctx 256k · avail common

      Why this rank

      • Composite benchmark score 77/100 across 1 source.
      • Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
      • Mid-tier per-token cost.
      • Latency profile is fast (high TPS, low TTFT).
      • Context window: 256k tokens.
      • Wire-format family: openai_compat.

    Why other candidates didn't make the cut (6 dropped)
    • MiniMax 2.7 · MiniMax

      Composite score did not clear the leader's margin for this task.

      Composite 79/100 vs. leader 84/100.

    • Claude Opus 4.7 · Anthropic

      Per-token cost is materially higher than the leader's at a comparable score.

      Composite 79/100 vs. leader 84/100.

    • Kimi 2.6 · Moonshot

      Composite score did not clear the leader's margin for this task.

      Composite 77/100 vs. leader 84/100.

    • Claude Sonnet 4.6 · Anthropic

      Composite score did not clear the leader's margin for this task.

      Composite 74/100 vs. leader 84/100.

    • GPT-5.5 mini · OpenAI

      Composite score did not clear the leader's margin for this task.

      Composite 67/100 vs. leader 84/100.

    • Claude Haiku 4.5 · Anthropic

      Composite score did not clear the leader's margin for this task.

      Composite 64/100 vs. leader 84/100.

  3. Design

    Website / UI / SVG / slide generation evaluated head-to-head.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 84/100; evidence is fresh; context window of 1M clears typical large-context work; runner-up GPT-5.5 is held in reserve for instant failover.

    1. #1
      Claude Opus 4.7 · Anthropic · anthropic
      Composite 78/100 · evidence 100%
      bench 84 · fresh 85 · rel 84 · latency 48 · cost 18 · ctx 1M · avail common

      Why this rank

      • Composite benchmark score 84/100 across 1 source.
      • Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.

    2. #2
      GPT-5.5 · OpenAI · openai_compat
      Composite 77/100 · evidence 100%
      bench 80 · fresh 85 · rel 88 · latency 58 · cost 22 · ctx 400k · avail common

      Why this rank

      • Composite benchmark score 80/100 across 1 source.
      • Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 400k tokens.
      • Wire-format family: openai_compat.

    3. #3
      GLM 5 · Z.ai · openai_compat
      Composite 77/100 · evidence 100%
      bench 75 · fresh 85 · rel 84 · latency 70 · cost 66 · ctx 256k · avail common

      Why this rank

      • Composite benchmark score 75/100 across 1 source.
      • Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
      • Mid-tier per-token cost.
      • Latency profile is fast (high TPS, low TTFT).
      • Context window: 256k tokens.
      • Wire-format family: openai_compat.

    Why other candidates didn't make the cut (3 dropped)
    • Kimi 2.6 · Moonshot

      Composite score did not clear the leader's margin for this task.

      Composite 75/100 vs. leader 78/100.

    • Claude Sonnet 4.6 · Anthropic

      Composite score did not clear the leader's margin for this task.

      Composite 74/100 vs. leader 78/100.

    • Gemini 3 Pro · Google

      Not routable through a connected BurnBar provider account.

      Composite 63/100 vs. leader 78/100.

  4. Analysis

    Long-context reasoning, summarization, structured extraction.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 90/100; evidence is the freshest available, even though older than ideal; context window of 1M clears typical large-context work; runner-up GPT-5.5 is held in reserve for instant failover.

    1. #1
      Claude Opus 4.7 · Anthropic · anthropic
      Composite 77/100 · evidence 100%
      bench 90 · fresh 55 · rel 86 · latency 48 · cost 18 · ctx 1M · avail common

      Why this rank

      • Composite benchmark score 90/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.

    2. #2
      GPT-5.5 · OpenAI · openai_compat
      Composite 76/100 · evidence 100%
      bench 88 · fresh 55 · rel 88 · latency 58 · cost 22 · ctx 400k · avail common

      Why this rank

      • Composite benchmark score 88/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 400k tokens.
      • Wire-format family: openai_compat.

    3. #3
      Claude Sonnet 4.6 · Anthropic · anthropic
      Composite 72/100 · evidence 100%
      bench 83 · fresh 55 · rel 86 · latency 60 · cost 42 · ctx 1M · avail common

      Why this rank

      • Composite benchmark score 83/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Mid-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.
      • Tier: mid. Ranked behind its flagship siblings at equivalent benchmark; pin the tier explicitly to invert this.

    Why other candidates didn't make the cut (1 dropped)
    • Gemini 3 Pro · Google

      Not routable through a connected BurnBar provider account.

      Composite 61/100 vs. leader 77/100.

  5. General

    Mixed-intent chat / one-shot questions / catch-all routing.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 88/100; evidence is the freshest available, even though older than ideal; context window of 1M clears typical large-context work; runner-up GPT-5.5 is held in reserve for instant failover.

    1. #1
      Claude Opus 4.7 · Anthropic · anthropic
      Composite 76/100 · evidence 100%
      bench 88 · fresh 55 · rel 88 · latency 48 · cost 18 · ctx 1M · avail common

      Why this rank

      • Composite benchmark score 88/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.

    2. #2
      GPT-5.5 · OpenAI · openai_compat
      Composite 76/100 · evidence 100%
      bench 87 · fresh 55 · rel 88 · latency 58 · cost 22 · ctx 400k · avail common

      Why this rank

      • Composite benchmark score 87/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 400k tokens.
      • Wire-format family: openai_compat.

    3. #3
      MiniMax 2.7 · MiniMax · openai_compat
      Composite 74/100 · evidence 100%
      bench 81 · fresh 55 · rel 82 · latency 68 · cost 62 · ctx 320k · avail common

      Why this rank

      • Composite benchmark score 81/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Mid-tier per-token cost.
      • Latency profile is fast (high TPS, low TTFT).
      • Context window: 320k tokens.
      • Wire-format family: openai_compat.

    Why other candidates didn't make the cut (2 dropped)
    • GPT-5.5 mini · OpenAI

      Composite score did not clear the leader's margin for this task.

      Composite 64/100 vs. leader 76/100.

    • Gemini 3 Pro · Google

      Not routable through a connected BurnBar provider account.

      Composite 61/100 vs. leader 76/100.

What this rundown is — and isn't

  • Benchmark snapshots are advisory only — runtime constraints (provider-family mode, user pinning, auth, quota, safety, and availability) override any ranking shown here.
  • BurnBar does not fabricate benchmark numbers. Missing data is reported as 'not reported', never guessed.
  • Daily snapshots are sampled from public or documented sources; raw provider keys, cookies, and bearer tokens are never written into snapshots or this rundown.
  • One or more sources were unavailable for this day; the rundown reflects only the sources that responded.

Operator notes

  • Static demo fixture for the website build (2026-05-13). Snapshots use real source attribution but freshness is reduced to 'manual' because no live API key is configured at static-build time.
  • Production daily ordering is generated by `refreshModelLandscapeBenchmarks` (functions/src/scheduled.ts) reading the Firestore `model_benchmark_snapshots` collection.
  • Ordering reflects each model's tier: flagship beats mid beats mini at equivalent benchmark; pin a tier explicitly to invert this (tie-break sketched below).
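
A sketch of that tier tie-break under stated assumptions: composites are bucketed into 2-point bands to stand in for "equivalent benchmark", and a pinned tier jumps the queue. Both the band width and the field names are invented here, not read from `refreshModelLandscapeBenchmarks`.

```ts
// Hypothetical tier ordering; band width and names are assumptions.
type Tier = "flagship" | "mid" | "mini";
const TIER_RANK: Record<Tier, number> = { flagship: 0, mid: 1, mini: 2 };

interface Entry { model: string; tier: Tier; composite: number; }

function orderByTier(entries: Entry[], pinnedTier?: Tier): Entry[] {
  // Quantize composites so "equivalent benchmark" scores compare equal.
  const band = (c: number) => Math.round(c / 2);
  return [...entries].sort((a, b) => {
    // An explicit tier pin inverts the default flagship-first preference.
    if (pinnedTier) {
      const ap = a.tier === pinnedTier ? 0 : 1;
      const bp = b.tier === pinnedTier ? 0 : 1;
      if (ap !== bp) return ap - bp;
    }
    const byBand = band(b.composite) - band(a.composite);
    if (byBand !== 0) return byBand;
    return TIER_RANK[a.tier] - TIER_RANK[b.tier]; // flagship > mid > mini
  });
}
```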

Re-run today's routing locally.

Add an account, pick a model, and let the Fire Hydrant do the routing. Provider-family mode by default; intelligent mode opt-in.