Operator log · daily intelligent router rundown

What BurnBar recommended on 2026-05-12.

Frozen snapshot. Same data the router used to score requests that day, ordered by task and explained with source citations. Benchmark signals are advisory — runtime constraints (provider-family mode, pinning, auth, quota, safety, availability) always win.

  • generated 12:00 UTC
  • 5 task categories
  • 5 sources

Rundown · 2026-05-12

Generated Tue, 12 May 2026 12:00:00 GMT · schema v1 · benchmarks advisory · runtime constraints win

  • Artificial Analysis · unavailable
  • Terminal-Bench (via Hugging Face) · stale · 14h old
  • Design Arena · stale · 42h old
  • Hugging Face · fresh
  • Manual OpenBurnBar fixture · fresh
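
How those labels could be derived is simple age bucketing. A minimal sketch, assuming a 12-hour stale cutoff and a SourceStatus shape (both illustrative, not BurnBar's documented schema):

    // Hypothetical freshness labeling. The 12h cutoff is an assumption picked so
    // the statuses above (fresh, vs. stale at 14h and 42h) come out consistent;
    // a failed fetch surfaces as "unavailable" rather than a guessed number.
    type SourceStatus = "fresh" | "stale" | "unavailable";

    function freshnessLabel(lastFetchHoursAgo: number | null): SourceStatus {
      if (lastFetchHoursAgo === null) return "unavailable"; // no fetch today
      return lastFetchHoursAgo <= 12 ? "fresh" : "stale";
    }

    // e.g. freshnessLabel(14) === "stale"; freshnessLabel(null) === "unavailable"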

Benchmark data is advisory only. Provider-family mode, user pinning, account auth, quota state, safety policy, and availability are evaluated at runtime and override any ranking shown here.
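
In code terms, those constraints act as a hard filter ahead of any ranking. A minimal sketch, assuming illustrative Candidate and RuntimeContext shapes (not BurnBar's actual API):

    // Hypothetical constraint gate: every check is a request-time veto, and the
    // advisory composite is only consulted among the survivors. A user pin
    // short-circuits ranking entirely.
    interface Candidate { model: string; family: string; composite: number; }
    interface RuntimeContext {
      providerFamilyMode: string | null; // e.g. "anthropic" when family mode is on
      pinnedModel: string | null;        // an explicit pin always wins
      authorized(c: Candidate): boolean;
      hasQuota(c: Candidate): boolean;
      passesSafety(c: Candidate): boolean;
      isAvailable(c: Candidate): boolean;
    }

    function route(candidates: Candidate[], ctx: RuntimeContext): Candidate | null {
      if (ctx.pinnedModel !== null) {
        return candidates.find((c) => c.model === ctx.pinnedModel) ?? null;
      }
      const eligible = candidates.filter((c) =>
        (ctx.providerFamilyMode === null || c.family === ctx.providerFamilyMode) &&
        ctx.authorized(c) && ctx.hasQuota(c) && ctx.passesSafety(c) && ctx.isAvailable(c));
      // Only now does the benchmark ranking matter.
      return eligible.sort((a, b) => b.composite - a.composite)[0] ?? null;
    }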

  1. Coding

    Refactors, multi-file edits, repo-grounded code generation.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 87/100; evidence is the freshest available, even though older than ideal; a 1M-token context window clears typical large-context work; runner-up Claude Sonnet 4.6 is held in reserve for instant failover.
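
    "Held in reserve for instant failover" reads as ordered fallback rather than re-scoring on error. A minimal sketch of that pattern; callModel and the model ids are assumptions:

      // Hypothetical failover loop: try the day's pick, then the runner-up.
      // callModel is an assumed helper; BurnBar's real retry policy isn't shown here.
      async function completeWithFailover(
        ranked: string[], // e.g. ["claude-opus-4.7", "claude-sonnet-4.6"] (illustrative ids)
        callModel: (model: string) => Promise<string>,
      ): Promise<string> {
        let lastError: unknown;
        for (const model of ranked) {
          try {
            return await callModel(model); // first success wins
          } catch (err) {
            lastError = err; // fall through to the reserve model
          }
        }
        throw lastError; // every candidate failed; surface the last error
      }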

    1. #1
      Claude Opus 4.7 · Anthropic · family: anthropic
      composite 75/100 · evidence 100%
      • bench 87
      • fresh 55
      • rel 88
      • latency 46
      • cost 18
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 87/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped (weighting sketched below).
      • Premium-tier per-token cost.
      • Latency is acceptable for non-interactive work.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.
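
      One way the badge signals above could roll up into the 75/100 composite, sketched with placeholder weights. The weights are guesses and will not reproduce the exact number; only the shape is taken from the text: freshness scales a source's benchmark contribution down instead of dropping it.

          // Hypothetical composite: weighted mean of the badge signals. Weights are
          // assumptions; the freshness factor encodes "weighted down, not dropped".
          interface Signals { bench: number; fresh: number; rel: number; latency: number; cost: number; }

          function composite(s: Signals): number {
            const freshFactor = 0.5 + 0.5 * (s.fresh / 100); // stale evidence keeps half weight at worst
            const w = { bench: 0.45 * freshFactor, rel: 0.2, latency: 0.15, cost: 0.2 };
            const weighted = w.bench * s.bench + w.rel * s.rel + w.latency * s.latency + w.cost * s.cost;
            return Math.round(weighted / (w.bench + w.rel + w.latency + w.cost));
          }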


    2. #2
      Claude Sonnet 4.6 · Anthropic · family: anthropic
      composite 67/100 · evidence 86%
      • bench 79
      • fresh 55
      • rel 86
      • latency not reported
      • cost 42
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 79/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Mid-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.
      • Tier · mid. Counted behind flagship siblings at equivalent benchmark; pin the tier explicitly to invert this (tie-break sketched below).
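
      The tier rule as described is a near-tie tie-break. A minimal sketch; the tolerance and names are assumptions:

          // Hypothetical tier tie-break: within EPSILON composite points, the
          // flagship outranks its mid-tier sibling unless the user pinned "mid".
          type Tier = "flagship" | "mid";
          interface Ranked { model: string; tier: Tier; composite: number; }

          const EPSILON = 5; // assumed tolerance for "equivalent benchmark"
          const TIER_ORDER: Record<Tier, number> = { flagship: 0, mid: 1 };

          function compare(a: Ranked, b: Ranked, pinnedTier: Tier | null): number {
            if (pinnedTier !== null && a.tier !== b.tier) {
              return a.tier === pinnedTier ? -1 : 1; // explicit pin inverts the default
            }
            if (Math.abs(a.composite - b.composite) <= EPSILON) {
              return TIER_ORDER[a.tier] - TIER_ORDER[b.tier]; // flagship first on near-ties
            }
            return b.composite - a.composite; // otherwise the score decides
          }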


  2. Terminal

    Shell-loop agents that execute, observe, and self-correct.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 78/100; evidence is fresh; a 1M-token context window clears typical large-context work; runner-up Claude Sonnet 4.6 is held in reserve for instant failover.

    1. #1
      Claude Opus 4.7 · Anthropic · family: anthropic
      composite 76/100 · evidence 86%
      • bench 78
      • fresh 100
      • rel 88
      • latency not reported
      • cost 18
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 78/100 across 1 source.
      • Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.


    2. #2
      Claude Sonnet 4.6 · Anthropic · family: anthropic
      composite 72/100 · evidence 86%
      • bench 73
      • fresh 100
      • rel 86
      • latency not reported
      • cost 42
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 73/100 across 1 source.
      • Freshest evidence rated 100/100 — older sources are weighted down, not dropped.
      • Mid-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.
      • Tier · mid. Counted behind flagship siblings at equivalent benchmark; pin the tier explicitly to invert this.


  3. Design

    Website / UI / SVG / slide generation evaluated head-to-head.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 83/100; evidence is fresh; a 1M-token context window clears typical large-context work.

    1. #1
      Claude Opus 4.7 · Anthropic · family: anthropic
      composite 76/100 · evidence 86%
      • bench 83
      • fresh 85
      • rel 88
      • latency not reported
      • cost 18
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 83/100 across 1 source.
      • Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.


  4. Analysis

    Long-context reasoning, summarization, structured extraction.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 89/100; evidence is the freshest available, even though older than ideal; a 1M-token context window clears typical large-context work.

    1. #1
      Claude Opus 4.7 · Anthropic · family: anthropic
      composite 74/100 · evidence 86%
      • bench 89
      • fresh 55
      • rel 88
      • latency not reported
      • cost 18
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 89/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.


  5. General

    Mixed-intent chat / one-shot questions / catch-all routing.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 87/100; evidence is the freshest available, even though older than ideal; a 1M-token context window clears typical large-context work.

    1. #1
      Claude Opus 4.7 · Anthropic · family: anthropic
      composite 73/100 · evidence 86%
      • bench 87
      • fresh 55
      • rel 88
      • latency not reported
      • cost 18
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 87/100 across 1 source.
      • Freshest evidence rated 55/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.


What this rundown is — and isn't

  • Benchmark snapshots are advisory only — runtime constraints (provider-family mode, user pinning, auth, quota, safety, and availability) override any ranking shown here.
  • BurnBar does not fabricate benchmark numbers. Missing data is reported as 'not reported', never guessed.
  • Daily snapshots are sampled from public or documented sources; raw provider keys, cookies, and bearer tokens are never written into snapshots or this rundown.
  • One or more sources were unavailable for this day; the rundown reflects only the sources that responded.

Operator notes

  • Yesterday's cached fixture — used to demonstrate that the archive renders multiple dates.

Re-run today's routing locally.

Add an account, pick a model, and let the Fire Hydrant do the routing. Provider-family mode by default; intelligent mode opt-in.
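
A hypothetical local config showing that default; every key below is illustrative, since the rundown doesn't document the actual file format:

    // Illustrative config sketch, not BurnBar's real schema.
    const routerConfig = {
      mode: "provider-family" as const, // the default; switch to "intelligent" to opt in
      account: { provider: "anthropic", credential: "env:ANTHROPIC_API_KEY" }, // assumed env-var reference
      pin: null as string | null, // e.g. "claude-sonnet-4.6" to force a model (illustrative id)
    };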