Operator log · daily intelligent router rundown

What BurnBar recommended on 2026-05-11.

A frozen snapshot: the same data the router used to score requests that day, ordered by task and explained with source citations. Benchmark signals are advisory; runtime constraints (provider-family mode, pinning, auth, quota, safety, availability) always win.

  • generated 12:00 UTC
  • 3 task categories
  • 5 sources

Generated Mon, 11 May 2026 12:00:00 GMT · schema v1 · benchmarks advisory · runtime constraints win

  • Artificial Analysis unavailable
  • Terminal-Bench (via Hugging Face) error
  • Design Arena stale · 42h old
  • Hugging Face fresh
  • Manual OpenBurnBar fixture fresh

Benchmark data is advisory only. Provider-family mode, user pinning, account auth, quota state, safety policy, and availability are evaluated at runtime and override any ranking shown here.
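The precedence described above can be sketched as a hard filter that runs before any ranking is consulted. This is an illustrative sketch with hypothetical names (`Candidate`, `route`), not BurnBar's actual internals: pinning and runtime constraints gate the pool, and the advisory composite only decides among survivors.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    name: str
    family: str            # wire-format family, e.g. "anthropic", "openai_compat"
    composite: float       # advisory benchmark composite, 0-100
    available: bool = True
    quota_ok: bool = True
    authed: bool = True
    safety_ok: bool = True

def _usable(c: Candidate) -> bool:
    # Runtime constraints are hard gates, never tiebreakers.
    return c.available and c.quota_ok and c.authed and c.safety_ok

def route(candidates, pinned=None, family_mode=None):
    """Runtime constraints first; the benchmark composite only ranks survivors."""
    # 1. User pinning wins outright when the pinned model is usable.
    if pinned is not None:
        for c in candidates:
            if c.name == pinned and _usable(c):
                return c
    # 2. Provider-family mode and the other runtime gates filter the pool.
    eligible = [c for c in candidates
                if _usable(c) and (family_mode is None or c.family == family_mode)]
    if not eligible:
        return None
    # 3. Only now does the advisory benchmark composite matter.
    return max(eligible, key=lambda c: c.composite)
```

Under this sketch, a `family_mode="openai_compat"` request would route to GLM 5 even though Claude Opus 4.7 holds the higher composite, matching the "runtime constraints win" rule above.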

  1. Coding

    Refactors, multi-file edits, repo-grounded code generation.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 85/100; evidence is fresh; its 1M-token context window clears typical large-context work; runner-up GLM 5 is held in reserve for instant failover.

    1. #1
      Claude Opus 4.7 Anthropic · anthropic
      76 composite / 100 · evidence 86%
      • bench 85
      • fresh 85
      • rel 86
      • latency not reported
      • cost 18
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 85/100 across 1 source.
      • Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.

      Source citations

    2. #2
      GLM 5 Z.ai · openai_compat
      75 composite / 100 · evidence 86%
      • bench 78
      • fresh 85
      • rel 82
      • latency not reported
      • cost 66
      • ctx 256k
      • avail common

      Why this rank

      • Composite benchmark score 78/100 across 1 source.
      • Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
      • Mid-tier per-token cost.
      • Context window: 256k tokens.
      • Wire-format family: openai_compat.

      Source citations
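The "weighted down, not dropped" behaviour cited in the rankings above can be sketched as age-based decay on each source's weight. The half-life constant here is illustrative, not BurnBar's actual formula:

```python
def composite(scores_with_age, half_life_h=48.0):
    """Blend per-source scores; a source's weight halves every `half_life_h`
    hours of staleness instead of the source being discarded."""
    num = den = 0.0
    for score, age_h in scores_with_age:
        w = 0.5 ** (age_h / half_life_h)   # stale sources decay but never reach zero
        num += w * score
        den += w
    return num / den if den else None      # no sources at all -> "not reported"
```

Under a 48-hour half-life, a 42-hour-old signal like today's Design Arena snapshot would keep roughly 54% of its weight rather than being excluded.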

  2. Design

    Website / UI / SVG / slide generation evaluated head-to-head.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 81/100; evidence is fresh; its 1M-token context window clears typical large-context work.

    1. #1
      Claude Opus 4.7 Anthropic · anthropic
      75 composite / 100 · evidence 86%
      • bench 81
      • fresh 85
      • rel 86
      • latency not reported
      • cost 18
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 81/100 across 1 source.
      • Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.

      Source citations

  3. Analysis

    Long-context reasoning, summarization, structured extraction.

    Today's pick: Claude Opus 4.7 — led the benchmark composite at 88/100; evidence is fresh; its 1M-token context window clears typical large-context work.

    1. #1
      Claude Opus 4.7 Anthropic · anthropic
      78 composite / 100 · evidence 86%
      • bench 88
      • fresh 85
      • rel 86
      • latency not reported
      • cost 18
      • ctx 1M
      • avail common

      Why this rank

      • Composite benchmark score 88/100 across 1 source.
      • Freshest evidence rated 85/100 — older sources are weighted down, not dropped.
      • Premium-tier per-token cost.
      • Context window: 1M tokens.
      • Wire-format family: anthropic.

      Source citations

What this rundown is — and isn't

  • Benchmark snapshots are advisory only — runtime constraints (provider-family mode, user pinning, auth, quota, safety, and availability) override any ranking shown here.
  • BurnBar does not fabricate benchmark numbers. Missing data is reported as 'not reported', never guessed.
  • Daily snapshots are sampled from public or documented sources; raw provider keys, cookies, and bearer tokens are never written into snapshots or this rundown.
  • One or more sources were unavailable for this day; the rundown reflects only the sources that responded.
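The "never guessed" rule above can be sketched as a tiny rendering helper (hypothetical; the real renderer is not shown here): a missing metric surfaces as an explicit "not reported" chip instead of defaulting to zero or being silently dropped.

```python
def chip(label, value):
    """Render one metric chip; absent metrics are stated, never invented."""
    return f"{label} {value}" if value is not None else f"{label} not reported"

# Example row mirroring today's Coding pick, where latency carried no value.
metrics = {"bench": 85, "fresh": 85, "rel": 86, "latency": None, "cost": 18}
row = " · ".join(chip(k, v) for k, v in metrics.items())
```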

Operator notes

  • Two days back — fewer fresh signals; this rundown intentionally shows what happens when Terminal-Bench is unavailable for a day.

Re-run today's routing locally.

Add an account, pick a model, and let the Fire Hydrant do the routing. Provider-family mode by default; intelligent mode opt-in.