OpenRouter Real Usage Rankings: Which Models Are Developers Abandoning?

AI Notes  ·  2026.06.16  ·  ~9 min read

OpenRouter model usage rankings and API pricing trend analysis

Bottom line first: in 2026, check OpenRouter real usage before you check benchmarks. Platform weekly tokens have crossed 28.9T, and the front of the chart is almost entirely low-cost MoE—DeepSeek V4 Flash and Hy3 Preview each exceed 10T. GPT-4o, xAI Grok, and other "default strongest" picks are leaving the main loop—not because they died, but because developers moved them to the review layer. Below: the Top 10 table, a five-model comparison, scenario picks, and the Mac setup that matches each tier.

1. 2026 OpenRouter Top 10 (weekly token volume)

Source: OpenRouter public model pages (mid-June 2026). Read rank through role—who runs the main loop, who only signs off.

Core ranking

Who is eating 80% of Agent traffic?

Default execution layer Review / upgrade tier Being replaced

# Model Weekly tokens Role Trend
1 DeepSeek V4 Flash 10.9T 2026 default pick ↑ #1
2 Hy3 Preview 10.7T Agent long chains ↑ tied for #1
3 Claude Opus 4.7 7.4T Pre-merge sign-off → review essential
4 Claude Sonnet 4.6 7.4T IDE upgrade default → medium complexity
5 Owl Alpha 5.0T Agent newcomer ↑ climbing fast
6 MiMo-V2-Flash 4.2T Open-source Flash
7 Kimi K2 3.8T CJK / multilingual docs
8 Gemini 3.5 Flash 3.2T Batch / multimodal
9 GPT-4o 1.6T Left main flow ↓ replaced by Flash
10 xAI Grok-3 1.1T Developers leaving ↓↓ -73%

At a glance: Top 2 combined ≈ 21.6T, or 75% of the platform's 28.9T—the default model string is already Flash, not Opus.

28.9T
OpenRouter weekly tokens
75%
Top 2 combined share
26×
Flash vs Sonnet per-task cost

The leaderboard does not tell you who is smartest—it tells you who is becoming the 2026 default model string.

How this differs from benchmarks

MMLU and SWE-bench measure ceilings; OpenRouter usage measures what teams dare to call every day. One Agent loop burns 50K–200K tokens—cheap + good enough wins traffic. See The OpenRouter Pricing Truth for the cost mechanics behind these numbers.

2. Top 5 primer: what each model is for

① DeepSeek V4 Flash — 2026 default execution layer

284B MoE, ~13B activated per pass; 1M context, input ~$0.10/M, cache hits as low as $0.04/M. Best for: reading repos, drafting patches, Agent main loops, RAG reranking. Will not run on a local Mac—OpenRouter API is the realistic path.

② Hy3 Preview — the long-chain Agent newcomer

Tencent's model, at 10.7T within weeks of launch on OpenRouter. Strong CJK understanding, multi-step tool calls, and stable long-context behavior. Best for: complex Agent orchestration, multilingual business docs, batch pipelines that complement Gemini. Also API-only; run the execution environment on Cloud Mac and keep inference on OpenRouter.

③ Claude Opus 4.7 — the sign-off layer

7.4T proves it is not dead—but the role changed. It no longer runs 80% of Agent loops. It handles pre-merge review, architecture decisions, and security audits. High unit cost, reserved for the ~5% of tasks where one failure is catastrophic.

④ Claude Sonnet 4.6 — IDE medium-complexity brain

Same token volume as Opus, different job: cross-module refactors, API contract changes, the "upgrade default" in Cursor and Claude Code. Roughly 30× more expensive than Flash, cheaper than Opus—the quality/cost middle tier.

⑤ Owl Alpha — Agent-focused explorer

A 5.0T newcomer with aggressive community feedback on multi-step coding and tool use. Good for Agent builders willing to experiment; in production, pair with DeepSeek Flash as a fallback safety net.

3. Five-model capability matrix

Not an IQ ranking—a "worth making default?" scorecard. Green = strongest on that axis; red = clear weakness.

Capability matrix

Coding and Agents are close—cost and multilingual work separate them

Capability DeepSeek Claude Gemini Kimi Hy3 Pick
Coding ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ DeepSeek / Hy3
Agent ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ Hy3 long chains
Long context ⭐⭐⭐⭐⭐ 1M ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐ DeepSeek
CJK / multilingual ⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐ ⭐⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ Kimi / Hy3
Cost ⭐⭐⭐⭐⭐ ⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐ ⭐⭐⭐⭐⭐ DeepSeek

Default stack: main loop DeepSeek / Hy3 + review Claude Sonnet. Claude is not unusable—it is not the default.

4. Price comparison: what one Agent task costs

Typical Agent task: 100K input + 10K output, 80% input cache-hit. The last column is multiple vs DeepSeek—the number that actually drives model choice.

Cost fault line

Sonnet is not slightly pricier—it is 26× more

Model Input /M Per task 500/day vs DeepSeek
Flash execution layer — safe as default
DeepSeek V4 Flash Baseline ~$0.10 $0.008 ~$4
Hy3 Preview ~$0.10 $0.009 ~$5 1.1×
Gemini 3.5 Flash ~$0.15 $0.02 ~$10 2.5×
Kimi K2 ~$0.15 $0.018 ~$9 2.3×
Review / premium tier — upgrade only, never default
Claude Sonnet 4.6 ~$3.00 $0.21 ~$105 26×
Claude Opus 4.7 ~$15.00 $1.05 ~$525 131×
GPT-4o Out of Top 8 ~$2.50 $0.18 ~$90 23×

500 Agent runs per day: DeepSeek $4 vs Sonnet $105. Quality gap is far smaller than 26×—that is why Flash owns the chart.

5. Which models are developers abandoning?

"Abandoning" means removed from the default slot—not unusable.

Model Status Where developers moved
GPT-4o Top 10 #9 · 1.6T Main loop → DeepSeek / Hy3; itself relegated to multimodal edge cases
xAI Grok-3 -73% WoW Agent loops too expensive; community momentum cooling fast
Claude Opus as default Still 7.4T absolute volume New calls are mostly "review"—no longer 80% of exploration loops
DeepSeek V3 / GPT-4 Turbo Off the chart Replaced in place by V4 Flash / newer MoE

Do not misread the chart

Claude family combined still exceeds 14T—Anthropic did not "lose." It retreated from volume tier to quality tier. What developers abandoned is "one model for everything," not Claude itself.

6. Pick by scenario

I use Cursor

Recommended stack:

  • Default Agent / multi-file edits → DeepSeek V4 Flash (OpenRouter or Cursor custom OpenAI-compatible endpoint)
  • Complex refactors, pre-merge review → Claude Sonnet 4.6
  • Inline completion → keep Cursor's built-in fast model—no need to switch

See Claude Code vs Cursor for entry-point differences: Cursor wins IDE flow; model tiering is on you to configure.

I use Claude Code

Recommended stack:

  • Main loop (read repo, run tests, fix diffs) → DeepSeek V4 Flash via OpenRouter
  • Architecture decisions, security changes, final merge review → Claude Opus 4.7 or Sonnet 4.6
  • CLAUDE.md rules → document when to upgrade, so exploration does not burn Opus every turn

Claude Code ships tied to Anthropic, but in 2026 more teams run external Flash brain + Claude review through OpenRouter as a dual-track setup.

I build Agents

Recommended stack:

  • Long-chain orchestration / multilingual docs → Hy3 Preview
  • General coding Agent main loop → DeepSeek V4 Flash
  • Batch processing, log classification, structured output → Gemini 3.5 Flash
  • Quality fallback → Claude Sonnet; upgrade to Opus after two consecutive failures

With a code knowledge graph, retrieval summaries go through Flash; final review through Claude—the token mass sits in the first bucket.

Scenario cheat sheet

Who you are → default model → when to upgrade

Who you are Default primary Upgrade model Never default
Cursor user DeepSeek V4 Flash Claude Sonnet 4.6 Opus for everyone
Claude Code user DeepSeek V4 Flash Claude Opus 4.7 Sonnet on main loop
Agent builder Hy3 + DeepSeek Gemini Flash Single model end-to-end
CJK / multilingual docs Kimi K2 + Hy3 Claude Sonnet GPT-4o

7. Mac setup: API or local?

Model picked—half the job remains: where inference runs, where the Agent executes.

Hardware match

284B MoE → API · 14B local · long-chain Agent → Cloud Mac

Model Inference Recommended Mac One-liner
DeepSeek V4 Flash OpenRouter API Any Mac Cannot run locally; Mac only runs git / tests
Hy3 Preview API Cloud Mac M4 24GB Long-chain Agents are memory-hungry → execute in cloud, infer via API
Qwen 14B / 7B Local Ollama Mac mini M4 24GB Data stays local; 7B ~35 tok/s
Claude Sonnet / Opus API Mac mini 16GB+ Inference in cloud; local runs Claude Code
CI Agent Flash API Cloud Mac + Runner xcodebuild by day, batch inference by night—same machine, offset schedules

Three rules: giant MoE → API; 7B–14B → Mac mini 24GB; Hy3 / CI Agent → Cloud Mac.

FAQ

Q: How often does OpenRouter Top 10 data update?
A: OpenRouter model pages show live usage charts; figures here are from mid-June 2026. Rank trends matter more than exact numbers—Flash owning the main loop is already structural.

Q: I only have a Claude subscription—can I still use DeepSeek?
A: Yes. Claude Code supports OpenRouter as a fallback endpoint; or run Cursor + OpenRouter for the main loop and Claude for review. The key is do not lock the main loop to Opus.

Q: Kimi or Hy3 for multilingual work?
A: Long CJK/multilingual documents and knowledge-base Q&A → Kimi. Multi-step coding Agents with dense tool calls → Hy3. Prices are close—trial both for a week and pick your default by task type.

Q: Is 16GB Mac mini enough?
A: Claude Code / Cursor + API inference only: yes. Local Ollama 14B + IDE + Agent in parallel: upgrade to 24GB or offload heavy work to Cloud Mac.

Conclusion: default model = traffic model, not strongest model

OpenRouter Top 10 is unambiguous: DeepSeek V4 Flash and Hy3 own the main loop, Claude holds the review layer, GPT-4o and xAI exit the default slot. When choosing, ask "can I retry this ten times without flinching?"—if yes, use Flash; if no, upgrade to Sonnet or Opus.

On hardware: giant MoE via API, small models via local Ollama, long-chain Agent execution on Cloud Mac. Pick the right model and your bill halves; pair the right Mac and your Agent stays stable.

ZavCloud

Hy3 + DeepSeek on API, Agent execution on Cloud Mac

Dedicated M4 24GB instances: run Claude Code, xcodebuild, and GitHub Runner while OpenRouter handles inference off-peak—the standard fix when local 16GB is not enough.

View Cloud Mac plans
Cloud MacRent Mac mini online