What is the most-used model on OpenRouter?

As of mid-June 2026, DeepSeek V4 Flash leads with roughly 10.9T weekly tokens, followed by Tencent Hy3 Preview at 10.7T. Both are low-cost MoE models that together absorb about three-quarters of incremental platform traffic.

What should Cursor users default to?

Daily Agent and multi-file editing: DeepSeek V4 Flash. Pre-merge review or complex refactors: Claude Sonnet 4.6. Inline completion can stay on Cursor's built-in fast model.

Which models belong on a local Mac vs API?

Qwen 14B and other 7B–14B models fit Mac mini M4 24GB with Ollama. DeepSeek V4 Flash, Hy3, and other 200B+ MoE models need OpenRouter API. Long-chain Hy3 Agents pair best with Cloud Mac execution and API inference.

OpenRouter Real Usage Rankings: Which Models Are Developers Abandoning?

Q: Which models are developers abandoning?

GPT-4o, xAI Grok, and other expensive frontier models are losing main-loop token share—not because capability vanished, but because Agent-era developers moved the primary loop to the Flash tier and reserve Claude Opus/Sonnet for review and critical decisions.

Bottom line first: in 2026, check OpenRouter real usage before you check benchmarks. Platform weekly tokens have crossed 28.9T, and the front of the chart is almost entirely low-cost MoE—DeepSeek V4 Flash and Hy3 Preview each exceed 10T. GPT-4o, xAI Grok, and other "default strongest" picks are leaving the main loop—not because they died, but because developers moved them to the review layer. Below: the Top 10 table, a five-model comparison, scenario picks, and the Mac setup that matches each tier.

1. 2026 OpenRouter Top 10 (weekly token volume)

Source: OpenRouter public model pages (mid-June 2026). Read rank through role—who runs the main loop, who only signs off.

Core ranking

Who is eating 80% of Agent traffic?

Default execution layer Review / upgrade tier Being replaced

#	Model	Weekly tokens	Role	Trend
1	DeepSeek V4 Flash	10.9T	2026 default pick	↑ #1
2	Hy3 Preview	10.7T	Agent long chains	↑ tied for #1
3	Claude Opus 4.7	7.4T	Pre-merge sign-off	→ review essential
4	Claude Sonnet 4.6	7.4T	IDE upgrade default	→ medium complexity
5	Owl Alpha	5.0T	Agent newcomer	↑ climbing fast
6	MiMo-V2-Flash	4.2T	Open-source Flash	↑
7	Kimi K2	3.8T	CJK / multilingual docs	↑
8	Gemini 3.5 Flash	3.2T	Batch / multimodal	→
9	GPT-4o	1.6T	Left main flow	↓ replaced by Flash
10	xAI Grok-3	1.1T	Developers leaving	↓↓ -73%

At a glance: Top 2 combined ≈ 21.6T, or 75% of the platform's 28.9T—the default model string is already Flash, not Opus.

28.9T

OpenRouter weekly tokens

75%

Top 2 combined share

26×

Flash vs Sonnet per-task cost

The leaderboard does not tell you who is smartest—it tells you who is becoming the 2026 default model string.

How this differs from benchmarks

MMLU and SWE-bench measure ceilings; OpenRouter usage measures what teams dare to call every day. One Agent loop burns 50K–200K tokens—cheap + good enough wins traffic. See The OpenRouter Pricing Truth for the cost mechanics behind these numbers.

2. Top 5 primer: what each model is for

① DeepSeek V4 Flash — 2026 default execution layer

284B MoE, ~13B activated per pass; 1M context, input ~$0.10/M, cache hits as low as $0.04/M. Best for: reading repos, drafting patches, Agent main loops, RAG reranking. Will not run on a local Mac—OpenRouter API is the realistic path.

② Hy3 Preview — the long-chain Agent newcomer

Tencent's model, at 10.7T within weeks of launch on OpenRouter. Strong CJK understanding, multi-step tool calls, and stable long-context behavior. Best for: complex Agent orchestration, multilingual business docs, batch pipelines that complement Gemini. Also API-only; run the execution environment on Cloud Mac and keep inference on OpenRouter.

③ Claude Opus 4.7 — the sign-off layer

7.4T proves it is not dead—but the role changed. It no longer runs 80% of Agent loops. It handles pre-merge review, architecture decisions, and security audits. High unit cost, reserved for the ~5% of tasks where one failure is catastrophic.

④ Claude Sonnet 4.6 — IDE medium-complexity brain

Same token volume as Opus, different job: cross-module refactors, API contract changes, the "upgrade default" in Cursor and Claude Code. Roughly 30× more expensive than Flash, cheaper than Opus—the quality/cost middle tier.

⑤ Owl Alpha — Agent-focused explorer

A 5.0T newcomer with aggressive community feedback on multi-step coding and tool use. Good for Agent builders willing to experiment; in production, pair with DeepSeek Flash as a fallback safety net.

3. Five-model capability matrix

Not an IQ ranking—a "worth making default?" scorecard. Green = strongest on that axis; red = clear weakness.

Capability matrix

Coding and Agents are close—cost and multilingual work separate them

Capability	DeepSeek	Claude	Gemini	Kimi	Hy3	Pick
Coding	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	DeepSeek / Hy3
Agent	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	Hy3 long chains
Long context	⭐⭐⭐⭐⭐ 1M	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	DeepSeek
CJK / multilingual	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐	⭐⭐⭐⭐⭐	⭐⭐⭐⭐⭐	Kimi / Hy3
Cost	⭐⭐⭐⭐⭐	⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐	⭐⭐⭐⭐⭐	DeepSeek

Default stack: main loop DeepSeek / Hy3 + review Claude Sonnet. Claude is not unusable—it is not the default.

4. Price comparison: what one Agent task costs

Typical Agent task: 100K input + 10K output, 80% input cache-hit. The last column is multiple vs DeepSeek—the number that actually drives model choice.

Cost fault line

Sonnet is not slightly pricier—it is 26× more

Model	Input /M	Per task	500/day	vs DeepSeek
Flash execution layer — safe as default
DeepSeek V4 Flash Baseline	~$0.10	$0.008	~$4	1×
Hy3 Preview	~$0.10	$0.009	~$5	1.1×
Gemini 3.5 Flash	~$0.15	$0.02	~$10	2.5×
Kimi K2	~$0.15	$0.018	~$9	2.3×
Review / premium tier — upgrade only, never default
Claude Sonnet 4.6	~$3.00	$0.21	~$105	26×
Claude Opus 4.7	~$15.00	$1.05	~$525	131×
GPT-4o Out of Top 8	~$2.50	$0.18	~$90	23×

500 Agent runs per day: DeepSeek $4 vs Sonnet $105. Quality gap is far smaller than 26×—that is why Flash owns the chart.

5. Which models are developers abandoning?

"Abandoning" means removed from the default slot—not unusable.

Model	Status	Where developers moved
GPT-4o	Top 10 #9 · 1.6T	Main loop → DeepSeek / Hy3; itself relegated to multimodal edge cases
xAI Grok-3	-73% WoW	Agent loops too expensive; community momentum cooling fast
Claude Opus as default	Still 7.4T absolute volume	New calls are mostly "review"—no longer 80% of exploration loops
DeepSeek V3 / GPT-4 Turbo	Off the chart	Replaced in place by V4 Flash / newer MoE

Do not misread the chart

Claude family combined still exceeds 14T—Anthropic did not "lose." It retreated from volume tier to quality tier. What developers abandoned is "one model for everything," not Claude itself.

6. Pick by scenario

I use Cursor

Recommended stack:

Default Agent / multi-file edits → DeepSeek V4 Flash (OpenRouter or Cursor custom OpenAI-compatible endpoint)
Complex refactors, pre-merge review → Claude Sonnet 4.6
Inline completion → keep Cursor's built-in fast model—no need to switch

See Claude Code vs Cursor for entry-point differences: Cursor wins IDE flow; model tiering is on you to configure.

I use Claude Code

Recommended stack:

Main loop (read repo, run tests, fix diffs) → DeepSeek V4 Flash via OpenRouter
Architecture decisions, security changes, final merge review → Claude Opus 4.7 or Sonnet 4.6
CLAUDE.md rules → document when to upgrade, so exploration does not burn Opus every turn

Claude Code ships tied to Anthropic, but in 2026 more teams run external Flash brain + Claude review through OpenRouter as a dual-track setup.

I build Agents

Recommended stack:

Long-chain orchestration / multilingual docs → Hy3 Preview
General coding Agent main loop → DeepSeek V4 Flash
Batch processing, log classification, structured output → Gemini 3.5 Flash
Quality fallback → Claude Sonnet; upgrade to Opus after two consecutive failures

With a code knowledge graph, retrieval summaries go through Flash; final review through Claude—the token mass sits in the first bucket.

Scenario cheat sheet

Who you are → default model → when to upgrade

Who you are	Default primary	Upgrade model	Never default
Cursor user	DeepSeek V4 Flash	Claude Sonnet 4.6	Opus for everyone
Claude Code user	DeepSeek V4 Flash	Claude Opus 4.7	Sonnet on main loop
Agent builder	Hy3 + DeepSeek	Gemini Flash	Single model end-to-end
CJK / multilingual docs	Kimi K2 + Hy3	Claude Sonnet	GPT-4o

7. Mac setup: API or local?

Model picked—half the job remains: where inference runs, where the Agent executes.

Hardware match

284B MoE → API · 14B local · long-chain Agent → Cloud Mac

Model	Inference	Recommended Mac	One-liner
DeepSeek V4 Flash	OpenRouter API	Any Mac	Cannot run locally; Mac only runs git / tests
Hy3 Preview	API	Cloud Mac M4 24GB	Long-chain Agents are memory-hungry → execute in cloud, infer via API
Qwen 14B / 7B	Local Ollama	Mac mini M4 24GB	Data stays local; 7B ~35 tok/s
Claude Sonnet / Opus	API	Mac mini 16GB+	Inference in cloud; local runs Claude Code
CI Agent	Flash API	Cloud Mac + Runner	`xcodebuild` by day, batch inference by night—same machine, offset schedules

Three rules: giant MoE → API; 7B–14B → Mac mini 24GB; Hy3 / CI Agent → Cloud Mac.

FAQ

Q: How often does OpenRouter Top 10 data update?
A: OpenRouter model pages show live usage charts; figures here are from mid-June 2026. Rank trends matter more than exact numbers—Flash owning the main loop is already structural.

Q: I only have a Claude subscription—can I still use DeepSeek?
A: Yes. Claude Code supports OpenRouter as a fallback endpoint; or run Cursor + OpenRouter for the main loop and Claude for review. The key is do not lock the main loop to Opus.

Q: Kimi or Hy3 for multilingual work?
A: Long CJK/multilingual documents and knowledge-base Q&A → Kimi. Multi-step coding Agents with dense tool calls → Hy3. Prices are close—trial both for a week and pick your default by task type.

Q: Is 16GB Mac mini enough?
A: Claude Code / Cursor + API inference only: yes. Local Ollama 14B + IDE + Agent in parallel: upgrade to 24GB or offload heavy work to Cloud Mac.

Conclusion: default model = traffic model, not strongest model

OpenRouter Top 10 is unambiguous: DeepSeek V4 Flash and Hy3 own the main loop, Claude holds the review layer, GPT-4o and xAI exit the default slot. When choosing, ask "can I retry this ten times without flinching?"—if yes, use Flash; if no, upgrade to Sonnet or Opus.

On hardware: giant MoE via API, small models via local Ollama, long-chain Agent execution on Cloud Mac. Pick the right model and your bill halves; pair the right Mac and your Agent stays stable.

ZavCloud

Hy3 + DeepSeek on API, Agent execution on Cloud Mac

Dedicated M4 24GB instances: run Claude Code, xcodebuild, and GitHub Runner while OpenRouter handles inference off-peak—the standard fix when local 16GB is not enough.

View Cloud Mac plans