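Bottom line: at an 80/20 input/output mix, 100M tokens costs roughly $12–13 on DeepSeek/Hy3-class Flash models (~$24 on Gemini 2.5 Flash), ~$540 on Sonnet, and ~$2,700 on Opus, all in USD. The four tables below cover June 2026 list prices for the Flash and Frontier tiers, then the resulting 100M-token and per-task bills. Each table has a one-line takeaway; audience picks come at the end.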
If you are sizing an Agent budget or comparing Cursor defaults to a self-hosted API stack, start here—not with benchmark scores. Every figure below is US dollars per million tokens ($/M) unless noted. Model names and tiers match what developers actually route on OpenRouter in mid-June 2026; your invoice may differ slightly with caching, routing, or enterprise discounts.
Table 1: Flash execution tier — June 2026 API rates
Sources: OpenRouter and official vendor pricing pages. Unit: USD per million tokens ($/M). These are list prices; your dashboard shows actuals after caching and routing.
Default for Agent main loops — long context and retries without budget panic
| Model | Input /M | Output /M | Cache read /M | Context |
|---|---|---|---|---|
| DeepSeek V4 Flash (#1 by usage) | $0.098 | $0.197 | ~$0.01 | 1M |
| Hy3 Preview | ~$0.10 | ~$0.20 | Yes | 256K+ |
| MiMo-V2-Flash | $0.10 | $0.30 | $0.01 | 256K |
| Owl Alpha | ~$0.12 | ~$0.35 | n/a | 200K |
| Gemini 2.5 Flash | $0.15 | $0.60 | Yes | 1M |
| Kimi K2 | ~$0.15 | ~$0.50 | Yes | 128K |
| GPT-4o mini | $0.15 | $0.60 | Yes | 128K |
Table 1: This tier absorbs ~80% of Agent tokens. The OpenRouter weekly Top 10 is almost entirely Flash; DeepSeek and Hy3 together exceed 20T tokens per week. Pick your default model string here first. When cache reads cost ~$0.01/M, repeated system prompts and RAG chunks get cheap fast, which is why teams are willing to run 200K-token repo reads on Flash instead of Sonnet.
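A minimal Python sketch of that cache math, assuming a fixed cache hit rate and billing cached input at the cache-read column of Table 1. The function names are illustrative, and Sonnet appears uncached only because Table 2 does not list its cache-read rate.

```python
def effective_input_price(list_price: float, cache_read_price: float, hit_rate: float) -> float:
    """Blended USD per million input tokens when `hit_rate` of them are cache reads."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_read_price + (1.0 - hit_rate) * list_price


def input_cost(tokens: int, price_per_million: float) -> float:
    """USD for `tokens` input tokens at `price_per_million`."""
    return tokens / 1_000_000 * price_per_million


REPO_READ = 200_000  # one 200K-token repository read

# DeepSeek V4 Flash rates from Table 1; the 80% hit rate is an assumption.
flash_cached = effective_input_price(0.098, 0.01, hit_rate=0.80)

print(f"Flash, no cache:   ${input_cost(REPO_READ, 0.098):.4f}")
print(f"Flash, 80% cached: ${input_cost(REPO_READ, flash_cached):.4f}")
print(f"Sonnet, no cache:  ${input_cost(REPO_READ, 3.00):.4f}")
```

Under these assumptions a cached Flash read costs about half a cent (about $0.0055), versus $0.60 for the same read on uncached Sonnet.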
Flash models are not a “worse GPT”; many are mixture-of-experts (MoE) designs tuned for high-volume loops. DeepSeek V4 Flash leads on raw usage because it pairs 1M context with sub-$0.10/M input. Hy3 and Kimi matter when your pipeline is CJK-heavy or tool-call dense. Gemini 2.5 Flash and GPT-4o mini are the Western-vendor equivalents at roughly 1.5× DeepSeek's list input price and about 3× its output price.
Table 2: Frontier review tier — June 2026 API rates
Escalate only — pre-merge review and architecture calls, not the default loop
| Model | Input /M | Output /M | Context | OpenRouter trend |
|---|---|---|---|---|
| Claude Sonnet 4.6 | ~$3.00 | ~$15.00 | 200K | Review workhorse |
| Claude Opus 4.7 | ~$15.00 | ~$75.00 | 200K | Sign-off tier |
| GPT-4o | $2.50 | $10.00 | 128K | Dropped from Top 8 |
| Gemini 2.5 Pro | ~$1.25 | ~$10.00 | 1M | Multimodal long-form |
| o3 / o4-mini (reasoning) | $1.10–4.00 | $4.40–16.00 | 200K | Math / proof tasks |
Table 2: Top quality, but too expensive for the Agent main loop. Claude Opus still clears 7T+ tokens per week, in a review role rather than as the default. GPT-4o is being swapped out of primary flows for Flash. Sonnet 4.6 is the sensible step up when a diff needs careful judgment; Opus is for sign-off where a mistake costs more than the API bill.
Frontier tier pricing explains why “just use the best model” stopped being viable once Agents began burning 50K–200K tokens per task. A single Sonnet review is affordable; making Sonnet the default for every file read is not. Gemini 2.5 Pro and o-series reasoning models fill niche lanes—long multimodal docs or formal proofs—not daily coding loops.
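As a rough worked check that ignores output tokens and caching (both simplifying assumptions), one 200K-token file read at the list input prices from Tables 1 and 2 costs:

$$
\begin{aligned}
C &= \frac{T_{\text{in}}}{10^{6}}\,p_{\text{in}} + \frac{T_{\text{out}}}{10^{6}}\,p_{\text{out}} \\
C_{\text{Sonnet}} &\approx 0.2 \times 3.00 = 0.60\ \text{USD per read} \\
C_{\text{DeepSeek}} &\approx 0.2 \times 0.098 \approx 0.02\ \text{USD per read}
\end{aligned}
$$

Sixty cents for one review is noise, but the roughly 30× input-price gap compounds across every file read in a multi-step task.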
Table 3: 100M-token bill comparison
Common yardstick: 100M tokens, priced in three input/output mixes: input-only, 80/20 (chat), and 90/10 (Agent).
Same volume, more than 200× spread between cheapest and priciest
| Model | Input-only 100M | 80/20 mix | 90/10 Agent | vs DeepSeek (80/20) |
|---|---|---|---|---|
| Flash execution tier | ||||
| DeepSeek V4 Flash | ~$10 | ~$12 | ~$11 | 1× |
| Hy3 Preview | ~$10 | ~$13 | ~$11 | 1.1× |
| Gemini 2.5 Flash | ~$15 | ~$24 | ~$19 | 2× |
| Frontier review tier | ||||
| GPT-4o | ~$250 | ~$400 | ~$325 | 33× |
| Claude Sonnet 4.6 | ~$300 | ~$540 | ~$420 | 45× |
| Claude Opus 4.7 | ~$1,500 | ~$2,700 | ~$2,100 | 225× |
Table 3: At 1B tokens/month (80/20 mix), DeepSeek costs ~$120 and Sonnet ~$5,400. Agent workloads skew input-heavy, so weight the 90/10 column. High cache hit rates can shave 50%+ off Flash-tier actuals. Use this table when finance asks “what if we 10× traffic?”: the multiplier hurts far more on Frontier rows than on Flash rows.
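The table can be reproduced with a few lines of Python. This is a sketch under stated assumptions: list prices only (no caching, routing, or enterprise discounts), and the dictionary keys are simply the model names used in this article.

```python
# List prices in USD per million tokens (input, output), copied from Tables 1 and 2.
PRICES = {
    "DeepSeek V4 Flash": (0.098, 0.197),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (15.00, 75.00),
}

# Share of tokens that are input for each column of Table 3.
MIXES = {"input-only": 1.0, "80/20": 0.8, "90/10": 0.9}


def mix_cost(total_tokens: int, input_share: float, price_in: float, price_out: float) -> float:
    """USD for `total_tokens`, split into input and output by `input_share` (0..1)."""
    millions = total_tokens / 1_000_000
    return millions * (input_share * price_in + (1.0 - input_share) * price_out)


for volume in (100_000_000, 1_000_000_000):  # the 100M yardstick, then 10x traffic
    print(f"--- {volume // 1_000_000}M tokens ---")
    for model, (p_in, p_out) in PRICES.items():
        row = ", ".join(
            f"{name} ${mix_cost(volume, share, p_in, p_out):,.0f}"
            for name, share in MIXES.items()
        )
        print(f"{model}: {row}")
```

Because cost is linear in volume, the 10× row is exactly ten times the 100M row; the absolute dollar jump is what differs between tiers.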
“100M tokens” is a useful mental unit: roughly a busy week for a small Agent pilot, or a few hours for a high-volume RAG service. The input-only column models ingestion-heavy pipelines (search, rerank, classify). The 80/20 mix matches chat products. The 90/10 Agent column is the one to stress-test if your tool reads entire repositories before writing a short patch.
Table 4: Typical single Agent task cost
Assumption: 100K input + 10K output tokens per task, with 80% of input tokens served from cache. Daily bill at 500 runs.
500/day: DeepSeek $4 vs Sonnet $105
| Model | Input /M | Per task | 500/day | vs DeepSeek |
|---|---|---|---|---|
| Flash execution tier | ||||
| DeepSeek V4 Flash | ~$0.10 | $0.008 | ~$4 | 1× |
| Hy3 Preview | ~$0.10 | $0.009 | ~$5 | 1.1× |
| Gemini 2.5 Flash | ~$0.15 | $0.02 | ~$10 | 2.5× |
| Kimi K2 | ~$0.15 | $0.018 | ~$9 | 2.3× |
| Frontier review tier | ||||
| Claude Sonnet 4.6 | ~$3.00 | $0.21 | ~$105 | 26× |
| Claude Opus 4.7 | ~$15.00 | $1.05 | ~$525 | 131× |
| GPT-4o | ~$2.50 | $0.18 | ~$90 | 23× |
Table 4: Realistic burn for Claude Code / OpenHands-style tools. The quality gap is far smaller than 26×, so Sonnet should not be the main-loop default. At 500 runs per day, Sonnet is a $3,000+/month line item (~$105 × 30) on this single workload shape alone, while the Flash rows stay around $120–300/month.
The per-task column is what engineering leads should paste into a spreadsheet: multiply by expected daily Agent invocations, then add headroom for retries. If your product triggers an LLM on every CI failure, every support ticket, and every nightly batch job, Table 4 scales linearly; there is no “unlimited tier” on raw API pricing.
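A minimal sketch of that spreadsheet math in Python, assuming cached input is billed at a separate cache-read rate and that every retry costs as much as a first attempt. The 20% retry rate is an illustrative assumption, not a measured figure. With Table 1's DeepSeek rates this comes to about $0.0047 per task, below Table 4's rounded $0.008, so the table's DeepSeek row leans conservative; your invoice is the real arbiter.

```python
def per_task_cost(
    input_tokens: int,
    output_tokens: int,
    price_in: float,
    price_out: float,
    cache_read_price: float,
    cache_hit: float,
) -> float:
    """USD for one task; `cache_hit` is the share of input billed at the cache-read rate."""
    cached = input_tokens * cache_hit
    uncached = input_tokens - cached
    return (cached * cache_read_price + uncached * price_in + output_tokens * price_out) / 1_000_000


def monthly_budget(task_cost: float, runs_per_day: int, retry_rate: float = 0.2, days: int = 30) -> float:
    """Linear projection; assumes each retry costs the same as a first attempt."""
    return task_cost * runs_per_day * days * (1.0 + retry_rate)


# Table 4 shape: 100K input, 10K output, 80% cache hit; DeepSeek V4 Flash rates from Table 1.
task = per_task_cost(100_000, 10_000, 0.098, 0.197, cache_read_price=0.01, cache_hit=0.80)
print(f"per task: ${task:.4f}")
print(f"500 runs/day, 30 days, 20% retries: ${monthly_budget(task, 500):,.2f}")
```

Swapping in another model's input, output, and cache-read rates gives the matching row for your own spreadsheet.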
Audience picks: who you are → which row to use
Primary model + escalation model + monthly budget band
| Audience | Primary (80% tokens) | Escalation (5–10%) | Monthly API budget |
|---|---|---|---|
| Solo dev · IDE completion | Cursor / Copilot subscription | — | $20–40 sub |
| Indie full-stack · light Agent | DeepSeek V4 Flash | Claude Sonnet (review) | $20–80 |
| CJK business · long Agent chains | Hy3 Preview | Kimi K2 / Sonnet | $50–200 |
| Small-team RAG product | DeepSeek Flash + cache | Sonnet pre-merge review | $200–800 |
| 500+ Agent tasks/day | DeepSeek / Hy3 dual route | Opus on critical nodes only | $120–600 (Flash-heavy) |
| Source-sensitive · data residency | Mac mini Ollama 7B–14B | Flash API for non-sensitive only | Hardware cost > API spend |
| Finance / healthcare · costly failures | Flash drafts + retrieval | Opus / GPT-4o + human gate | Compliance-driven |
Rule of thumb: Flash carries volume; Frontier guards gates. Default stack = DeepSeek / Hy3 + Claude Sonnet. For usage trends, see OpenRouter pricing reality.
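One way to encode that rule, sketched in Python under clear assumptions: the model labels are hypothetical placeholders (not real OpenRouter model strings), and `call` stands in for whatever API client you use. Non-gate steps try the two Flash routes in order, mirroring the DeepSeek / Hy3 dual route; only steps flagged as gates reach the Frontier model.

```python
from typing import Callable, Optional, Sequence

# Hypothetical model labels for illustration only; not real OpenRouter model strings.
FLASH_ROUTE: tuple[str, ...] = ("deepseek-v4-flash", "hy3-preview")
GATE_MODEL = "claude-sonnet-4.6"

# (model, prompt) -> completion text; stands in for your actual API client.
Caller = Callable[[str, str], str]


def run_task(
    prompt: str,
    call: Caller,
    is_gate: bool = False,
    flash_route: Sequence[str] = FLASH_ROUTE,
    gate_model: str = GATE_MODEL,
) -> tuple[str, str]:
    """Return (model_used, completion): gate steps go to Frontier, the rest to Flash."""
    if is_gate:
        return gate_model, call(gate_model, prompt)
    last_error: Optional[Exception] = None
    for model in flash_route:
        try:
            return model, call(model, prompt)
        except Exception as err:  # e.g. timeouts or rate limits; try the next Flash route
            last_error = err
    raise RuntimeError("all Flash routes failed") from last_error


def fake_call(model: str, prompt: str) -> str:
    """Stub client that simulates an outage on the first Flash route."""
    if model == "deepseek-v4-flash":
        raise TimeoutError("simulated outage")
    return f"{model} handled: {prompt}"


print(run_task("fix failing test", fake_call))
# → ('hy3-preview', 'hy3-preview handled: fix failing test')
print(run_task("approve merge", fake_call, is_gate=True))
# → ('claude-sonnet-4.6', 'claude-sonnet-4.6 handled: approve merge')
```

Catching every exception keeps the sketch short; a real client should fall through only on errors it knows are transient, such as timeouts and rate limits.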
Quick notes per audience
- Solo dev / IDE: Subscription tools bundle token cost—you optimize time, not $/M. Revisit API pricing only when you outgrow included fast requests.
- Indie Agent: DeepSeek default + Sonnet on merge is the lowest-friction split; budget $20–80 until usage dashboards prove otherwise.
- CJK long chains: Hy3's tool-call stability often matters more than raw $/M; pair it with Kimi when documents are heavily mainland-Chinese.
- RAG product: Cache your system prompt and doc prefixes; Flash + Sonnet review costs roughly an order of magnitude less than Sonnet alone.
- 500+ tasks/day: Dual-route DeepSeek/Hy3 before considering Opus; Opus belongs on human-gated steps only.
- Data residency: Local 7B–14B models remove per-token billing for predictable workloads; use the API for bursts and for 200B+ MoE capability.
- Regulated: Price is secondary to audit trails—still route bulk token volume through Flash, not Opus.
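The order-of-magnitude saving for a Flash + Sonnet split can be checked with a weighted average. This sketch reuses Table 3's 90/10 Agent-mix figures per 100M tokens and ignores caching; the escalation shares are assumptions to replace with numbers from your own routing logs.

```python
def blended_cost(shares: dict[str, float], cost_per_100m: dict[str, float]) -> float:
    """Weighted USD per 100M tokens when `shares` of traffic go to each model."""
    if abs(sum(shares.values()) - 1.0) > 1e-9:
        raise ValueError("shares must sum to 1")
    return sum(share * cost_per_100m[model] for model, share in shares.items())


# 90/10 Agent-mix costs per 100M tokens, taken from Table 3.
AGENT_COST = {"DeepSeek V4 Flash": 11.0, "Claude Sonnet 4.6": 420.0}

all_sonnet = blended_cost({"Claude Sonnet 4.6": 1.0}, AGENT_COST)
for escalation in (0.05, 0.10):  # the 5-10% escalation band from the audience table
    split = blended_cost(
        {"DeepSeek V4 Flash": 1.0 - escalation, "Claude Sonnet 4.6": escalation}, AGENT_COST
    )
    print(f"{escalation:.0%} escalated: ${split:.2f} vs ${all_sonnet:.0f} all-Sonnet "
          f"({all_sonnet / split:.1f}x cheaper)")
```

At 10% escalation the split is about 8× cheaper than all-Sonnet; at 5% it is roughly 13×, so the order-of-magnitude figure holds toward the low end of the escalation band.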
In one line: price picks Flash; risk picks Sonnet/Opus. 100M tokens is the yardstick; the audience table is the answer.
Revisit this page when vendors cut Flash prices again—June 2026 already moved faster than 2025 frontier lists. Export your own usage split monthly; tables age well, but your input/output ratio is the variable that actually moves your bill. When in doubt, default to Flash, measure real usage, then promote models to production.
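A small sketch of that monthly check, assuming your provider lets you export per-request rows as CSV. The column names and the three sample rows below are made up for illustration; map them to whatever your dashboard actually exports.

```python
import csv
import io

# Made-up sample export: one row per request with token counts and billed USD.
SAMPLE_EXPORT = """model,input_tokens,output_tokens,cost_usd
deepseek-v4-flash,120000,8000,0.0133
deepseek-v4-flash,90000,6000,0.0100
claude-sonnet-4.6,40000,4000,0.1800
"""


def usage_split(export_text: str) -> dict[str, float]:
    """Return the input share of all tokens and the effective USD per million tokens."""
    input_tokens = output_tokens = 0
    cost = 0.0
    for row in csv.DictReader(io.StringIO(export_text)):
        input_tokens += int(row["input_tokens"])
        output_tokens += int(row["output_tokens"])
        cost += float(row["cost_usd"])
    total = input_tokens + output_tokens
    if total == 0:
        raise ValueError("export contains no tokens")
    return {
        "input_share": input_tokens / total,
        "usd_per_million": cost / total * 1_000_000,
    }


print(usage_split(SAMPLE_EXPORT))
```

An input share near 0.9 means the 90/10 column of Table 3 is the row to watch; a falling share means output-heavy work is creeping in, and output prices dominate on every tier.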
ZavCloud
Know what local inference can cover before you size API spend
Run Ollama on Cloud Mac—find the daily token ceiling for 7B/14B, then set your Flash API budget.