How much do 100M tokens cost on the cheapest models?

As of June 2026, DeepSeek V4 Flash and similar Flash-tier models run about $12 for a typical 80/20 input/output mix on 100M tokens; input-only is about $10 USD.

What should be the default model tier in 2026?

Run Agent main loops on Flash (DeepSeek V4 Flash / Hy3). Escalate to Claude Sonnet for pre-merge review. Reserve Opus for nodes where a single failure is extremely costly.

What Are Tokens? How Much Do 100M Tokens Cost? Complete 2026 AI Model Pricing Guide

Bottom line: 100M tokens cost roughly $10–14 on the cheapest Flash models depending on mix, and at an 80/20 input/output mix ~$540 on Claude Sonnet and ~$2,700 on Claude Opus, all in USD. Four tables below list June 2026 list prices, each followed by a one-line takeaway; audience picks close the page.

If you are sizing an Agent budget or comparing Cursor defaults to a self-hosted API stack, start here—not with benchmark scores. Every figure below is US dollars per million tokens ($/M) unless noted. Model names and tiers match what developers actually route on OpenRouter in mid-June 2026; your invoice may differ slightly with caching, routing, or enterprise discounts.

At a glance: ~$0.10/M typical Flash input price · ~$12 floor for 100M tokens at an 80/20 mix · 26× Sonnet vs DeepSeek per Agent task (Table 4)

Table 1: Flash execution tier — June 2026 API rates

Sources: OpenRouter and official pricing pages. Unit: USD per million tokens ($/M). List prices below; your dashboard shows actuals after cache and routing.

Flash tier

Default for Agent main loops — long context and retries without budget panic

Model	Input /M	Output /M	Cache read /M	Context
DeepSeek V4 Flash (#1 by usage)	$0.098	$0.197	~$0.01	1M
Hy3 Preview	~$0.10	~$0.20	Yes	256K+
MiMo-V2-Flash	$0.10	$0.30	$0.01	256K
Gemini 2.5 Flash	$0.15	$0.60	Yes	1M
Kimi K2	~$0.15	~$0.50	Yes	128K
GPT-4o mini	$0.15	$0.60	Yes	128K
Owl Alpha	~$0.12	~$0.35	—	200K

Table 1: This tier absorbs ~80% of Agent tokens. The OpenRouter weekly Top 10 is almost entirely Flash; DeepSeek + Hy3 together exceed 20T/week. Pick your default model string here first. When you see “cache read” at ~$0.01/M, repeated system prompts and RAG chunks get cheap fast—that is why teams dare to run 200K-token repo reads on Flash instead of Sonnet.
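The cache effect is easy to quantify: the effective input rate is a weighted average of the cache-read price and the full input price. A minimal sketch, assuming hits are billed at the cache-read rate, misses at the full rate, and no cache-write surcharge applies; `blended_input_price` is an illustrative name, and the prices are DeepSeek V4 Flash's from Table 1.

```python
def blended_input_price(input_per_m: float, cache_read_per_m: float, hit_rate: float) -> float:
    """Effective USD per million input tokens when `hit_rate` of them hit the cache.

    Assumption: hits are billed at the cache-read rate, misses at the full
    input rate, and no cache-write surcharge applies.
    """
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return hit_rate * cache_read_per_m + (1.0 - hit_rate) * input_per_m


# DeepSeek V4 Flash list prices from Table 1: $0.098/M input, ~$0.01/M cache read.
for hit in (0.0, 0.5, 0.8, 0.95):
    price = blended_input_price(0.098, 0.01, hit)
    # A 200K-token repo read is 0.2 million input tokens.
    print(f"hit rate {hit:.0%}: ${price:.4f}/M, 200K-token read ~${price * 0.2:.4f}")
```

At an 80% hit rate the effective input price falls from $0.098/M to about $0.028/M, so a 200K-token repo read costs roughly half a cent.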

Flash models are not “worse GPT”; they are MoE architectures tuned for high-volume loops. DeepSeek V4 Flash leads on raw usage because it pairs 1M context with sub-$0.10/M input. Hy3 and Kimi matter when your pipeline is CJK-heavy or tool-call dense. Gemini 2.5 Flash and GPT-4o mini are the Western-vendor equivalents at roughly 1.5× DeepSeek’s input price and 3× its output price.

Table 2: Frontier review tier — June 2026 API rates

Frontier tier

Escalate only — pre-merge review and architecture calls, not the default loop

Model	Input /M	Output /M	Context	OpenRouter trend
Claude Sonnet 4.6	~$3.00	~$15.00	200K	Review workhorse
Claude Opus 4.7	~$15.00	~$75.00	200K	Sign-off tier
GPT-4o	$2.50	$10.00	128K	Dropped from Top 8
Gemini 2.5 Pro	~$1.25	~$10.00	1M	Multimodal long-form
o3 / o4-mini (reasoning)	$1.10–4.00	$4.40–16.00	200K	Math / proof tasks

Table 2: Top quality, but too expensive for the Agent main loop. Claude Opus still clears 7T+ weekly tokens—in a review role, not as default. GPT-4o is being swapped out of primary flows for Flash. Sonnet 4.6 is the sensible “step up” when a diff needs careful judgment; Opus is for sign-off where a mistake costs more than the API bill.

Frontier tier pricing explains why “just use the best model” stopped being viable once Agents began burning 50K–200K tokens per task. A single Sonnet review is affordable; making Sonnet the default for every file read is not. Gemini 2.5 Pro and o-series reasoning models fill niche lanes—long multimodal docs or formal proofs—not daily coding loops.

Table 3: 100M-token bill comparison

Common yardstick: 100M tokens at list prices, no caching. Three input/output mixes: input-only, 80/20 (chat), and 90/10 (Agent).

100M tokens

Same volume, over 200× spread between cheapest and priciest

Model	Input-only 100M	80/20 mix	90/10 Agent	vs DeepSeek (80/20)
Flash execution tier
DeepSeek V4 Flash	~$10	~$12	~$11	1×
Hy3 Preview	~$10	~$13	~$11	1.1×
Gemini 2.5 Flash	~$15	~$24	~$19	2×
Frontier review tier
GPT-4o	~$250	~$400	~$325	33×
Claude Sonnet 4.6	~$300	~$540	~$420	45×
Claude Opus 4.7	~$1,500	~$2,700	~$2,100	225×

Table 3: 1B tokens/month → DeepSeek ~$120, Sonnet ~$5,400. Agent workloads skew input-heavy—weight the 90/10 column. High cache hit rates can shave 50%+ off Flash-tier actuals. Use this table when finance asks “what if we 10× traffic?”—the multiplier hurts far more on Frontier rows than Flash rows.

“100M tokens” is a useful mental unit: roughly a busy week for a small Agent pilot, or a few hours for a high-volume RAG service. The input-only column approximates ingestion-heavy pipelines (search, rerank, classify). The 80/20 mix matches chat products. The 90/10 Agent column is the one to stress-test if your tool reads entire repositories before writing a short patch.
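Every cell in Table 3 follows from one formula: input tokens × input price plus output tokens × output price. A minimal Python sketch, assuming pure list prices with no caching, routing, or discounts; `PRICES`, `MIXES`, and `bill_usd` are illustrative names, and the price pairs are copied from Tables 1 and 2.

```python
# List prices in USD per million tokens (input, output), copied from Tables 1 and 2.
PRICES = {
    "DeepSeek V4 Flash": (0.098, 0.197),
    "Gemini 2.5 Flash": (0.15, 0.60),
    "GPT-4o": (2.50, 10.00),
    "Claude Sonnet 4.6": (3.00, 15.00),
    "Claude Opus 4.7": (15.00, 75.00),
}

# Share of tokens that are input for each Table 3 column.
MIXES = {"input-only": 1.0, "80/20 chat": 0.8, "90/10 Agent": 0.9}


def bill_usd(input_per_m: float, output_per_m: float,
             total_m_tokens: float, input_share: float) -> float:
    """List-price bill for `total_m_tokens` million tokens; no caching or discounts."""
    input_m = total_m_tokens * input_share
    output_m = total_m_tokens - input_m
    return input_m * input_per_m + output_m * output_per_m


for model, (inp, out) in PRICES.items():
    cells = ", ".join(
        f"{mix} ${bill_usd(inp, out, 100, share):,.2f}" for mix, share in MIXES.items()
    )
    print(f"{model}: {cells}")
```

Passing 1000 (million tokens) instead of 100 gives the 1B-tokens-per-month figures quoted under Table 3: about $118 for DeepSeek and $5,400 for Sonnet at 80/20.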

Table 4: Typical single Agent task cost

Assumption: 100K input + 10K output tokens per run, with 80% of input tokens served from cache. Daily bill assumes 500 runs.

One Agent run

500/day: DeepSeek $4 vs Sonnet $105

Model	Input /M	Per task	500/day	vs DeepSeek
Flash execution tier
DeepSeek V4 Flash	~$0.10	$0.008	~$4	1×
Hy3 Preview	~$0.10	$0.009	~$5	1.1×
Gemini 2.5 Flash	~$0.15	$0.02	~$10	2.5×
Kimi K2	~$0.15	$0.018	~$9	2.3×
Frontier review tier
Claude Sonnet 4.6	~$3.00	$0.21	~$105	26×
Claude Opus 4.7	~$15.00	$1.05	~$525	131×
GPT-4o	~$2.50	$0.18	~$90	23×

Table 4: Realistic burn for Claude Code / OpenHands-style tools. The quality gap is far smaller than the 26× price gap, so Sonnet should not be the main-loop default. At 500 runs per day, Sonnet is a $3,000+/month line item on this single workload shape alone, while the Flash rows stay around $120–300/month.
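The per-task figure depends heavily on how cached input is billed, so this sketch makes the cache-read rate an explicit parameter. Assumptions: the DeepSeek cache-read rate (~$0.01/M) comes from Table 1; the Sonnet cache-read rate of $0.30/M (10% of its input price) is an assumption, not a figure from this page; cache-write fees and retries are ignored; `Pricing` and `task_cost_usd` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_m: float       # USD per million uncached input tokens
    output_per_m: float      # USD per million output tokens
    cache_read_per_m: float  # USD per million cached input tokens (assumed rate)


def task_cost_usd(p: Pricing, input_tokens: int, output_tokens: int, cache_hit: float) -> float:
    """Cost of one Agent run; cache-write fees and retries are not modeled."""
    cached = input_tokens * cache_hit
    uncached = input_tokens - cached
    return (
        uncached * p.input_per_m
        + cached * p.cache_read_per_m
        + output_tokens * p.output_per_m
    ) / 1_000_000


# Table 4 workload shape: 100K input, 10K output, 80% cache hit.
deepseek = Pricing(0.098, 0.197, 0.01)
sonnet = Pricing(3.00, 15.00, 0.30)  # assumption: cache reads at 10% of input price
for name, p in (("DeepSeek V4 Flash", deepseek), ("Claude Sonnet 4.6", sonnet)):
    per_task = task_cost_usd(p, 100_000, 10_000, 0.8)
    print(f"{name}: ${per_task:.4f}/task, ${per_task * 500:.2f}/day at 500 runs")
```

Treating cached reads as free (a cache-read rate of 0) returns $0.21 for Sonnet and $1.05 for Opus, matching those Table 4 rows; with the assumed rates above the sketch returns about $0.0047 for DeepSeek and $0.234 for Sonnet. Plug in your provider's actual cache pricing before relying on any multiplier.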

The per-task row is what engineering leads should paste into a spreadsheet: multiply by expected daily Agent invocations, then add headroom for retries. If your product triggers an LLM on every CI failure, every support ticket, and every nightly batch job, Table 4 scales linearly—there is no “unlimited tier” on raw API pricing.
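The spreadsheet step above is a linear projection. A minimal sketch, assuming a 30-day month; `retry_headroom` is a hypothetical planning buffer (20% by default), not a measured retry rate, and the per-task inputs are Table 4's figures.

```python
def monthly_budget_usd(per_task_usd: float, runs_per_day: int,
                       retry_headroom: float = 0.2, days: int = 30) -> float:
    """Linear projection: per-task cost x daily runs x days, plus retry headroom.

    `retry_headroom` is a hypothetical planning buffer (0.2 = 20% extra runs),
    not a figure from any vendor.
    """
    if per_task_usd < 0 or runs_per_day < 0 or retry_headroom < 0:
        raise ValueError("inputs must be non-negative")
    return per_task_usd * runs_per_day * days * (1.0 + retry_headroom)


# Per-task figures from Table 4.
for name, per_task in (("DeepSeek V4 Flash", 0.008), ("Claude Sonnet 4.6", 0.21)):
    base = monthly_budget_usd(per_task, 500, retry_headroom=0.0)
    padded = monthly_budget_usd(per_task, 500)
    print(f"{name}: ${base:,.0f}/month, ${padded:,.0f} with 20% retry headroom")
```

Replace the 20% buffer with the retry rate your own logs show; because the projection is linear, doubling daily invocations doubles every row.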

Audience picks: who you are → which row to use

Routing picks

Primary model + escalation model + monthly budget band

Audience	Primary (80% tokens)	Escalation (5–10%)	Monthly API budget
Solo dev · IDE completion	Cursor / Copilot subscription	—	$20–40 sub
Indie full-stack · light Agent	DeepSeek V4 Flash	Claude Sonnet (review)	$20–80
CJK business · long Agent chains	Hy3 Preview	Kimi K2 / Sonnet	$50–200
Small-team RAG product	DeepSeek Flash + cache	Sonnet pre-merge review	$200–800
500+ Agent tasks/day	DeepSeek / Hy3 dual route	Opus on critical nodes only	$120–600 (Flash-heavy)
Source-sensitive · data residency	Mac mini Ollama 7B–14B	Flash API for non-sensitive only	Hardware > API
Finance / healthcare · costly failures	Flash drafts + retrieval	Opus / GPT-4o + human gate	Compliance-driven

Rule of thumb: Flash carries volume; Frontier guards gates. Default stack = DeepSeek / Hy3 + Claude Sonnet.
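The rule of thumb can be written as a routing table plus a blended-cost check. In this sketch the `Step` categories and the model strings are hypothetical placeholders (use whatever identifiers your provider or OpenRouter expects), and the 100M-token bills are the 80/20 list-price figures for DeepSeek V4 Flash ($0.098/$0.197) and Claude Sonnet 4.6 ($3/$15).

```python
from enum import Enum


class Step(Enum):
    MAIN_LOOP = "main_loop"                # file reads, tool calls, drafts
    PRE_MERGE_REVIEW = "pre_merge_review"  # escalation for careful judgment
    CRITICAL_SIGNOFF = "critical_signoff"  # nodes where one failure is costly


# Hypothetical model identifiers; substitute the strings your provider uses.
ROUTES = {
    Step.MAIN_LOOP: "deepseek-v4-flash",
    Step.PRE_MERGE_REVIEW: "claude-sonnet-4.6",
    Step.CRITICAL_SIGNOFF: "claude-opus-4.7",
}


def pick_model(step: Step) -> str:
    return ROUTES[step]


def blended_bill(flash_bill: float, frontier_bill: float, escalation_share: float) -> float:
    """Bill when `escalation_share` of tokens go to the frontier model."""
    return (1 - escalation_share) * flash_bill + escalation_share * frontier_bill


# 100M tokens at an 80/20 mix: DeepSeek ~$11.78, Sonnet ~$540.
for share in (0.05, 0.10, 1.0):
    print(f"{share:.0%} on Sonnet: ${blended_bill(11.78, 540.0, share):,.2f}")
```

At 5–10% escalation the blended bill lands around $38–65 per 100M tokens versus $540 for all-Sonnet, roughly the order-of-magnitude gap cited in the RAG note below.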

Quick notes per audience

Solo dev / IDE: Subscription tools bundle token cost—you optimize time, not $/M. Revisit API pricing only when you outgrow included fast requests.
Indie Agent: DeepSeek default + Sonnet on merge is the lowest-friction split; budget $20–80 until usage dashboards prove otherwise.
CJK long chains: Hy3’s tool stability often beats raw $/M; pair with Kimi when documents are mainland-Chinese heavy.
RAG product: Cache your system prompt and doc prefixes; Flash + Sonnet review costs roughly an order of magnitude less than running everything on Sonnet.
500+ tasks/day: Dual-route DeepSeek/Hy3 before considering Opus; Opus belongs on human-gated steps only.
Data residency: Local 7B–14B removes per-token billing for predictable workloads; API for bursts and 200B+ MoE capability.
Regulated: Price is secondary to audit trails—still route bulk token volume through Flash, not Opus.

In one line: price picks Flash; risk picks Sonnet/Opus. 100M tokens is the yardstick; the audience table is the answer.

Revisit this page when vendors cut Flash prices again—June 2026 already moved faster than 2025 frontier lists. Export your own usage split monthly; tables age well, but your input/output ratio is the variable that actually moves your bill. When in doubt, default to Flash, measure real usage, then promote models to production.
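Measuring your own input/output split is a small aggregation over a usage export. The `UsageRecord` shape and the sample rows are hypothetical (real exports from OpenRouter or a vendor dashboard use their own column names); the Sonnet 4.6 prices are the list rates from Table 2.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class UsageRecord:
    # Hypothetical shape of one row in a provider usage export.
    model: str
    input_tokens: int
    output_tokens: int


def input_share(records: Sequence[UsageRecord]) -> float:
    """Fraction of all billed tokens that were input tokens."""
    total_in = sum(r.input_tokens for r in records)
    total_out = sum(r.output_tokens for r in records)
    total = total_in + total_out
    if total == 0:
        raise ValueError("no tokens in export")
    return total_in / total


def bill_at_share(input_per_m: float, output_per_m: float, total_m: float, share: float) -> float:
    """List-price bill for `total_m` million tokens at a given input share."""
    return total_m * (share * input_per_m + (1 - share) * output_per_m)


# Illustrative rows only, not real usage data.
records = [
    UsageRecord("deepseek-v4-flash", 95_000, 5_000),
    UsageRecord("deepseek-v4-flash", 120_000, 8_000),
    UsageRecord("deepseek-v4-flash", 40_000, 12_000),
]
share = input_share(records)
print(f"input share: {share:.1%}")
# Same 100M tokens at Sonnet 4.6 list prices: measured ratio vs an assumed 80/20 mix.
print(f"measured mix: ${bill_at_share(3.00, 15.00, 100, share):,.0f}")
print(f"80/20 mix:    ${bill_at_share(3.00, 15.00, 100, 0.8):,.0f}")
```

With these made-up rows the input share is about 91%, which prices 100M Sonnet tokens near $407 instead of the $540 an 80/20 assumption gives.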

ZavCloud

Know what local inference can cover before you size API spend

Run Ollama on Cloud Mac—find the daily token ceiling for 7B/14B, then set your Flash API budget.
