Searchers asking "how fast is Ollama on Mac" or "can M4 run 7B" need reproducible numbers: not "it runs," but tokens per second (tok/s) and how much swap slows generation. This post runs the same script on a Mac mini M4 with 16GB and with 24GB, using qwen3:8b (the 7B tier) and qwen3:14b. The sizing framework and the swap causal chain live in the main guide.
Unlike the week-long 16GB vs 24GB diary, this post covers only the Ollama 7B/14B figures and the commands to reproduce them, with no buyer's remorse or Claude Code narrative.
Test setup and load
- Hardware: Mac mini M4, 16GB and 24GB units
- Software: macOS 15.x, latest stable Ollama
- Background: ~20 Chrome tabs, VS Code, Slack (daily dev, not bare metal)
- Models: qwen3:8b, qwen3:14b (Ollama default quantization)
7B (qwen3:8b) results
| Metric | 16GB | 24GB |
|---|---|---|
| Memory Used (steady) | ~13.2GB | ~16.4GB |
| Swap Used | 1.1GB | 0 |
| Memory Pressure | Yellow | Green |
| tok/s (512-token prompt, after 2 min) | ~34 | ~37 |
Model compute is similar on both units. The 24GB machine is about 9% faster (37 vs 34 tok/s), and that gap is mostly swap, not the M4 lacking FLOPS.
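The roughly 9% figure follows from the two tok/s rows in the table, taking the 16GB result as the baseline. A minimal sketch of that arithmetic; `rel_gap` is a hypothetical helper name used only for illustration:

```shell
# rel_gap BASELINE OTHER: percent by which OTHER exceeds BASELINE (hypothetical helper).
rel_gap() {
  awk -v base="$1" -v other="$2" 'BEGIN { printf "%.1f\n", (other - base) / base * 100 }'
}

rel_gap 34 37   # 16GB vs 24GB tok/s from the table; prints 8.8
```

Plugging in your own measurements gives a comparable number, provided both runs use the same prompt and background load.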
14B and the memory wall
qwen3:14b settles at ~19.1GB used on the 24GB unit with zero swap. The 16GB unit swaps 2.3GB or more, and tok/s drops sharply. For daily 14B use, pick 24GB; this matches the main guide's "resident models × memory" axis.
Real swap impact on tok/s
A resident Ollama model plus a GitHub Runner xcodebuild peak (roughly +4–8GB) pushes the 16GB unit into swap and slows CI. Fixes: the scheduling runbook, a 24GB unit, or a Cloud Mac split.
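To a first approximation, whether a workload mix stays out of swap is a subtraction: physical RAM minus steady-state use minus the extra peak. The sketch below uses the figures from this post; `headroom_gb` is a hypothetical helper, and macOS memory compression means a negative result should be read as "expect swap" rather than as an exact swap size.

```shell
# headroom_gb TOTAL_GB STEADY_GB PEAK_GB: RAM left after steady-state use plus a peak.
# Hypothetical helper; ignores memory compression and other OS behavior.
headroom_gb() {
  awk -v t="$1" -v s="$2" -v p="$3" 'BEGIN { printf "%.1f\n", t - s - p }'
}

headroom_gb 16 13.2 4   # 16GB unit, qwen3:8b steady state, low end of the xcodebuild peak
headroom_gb 24 16.4 4   # 24GB unit, same load
```

Rerun it with your own measured peak, since the high end of the 4–8GB range changes the result.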
Reproduce
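# Pull the model and preload it
ollama pull qwen3:8b
ollama run qwen3:8b "" # empty prompt loads the model without generating
# In another terminal: memory pressure, pageouts, and swap used
memory_pressure
vm_stat | grep Pageouts
sysctl vm.swapusage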
# tok/s (same script as 16GB vs 24GB post)
ollama run qwen3:8b "Write 512 tokens about Apple Silicon unified memory." \
--verbose 2>&1 | tee /tmp/ollama-bench.log
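To pull the tok/s figure out of the saved log instead of reading it by eye, something like the following may help. It assumes the `--verbose` stats include a line of the form `eval rate: <number> tokens/s` next to a separate `prompt eval rate:` line; check your own log first, since the exact layout can differ between Ollama versions. `eval_rate` is a hypothetical helper name.

```shell
# eval_rate LOGFILE: print the generation speed (tokens/s) from an `ollama run --verbose` log.
# Assumption: the stats contain "eval rate: <number> tokens/s"; the "prompt eval rate:"
# line is skipped so only the generation figure is reported.
eval_rate() {
  awk '/eval rate:/ && !/prompt eval rate:/ { print $(NF-1) }' "$1"
}

# Usage after the benchmark run:
# eval_rate /tmp/ollama-bench.log
```

If the log holds several runs, the function prints one value per run, which makes it easy to compare the first run with later ones.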
Log Memory Used, Swap, and tok/s into your runbook as the team's M4 Mac mini benchmark baseline.
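One way to keep that baseline consistent is to append a dated CSV row per run. This is a sketch under assumptions: the CSV path, column order, and the helper names `swap_used` and `log_row` are hypothetical, and `swap_used` expects the macOS `sysctl vm.swapusage` format (`total = ...  used = ...  free = ...`).

```shell
# swap_used: read `sysctl vm.swapusage` output on stdin and print the "used" value.
# Assumes the macOS format "vm.swapusage: total = X  used = Y  free = Z".
swap_used() {
  awk '{ for (i = 1; i <= NF - 2; i++) if ($i == "used" && $(i+1) == "=") print $(i+2) }'
}

# log_row CSV HOST MEM_GB SWAP TOKS: append one dated row (hypothetical column layout).
log_row() {
  printf '%s,%s,%s,%s,%s\n' "$(date +%F)" "$2" "$3" "$4" "$5" >> "$1"
}

# Usage on the Mac under test (Memory Used read from Activity Monitor, tok/s from the log):
# log_row ~/m4-baseline.csv m4-16gb 13.2 "$(sysctl vm.swapusage | swap_used)" 34
```

A flat CSV keeps the 16GB and 24GB rows side by side, so later runs can be compared against the same baseline.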