Over the past six months, while helping dozens of teams evaluate “going Agent,” we kept hearing two extremes. Some teams bought only a model API and expected it to edit production on its own. Others deployed Kubernetes, a vector DB, three MCP servers, and an autonomous Agent platform, and three months later nobody was maintaining any of it. What actually blocks delivery is rarely “the model isn’t smart enough”; it is a misaligned execution environment, verification chain, or context gateway. This article uses the Cloud Mac AI Stack layering to turn “how much infrastructure does an AI Agent need?” into decision tables, so you can match the stack to your team size instead of copying the shopping list from someone else’s architecture blog.
Asymmetric takeaway
Model capability is not the dividing line; the execution boundary is. The same Claude in a chat-only web UI gives advice; on a macOS node with a terminal, git, and a Runner it produces mergeable PRs. Infrastructure buys the right to act in a specific environment, not raw FLOPS.
1. Why this problem exists: “can chat” ≠ “can ship”
Now that “Agent” is an overloaded term, many people conflate chat interfaces with engineering Agents. A chat interface needs only a model API; an engineering Agent must at least read the repo, edit files, run commands, and receive objective verification signals. Each missing piece shows up as a specific symptom:
- The Agent edits code but nobody knows whether the tests ran: L1 Fact is missing (Runner execution engine).
- The Agent only edits the open file, so cross-module refactors are guesswork: L4 Context is missing (MCP triple-connect).
- Every tool works on its own, yet a whole issue still needs 40 minutes of babysitting: L5 Workflow is missing (OpenHands platform).
- You want Xcode builds from a Windows laptop, but the Agent has no legal execution surface: L0 real macOS is missing (Cloud Mac vs local Mac).
The old reflex is “buy a stronger model”; the new one is to fill in execution and verification layer by layer. This is the question ZavCloud customers ask when renting a Cloud Mac: not whether the RAM can run Ollama, but what role the node plays in the stack.
2. How to classify Agent infrastructure: six layers, not six products
We use L0–L5 (consistent with the Stack series). Note: layers are responsibilities, not a mandatory shopping list. Solo devs can stop at L3; L2 inference (Ollama) is optional throughout.
| Layer | Role | Typical components | Output | Without it |
|---|---|---|---|---|
| L0 | Execution environment | Local Mac / Cloud Mac | Session with terminal, git, Xcode | Agent can only “talk,” not “do” |
| L1 | Objective verification | GitHub Runner | Fact (test/build signals) | Org won’t merge Agent PRs |
| L2 | Optional inference | Ollama / MLX | Local inference | No impact (API models substitute) |
| L3 | Pair programming | Claude Code / Cursor Agent | Diff | No structured code-change entry |
| L4 | Context gateway | MCP (GitHub / CodeGraph / API) | Context | Agent blind in large repos |
| L5 | Autonomous workflow | OpenHands etc. | Workflow | Multi-step work still manually chained |
The dividing lines are clear: chat Agents stop before L3; engineering Agents need at least L0+L3; mergeable Agents need L1; scalable Agents are where L4 and L5 enter the discussion. Many teams fail by skipping layers, for example by adopting OpenHands before a Runner, so autonomous tasks change code while nothing proves the build is green.
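The layer rules above can be written down as a small check. The following is a minimal sketch, not part of any tool: the `Layer` enum, `classify`, and `skipped_layers` are hypothetical names, and the thresholds are one reading of the paragraph above (chat below L0+L3, mergeable once L1 is present, scalable once L4 and L5 are both present).

```python
from enum import IntEnum


class Layer(IntEnum):
    """The six responsibilities from the table (names are illustrative)."""
    L0_EXECUTION = 0
    L1_VERIFICATION = 1
    L2_INFERENCE = 2  # optional throughout, so it never affects the result
    L3_PAIR_PROGRAMMING = 3
    L4_CONTEXT = 4
    L5_WORKFLOW = 5


def classify(stack: set[Layer]) -> str:
    """Name the kind of Agent a given set of layers can support."""
    if not {Layer.L0_EXECUTION, Layer.L3_PAIR_PROGRAMMING} <= stack:
        return "chat"
    if Layer.L1_VERIFICATION not in stack:
        return "engineering"
    if {Layer.L4_CONTEXT, Layer.L5_WORKFLOW} <= stack:
        return "scalable"
    return "mergeable"


def skipped_layers(stack: set[Layer]) -> list[str]:
    """Flag the layer-skipping patterns described in the text."""
    warnings = []
    if Layer.L5_WORKFLOW in stack and Layer.L1_VERIFICATION not in stack:
        warnings.append("L5 without L1: autonomous changes with no build signal")
    if Layer.L3_PAIR_PROGRAMMING in stack and Layer.L0_EXECUTION not in stack:
        warnings.append("L3 without L0: the Agent can only talk, not do")
    return warnings
```

For example, a stack of L0, L3, and L5 classifies as "engineering" and returns the L5-without-L1 warning, which is the OpenHands-before-Runner failure mode.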
3. Core comparison: solo / small team / engineering tiers
Unified columns (same as tool comparison articles): entry, execution, context, monthly cost band, best fit.
| Tier | Entry | Execution | Context | Monthly cost band | Best for |
|---|---|---|---|---|---|
| Solo · minimal stack | CLI (Claude Code) | Local file edits + manual tests | Current repo + manual @ files | API $20–100 | Indie devs, side projects |
| Small team · mergeable stack | CLI + PR flow | L0 Mac + L1 Runner + L3 Agent | GitHub issues (optional L4) | API + Cloud Mac pay-per-day $50–300 | 3–15 engineer teams |
| Engineering · autonomous stack | CLI + L5 task queue | Multi-step execution + CI loop | Full L4 MCP + CodeGraph | Above + ~0.5 FTE maintenance | Teams with platform engineers |
Hardware: when L0 and L1 share one machine (the common case), use the table below. RAM becomes the bottleneck before the CPU does, because the Agent, the Runner, and the optional Ollama all contend for the same unified memory:
| Co-located workload | Suggested RAM | Notes |
|---|---|---|
| Runner + Claude Code only | M4 16GB | Fine for light iOS / Node repos |
| Runner + Claude Code + Ollama 7B | M4 24GB | See 16GB vs 24GB benchmarks |
| Runner + OpenHands + MCP | M4 24GB–48GB | L5 sandbox + Docker extra RAM |
| Multiple parallel Runners (large team) | Split across nodes | See one job one workspace |
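The RAM table can be read as a lookup ordered from the heaviest workload down. The sketch below is only an illustration: `suggested_ram` and its parameters are hypothetical names, and combinations the table does not list (for example OpenHands together with Ollama) simply fall into the heaviest matching row, which is an assumption rather than a measured figure.

```python
def suggested_ram(ollama_7b: bool = False,
                  openhands: bool = False,
                  parallel_runners: int = 1) -> str:
    """Return the table's RAM suggestion for a co-located L0 + L1 node.

    Every configuration is assumed to include one Runner and Claude Code.
    """
    if parallel_runners > 1:
        # The table does not size a single machine for this case.
        return "split across nodes"
    if openhands:
        # The L5 sandbox and Docker need extra RAM.
        return "M4 24GB to 48GB"
    if ollama_7b:
        return "M4 24GB"
    return "M4 16GB"
```

Checking the rows in this order keeps the function conservative: a heavier workload always wins over a lighter one.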
4. Scenario matrix
Quick triage with “if you are X, choose Y”:
| If you are… | Minimum viable stack | Not needed yet |
|---|---|---|
| Solo side project, you merge yourself | L0 local Mac + L3 Claude Code | Runner, MCP, L5 |
| Windows user doing iOS / macOS | L0 Cloud Mac + L3 | On-prem Mac rack |
| Team code review requires green CI | L0 + L1 Runner + L3 | L5 (don’t skip ahead) |
| 100k+ line monorepo | Above + L4 CodeGraph MCP | Context window alone |
| 5+ similar issues per day | Full stack through L5 OpenHands | Manual Claude session chaining |
| Strict compliance / data residency | Dedicated L0 + optional L2 local inference | Prod secrets in MCP |
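The matrix is cumulative: each row adds layers on top of the one before it. A minimal sketch of that logic follows, assuming a hypothetical `Team` record with three fields; the thresholds (100k lines, 5 issues per day) come from the table, while the compliance row and the local-versus-Cloud Mac choice for L0 are left out for brevity.

```python
from dataclasses import dataclass


@dataclass
class Team:
    """Hypothetical description of a team's situation."""
    ci_gates_merges: bool = False      # code review requires green CI
    repo_lines: int = 0                # approximate size of the main repo
    similar_issues_per_day: int = 0    # repetitive issues worth automating


def minimum_stack(team: Team) -> list[str]:
    """Return the minimum viable layers, following the matrix rows."""
    layers = {"L0", "L3"}  # every engineering Agent needs these two
    if team.ci_gates_merges:
        layers.add("L1")
    if team.repo_lines >= 100_000:
        layers |= {"L1", "L4"}  # "above + L4 CodeGraph MCP"
    if team.similar_issues_per_day >= 5:
        layers |= {"L1", "L4", "L5"}  # "full stack through L5"
    return sorted(layers)
```

A solo developer with default values gets `["L0", "L3"]`, which matches the first row of the matrix.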
5. Recommended stacks: three copy-paste recipes
Stack A · Fastest solo launch (within 1 day)
- L0: Local MacBook or pay-per-day Cloud Mac
- L3: Claude Code (install handbook)
- Model: Anthropic API subscription
- Skip: Runner, MCP, vector DB, K8s
Stack B · Small team mergeable (1–2 weeks)
- L0: Cloud Mac M4 16GB always-on node
- L1: GitHub Actions self-hosted Runner (worth it?)
- L3: Claude Code + team CLAUDE.md
- L4: GitHub MCP read-only (issue-driven)
- Optional L2: Ollama 7B for private drafts, off critical path
Stack C · Engineering autonomous delivery (1 month+)
- L0: Cloud Mac M4 24GB+
- L1: Runner · one job one workspace
- L3: Claude Code
- L4: MCP triple-connect + CodeGraph
- L5: OpenHands (sandbox repo first)
- Orchestration: OpenClaw triggers + audit (optional)
- Red line: prod API / Runner creds never in MCP (permissions guide)
6. Common mistakes: five don’ts
- Treating model API as full infrastructure. API solves “think,” not “do” and “verify.”
- Opening L5 repo writes without Runner. An autonomous Agent without the Fact layer is writing blind, and rollbacks become very expensive.
- Building vector DB + RAG platform on day one. Most code Agent bottlenecks are symbolic context (CodeGraph), not embedding search.
- VM on Windows posing as macOS CI. Signing, notarization, and device tests still need real Apple Silicon.
- Buying someone else’s full shopping list. Write execution boundary first, add layers incrementally; stack depth ≠ team headcount.
7. Rollout: 7-step checklist
- Define execution boundary — List allowed Agent actions: which dirs, shell, prod triggers.
- Confirm L0 — Xcode / notarization needs macOS; evaluate rent vs buy Mac.
- Add L3 coding Agent — Single file, single repo first; write CLAUDE.md / team prompt norms.
- Stand up L1 Runner — Separate macOS and Linux jobs; split secrets from Agent tokens.
- Add L4 MCP as needed — Read-only default; write via short-lived token on separate service.
- Evaluate L5 — If after two weeks you are still chaining tools manually, add an OpenHands-class Workflow.
- Audit and red lines — Map every autonomous task to PR + CI run ID; quarterly permission matrix review.
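The first step of the checklist, defining the execution boundary, can start as a plain allowlist that is checked before the Agent acts. The sketch below is only one way to express it: the directory names, command names, and the `may_edit` / `may_run` helpers are all hypothetical, and it assumes commands are executed as an argument list rather than through a shell.

```python
import shlex
from pathlib import PurePosixPath

# Hypothetical boundary for one repo; adjust per team.
ALLOWED_DIRS = (PurePosixPath("src"), PurePosixPath("tests"))
ALLOWED_COMMANDS = {"git", "swift", "xcodebuild", "npm"}


def may_edit(path: str) -> bool:
    """Allow edits only to files inside the allowed repo-relative dirs."""
    p = PurePosixPath(path)
    if p.is_absolute() or ".." in p.parts:
        return False  # reject anything that could escape the repo
    return any(d in p.parents for d in ALLOWED_DIRS)


def may_run(command: str) -> bool:
    """Allow a command only if its executable is on the allowlist.

    This checks the executable name only; it does not inspect arguments,
    so it is a first filter and not a sandbox.
    """
    argv = shlex.split(command)
    return bool(argv) and argv[0] in ALLOWED_COMMANDS
```

Production triggers are deliberately absent from the allowlist here, so the default answer for anything touching prod is "no".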
One-week acceptance test
Pick a real issue and take it from Agent change to green CI without anyone re-running tests manually. If that works, L0+L1+L3 is enough; if it does not, fix that loop before adding L5.
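The acceptance criterion can be checked mechanically once CI runs are recorded per PR. This is a sketch under assumptions: the `CiRun` record and its `triggered_by` labels are hypothetical and do not mirror any CI provider's schema; map them from whatever your Runner actually reports.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CiRun:
    """One CI run for the Agent's PR (hypothetical shape)."""
    run_id: int
    conclusion: str    # "success" or "failure"
    triggered_by: str  # "push" if automatic, "manual" if a human re-ran it


def passes_acceptance(runs: list[CiRun]) -> bool:
    """True if the PR reached green CI with no manual re-runs.

    `runs` is expected in chronological order.
    """
    if not runs:
        return False  # no Fact at all means no pass
    no_manual_reruns = all(r.triggered_by != "manual" for r in runs)
    return no_manual_reruns and runs[-1].conclusion == "success"
```

Keeping the `run_id` on each record also gives the audit step something concrete to map an autonomous task to.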
FAQ
What is the minimum for a solo AI Agent developer?
A macOS machine with a terminal (local or Cloud Mac), a coding Agent (e.g. Claude Code), and a model API. You do not need a self-hosted Runner, MCP, or a Workflow platform.
Why GitHub Runner if I have Claude Code?
Claude Code produces Diff; Runner produces Fact. Without objective build signals, the team cannot judge mergeability. The issue is trust, not model IQ.
Does MCP count as infrastructure?
Yes, L4 context layer. It exposes issues and code graphs; without L0–L3 execution and verification, MCP alone cannot ship.
When do I need OpenHands?
When you want unattended delivery of whole requirements (multi-file changes, multiple test rounds, automatic PRs) and L1 and L4 are already stable. If you are still chaining Claude sessions by hand every day, that is the sign you need the Workflow layer.
What does infrastructure cost?
Solo: API at $20–100/mo. Small team: add a pay-per-day Cloud Mac and a Runner node. L5 stack: a co-located M4 24GB node, plus roughly 0.5 FTE to maintain MCP and permissions.
Conclusion
How much infrastructure an AI Agent needs depends on where the execution boundary stops—not the model leaderboard. Solo: L3 is enough to start; orgs that must merge add L1; large repos add L4; unattended delivery adds L5. When buying Cloud Mac or Mac mini, ask whether the machine is “execution surface,” “verification surface,” or “inference surface”—that beats staring at TOPS numbers.