How Much Infrastructure Does an AI Agent Need?

Bottom line first: don’t ask how many servers to buy—ask which layer your Agent’s execution boundary stops at. Solo developers often need only L0–L3; teams that must prove builds need Runner; unattended end-to-end delivery is when a Workflow platform pays off.

2026.06.18  ·  ~10 min  ·  Layered decisions · Spec tables · Rollout checklist


Over the past six months, while helping dozens of teams evaluate “going Agent,” we heard two extremes most often. Some bought only a model API and expected it to edit production on its own. Others deployed Kubernetes, a vector DB, three MCP servers, and an autonomous Agent platform, and three months later nobody was maintaining any of it. What actually blocks delivery is rarely “the model isn’t smart enough”; it is a misaligned execution environment, verification chain, and context gateway. This article uses the Cloud Mac AI Stack layering to turn “how much infrastructure does an AI Agent need?” into decision tables, so you can match your own team size instead of copying the shopping list from someone else’s architecture blog.

6 infrastructure layers  ·  3 team tiers  ·  16GB team Runner baseline RAM

Asymmetric takeaway

Model capability is not the dividing line; the execution boundary is. The same Claude in a chat-only web UI gives advice, while on a macOS node with a terminal, git, and a Runner it produces mergeable PRs. What infrastructure buys is the right to act in a given environment, not raw FLOPS.

1. Why this problem exists: “can chat” ≠ “can ship”

Now that “Agent” is an overloaded term, many people conflate chat interfaces with engineering Agents. Chat needs only a model API; an engineering Agent must at least read the repo, edit files, run commands, and get objective verification signals. A missing piece shows up as one of these symptoms:

  • Agent edits code but nobody knows if tests ran—missing L1 Fact (Runner execution engine).
  • Agent only edits the open file; cross-module refactors are guesswork—missing L4 Context (MCP triple-connect).
  • Every tool works alone but a whole issue still needs 40 minutes of babysitting—missing L5 Workflow (OpenHands platform).
  • On a Windows laptop you want Xcode builds but the Agent has no legal execution surface—missing L0 real macOS (Cloud Mac vs local Mac).

The old reflex is “buy a stronger model”; the new one is to fill in execution and verification layer by layer. This is also the question for ZavCloud customers renting a Cloud Mac: not whether the RAM can run Ollama, but what role the node plays in the stack.

2. How to classify Agent infrastructure: six layers, not six products

We use L0–L5 (consistent with the Stack series). Note: layers are responsibilities, not a mandatory shopping list. Solo devs can stop at L3; L2 inference (Ollama) is optional throughout.

| Layer | Role | Typical components | Output | Without it |
|---|---|---|---|---|
| L0 | Execution environment | Local Mac / Cloud Mac | Session with terminal, git, Xcode | Agent can only “talk,” not “do” |
| L1 | Objective verification | GitHub Runner | Fact (test/build signals) | Org won’t merge Agent PRs |
| L2 | Optional inference | Ollama / MLX | Local inference | No impact (API models substitute) |
| L3 | Pair programming | Claude Code / Cursor Agent | Diff | No structured code-change entry |
| L4 | Context gateway | MCP (GitHub / CodeGraph / API) | Context | Agent blind in large repos |
| L5 | Autonomous workflow | OpenHands etc. | Workflow | Multi-step work still manually chained |

The dividing lines are clear: chat Agents stop before L3; engineering Agents need at least L0+L3; mergeable Agents add L1; L4+L5 only enter the discussion at scale. Many teams fail by skipping layers, for example adopting OpenHands before a Runner, so autonomous tasks change code while nothing proves the build is green.
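The layer rules in the paragraph above can be written down as a small check. This is a minimal sketch, not part of the Stack series itself: the `Layer` enum, `classify`, and `skipped_layer_warnings` are hypothetical names introduced here for illustration, and the only skip-ahead rule encoded is the one the text names (L5 without L1).

```python
from enum import IntEnum


class Layer(IntEnum):
    """The six responsibilities from the layer table (hypothetical names)."""
    L0_EXECUTION = 0
    L1_VERIFICATION = 1
    L2_INFERENCE = 2
    L3_PAIR_PROGRAMMING = 3
    L4_CONTEXT = 4
    L5_WORKFLOW = 5


def classify(layers: set[Layer]) -> str:
    """Name the kind of Agent a given set of layers can support."""
    base = {Layer.L0_EXECUTION, Layer.L3_PAIR_PROGRAMMING}
    if not base <= layers:
        return "chat"          # cannot both act and produce structured diffs
    if Layer.L1_VERIFICATION not in layers:
        return "engineering"   # can change code, but nothing proves the build
    if {Layer.L4_CONTEXT, Layer.L5_WORKFLOW} <= layers:
        return "scalable"
    return "mergeable"


def skipped_layer_warnings(layers: set[Layer]) -> list[str]:
    """Flag the skip-ahead pattern described in the text."""
    warnings = []
    if Layer.L5_WORKFLOW in layers and Layer.L1_VERIFICATION not in layers:
        warnings.append("L5 without L1: autonomous changes with no build proof")
    return warnings
```

For example, `classify({Layer.L0_EXECUTION, Layer.L3_PAIR_PROGRAMMING})` returns `"engineering"`, and adding `Layer.L5_WORKFLOW` to that set produces one warning because L1 is still missing.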

3. Core comparison: solo / small team / engineering tiers

Unified columns (same as tool comparison articles): entry, execution, context, monthly cost band, best fit.

| Tier | Entry | Execution | Context | Monthly cost band | Best for |
|---|---|---|---|---|---|
| Solo · minimal stack | CLI (Claude Code) | Local file edits + manual tests | Current repo + manual @ files | API $20–100 | Indie devs, side projects |
| Small team · mergeable stack | CLI + PR flow | L0 Mac + L1 Runner + L3 Agent | GitHub issues (optional L4) | API + Cloud Mac pay-per-day $50–300 | 3–15 engineer teams |
| Engineering · autonomous stack | CLI + L5 task queue | Multi-step execution + CI loop | Full L4 MCP + CodeGraph | Above + ~0.5 FTE maintenance | Teams with platform engineers |

Hardware: when L0 and L1 share one machine (the common case), use this table. RAM becomes the ceiling before the CPU does, because the Agent, the Runner, and optional Ollama all contend for unified memory:

| Co-located workload | Suggested RAM | Notes |
|---|---|---|
| Runner + Claude Code only | M4 16GB | Fine for light iOS / Node repos |
| Runner + Claude Code + Ollama 7B | M4 24GB | See 16GB vs 24GB benchmarks |
| Runner + OpenHands + MCP | M4 24GB–48GB | L5 sandbox + Docker extra RAM |
| Multiple parallel Runners (large team) | Split across nodes | See one job one workspace |
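The sizing table can be read as a simple lookup. The sketch below assumes a Runner and Claude Code are always present on the node; `suggest_ram` and its parameters are hypothetical names, and the table does not cover combinations such as OpenHands plus Ollama, so treating that case as the 24GB–48GB row is an assumption rather than a benchmark result.

```python
def suggest_ram(ollama_7b: bool = False,
                openhands: bool = False,
                parallel_runners: int = 1) -> str:
    """Map a co-located workload to the RAM suggestion from the table.

    Assumes Runner + Claude Code are always on the node.
    """
    if parallel_runners > 1:
        # The table recommends splitting instead of buying a bigger node.
        return "split across nodes"
    if openhands:
        # L5 sandbox and Docker need extra headroom (assumed to dominate
        # even when Ollama is also running; the table does not say).
        return "M4 24GB-48GB"
    if ollama_7b:
        return "M4 24GB"
    return "M4 16GB"
```

Calling `suggest_ram()` with no arguments gives the 16GB team baseline; `suggest_ram(ollama_7b=True)` moves up to 24GB.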

4. Scenario matrix

Quick triage with “if you are X, choose Y”:

| If you are… | Minimum viable stack | Skip or avoid for now |
|---|---|---|
| Solo side project, you merge yourself | L0 local Mac + L3 Claude Code | Runner, MCP, L5 |
| Windows user doing iOS / macOS | L0 Cloud Mac + L3 | On-prem Mac rack |
| Team code review requires green CI | L0 + L1 Runner + L3 | L5 (don’t skip ahead) |
| 100k+ line monorepo | Above + L4 CodeGraph MCP | Relying on the context window alone |
| 5+ similar issues per day | Full stack through L5 OpenHands | Manual Claude session chaining |
| Strict compliance / data residency | Dedicated L0 + optional L2 local inference | Prod secrets in MCP |
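The triage matrix can also be expressed as a function from scenario flags to a minimum stack. This is a rough sketch: the `Scenario` fields and `minimum_stack` are hypothetical names, and the assumption that L4 implies L1 and L5 implies L4 follows the “Above +” and “full stack” wording in the matrix.

```python
from dataclasses import dataclass


@dataclass
class Scenario:
    """Scenario flags taken from the triage matrix (hypothetical names)."""
    windows_user_targeting_apple: bool = False
    ci_must_be_green: bool = False
    large_monorepo: bool = False          # roughly 100k+ lines
    similar_issues_per_day: int = 0
    strict_compliance: bool = False


def minimum_stack(s: Scenario) -> list[str]:
    """Return the smallest stack the matrix suggests for this scenario."""
    if s.strict_compliance:
        l0 = "L0 dedicated Mac"
    elif s.windows_user_targeting_apple:
        l0 = "L0 Cloud Mac"
    else:
        l0 = "L0 local Mac"

    wants_l5 = s.similar_issues_per_day >= 5
    wants_l4 = s.large_monorepo or wants_l5   # L5 assumes L4 is in place
    wants_l1 = s.ci_must_be_green or wants_l4  # never add L4/L5 before L1

    stack = [l0]
    if wants_l1:
        stack.append("L1 Runner")
    stack.append("L3 coding Agent")
    if wants_l4:
        stack.append("L4 MCP")
    if wants_l5:
        stack.append("L5 Workflow")
    if s.strict_compliance:
        stack.append("L2 local inference (optional)")
    return stack
```

The default `Scenario()` yields only L0 and L3, which matches the solo row; everything else is added only when a flag demands it.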

5. Recommended stacks: three copy-paste recipes

Stack A · Fastest solo launch (within 1 day)

L0  Local MacBook or pay-per-day Cloud Mac
L3  Claude Code (install handbook)
Model  Anthropic API subscription

Skip: Runner, MCP, vector DB, K8s

Stack B · Small team mergeable (1–2 weeks)

L0  Cloud Mac M4 16GB always-on node
L1  GitHub Actions self-hosted Runner (worth it?)
L3  Claude Code + team CLAUDE.md
L4  GitHub MCP read-only (issue-driven)

Optional L2: Ollama 7B for private drafts, off critical path

Stack C · Engineering autonomous delivery (1 month+)

L0  Cloud Mac M4 24GB+
L1  Runner · one job one workspace
L3  Claude Code
L4  MCP triple-connect + CodeGraph
L5  OpenHands (sandbox repo first)
Orchestration  OpenClaw triggers + audit (optional)

Red line: production API and Runner credentials never go into MCP (see the permissions guide)
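One way to keep the red line from eroding is a small lint over the MCP client configuration. The sketch below assumes a JSON config with a top-level `mcpServers` mapping whose entries may carry an `env` block, a layout several MCP clients use; your client may differ. The `FORBIDDEN_KEY` pattern and `find_forbidden_env` are hypothetical, the check looks only at variable names, and a name-based match is crude (it would also flag something like `PRODUCT_ID`).

```python
import json
import re

# Hypothetical naming convention: anything that looks like a production
# or Runner credential must not appear in an MCP server's environment.
FORBIDDEN_KEY = re.compile(r"PROD|RUNNER", re.IGNORECASE)


def find_forbidden_env(config_text: str) -> list[str]:
    """Return "server: VAR" entries whose variable name looks forbidden."""
    config = json.loads(config_text)
    hits = []
    for server, spec in config.get("mcpServers", {}).items():
        for key in spec.get("env", {}):
            if FORBIDDEN_KEY.search(key):
                hits.append(f"{server}: {key}")
    return hits


# Example config with placeholder values only.
SAMPLE = json.dumps({
    "mcpServers": {
        "github": {"command": "example-github-mcp",
                   "env": {"GITHUB_TOKEN": "placeholder"}},
        "api": {"command": "example-api-mcp",
                "env": {"PROD_API_KEY": "placeholder"}},
    }
})
```

Running `find_forbidden_env(SAMPLE)` flags the `api` server's `PROD_API_KEY` and leaves the read-only GitHub token alone. A check like this fits naturally into the quarterly permission review rather than replacing it.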

6. Common mistakes: five don’ts

  1. Treating model API as full infrastructure. API solves “think,” not “do” and “verify.”
  2. Opening L5 repo writes without Runner. Autonomous Agent without Fact layer is blind writing—rollback cost is extreme.
  3. Building vector DB + RAG platform on day one. Most code Agent bottlenecks are symbolic context (CodeGraph), not embedding search.
  4. VM on Windows posing as macOS CI. Signing, notarization, and device tests still need real Apple Silicon.
  5. Buying someone else’s full shopping list. Write execution boundary first, add layers incrementally; stack depth ≠ team headcount.

7. Rollout: 7-step checklist

  1. Define execution boundary — List allowed Agent actions: which dirs, shell, prod triggers.
  2. Confirm L0 — Xcode / notarization needs macOS; evaluate rent vs buy Mac.
  3. Add L3 coding Agent — Single file, single repo first; write CLAUDE.md / team prompt norms.
  4. Stand up L1 Runner — Separate macOS and Linux jobs; split secrets from Agent tokens.
  5. Add L4 MCP as needed — Read-only default; write via short-lived token on separate service.
  6. Evaluate L5 — Two weeks still manually chaining tools → add OpenHands-class Workflow.
  7. Audit and red lines — Map every autonomous task to PR + CI run ID; quarterly permission matrix review.
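Step 1 is easier to enforce when the execution boundary is written as data rather than prose. The following is a minimal sketch under stated assumptions: `ExecutionBoundary` and its fields are hypothetical, paths are expected to be absolute POSIX paths, and the command check is an allowlist on the program name only. It is a policy record, not a sandbox.

```python
import posixpath
import shlex
from dataclasses import dataclass
from pathlib import PurePosixPath

# Characters that could chain or redirect commands; reject them outright.
SHELL_CONTROL = set(";|&`$<>\n")


@dataclass(frozen=True)
class ExecutionBoundary:
    """What the Agent may touch (hypothetical structure for illustration)."""
    allowed_dirs: tuple[str, ...]
    allowed_commands: frozenset[str]
    may_trigger_prod: bool = False

    def allows_path(self, path: str) -> bool:
        # Normalize first so "../" cannot escape an allowed directory.
        normalized = PurePosixPath(posixpath.normpath(path))
        return any(normalized.is_relative_to(d) for d in self.allowed_dirs)

    def allows_command(self, command: str) -> bool:
        if any(ch in SHELL_CONTROL for ch in command):
            return False
        try:
            argv = shlex.split(command)
        except ValueError:  # unbalanced quotes and similar
            return False
        return bool(argv) and argv[0] in self.allowed_commands
```

A boundary for a single repo might allow `/repo/src` and `/repo/tests` plus `git` and `xcodebuild`, with `may_trigger_prod` left at its default of `False`. Keeping the record in version control gives the quarterly permission review in step 7 something concrete to diff.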

One-week acceptance test

Pick a real issue and follow it from the Agent’s change to green CI. If it gets there without anyone re-running tests manually, L0+L1+L3 is enough; if it does not, fix that path first and don’t add L5 yet.
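The acceptance test can be checked mechanically from CI metadata. This sketch assumes each run is a dict shaped like a GitHub Actions workflow-run object with `conclusion` and `run_attempt` fields; `passes_acceptance` is a hypothetical helper, and fetching the runs for the Agent’s PR is left out. Note that `run_attempt` counts any re-run, so an automated retry would also fail this check.

```python
def passes_acceptance(ci_runs: list[dict]) -> bool:
    """True when every CI run for the Agent's PR went green on attempt 1.

    An empty list fails: no CI evidence means no Fact.
    """
    if not ci_runs:
        return False
    return all(
        run.get("conclusion") == "success" and run.get("run_attempt", 1) == 1
        for run in ci_runs
    )
```

A run list such as `[{"conclusion": "success", "run_attempt": 2}]` fails the check even though the build ended green, because someone or something had to re-run it.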

FAQ

What is the minimum for a solo AI Agent developer?

macOS with a terminal (local or Cloud Mac), a coding Agent (e.g. Claude Code), and a model API. You do not need a self-hosted Runner, MCP, or a Workflow platform.

Why GitHub Runner if I have Claude Code?

Claude Code produces Diff; Runner produces Fact. Without objective build signals, the team cannot judge mergeability. The constraint is trust, not model IQ.

Does MCP count as infrastructure?

Yes, it is the L4 context layer. It exposes issues and code graphs, but without the execution and verification provided by L0–L3, MCP alone cannot ship anything.

When do I need OpenHands?

When you want unattended delivery of a whole requirement (multi-file changes, multiple test rounds, automatic PRs) and L1+L4 are already stable. If you are still chaining Claude sessions by hand every day, that is the signal to add the Workflow layer.

What does infrastructure cost?

Solo: API $20–100/mo. Small team: add a pay-per-day Cloud Mac and a Runner node. L5 stack: a co-located M4 24GB node, plus roughly 0.5 FTE for MCP and permissions upkeep.

Conclusion

How much infrastructure an AI Agent needs depends on where the execution boundary stops—not the model leaderboard. Solo: L3 is enough to start; orgs that must merge add L1; large repos add L4; unattended delivery adds L5. When buying Cloud Mac or Mac mini, ask whether the machine is “execution surface,” “verification surface,” or “inference surface”—that beats staring at TOPS numbers.

ZavCloud Cloud Mac

Give your Agent real macOS that can act and verify CI

Dedicated datacenter Mac mini M4: Runner, Claude Code, and MCP on one node—pay per day to trial the stack before scaling.
