Over the past two weeks in this Stack series we stood up L1 Runner (Fact), L2 Ollama (Inference), and L4 MCP (Context) layer by layer. Reader feedback keeps repeating one line: “Every tool is connected, but I still manually string the workflow every day.” Claude Code can produce a diff, MCP can pull GitHub context, Runner can go green after push — yet “fix issue #142 and open a PR” still means someone staring at a terminal for forty minutes.
That is what L5 · OpenHands answers: not another CLI purchase, but an upgrade of Cloud Mac from a tool collection into an Agent platform that can autonomously finish multi-step engineering tasks. This page is L5-Q01 · R1 · series Hub: it moves readers from “coding tools” to “Agent platform” thinking, covering where Workflow sits in the Stack, why OpenHands vs Claude Code is not a replacement story, which tasks are typical, and what a self-hosted OpenHands architecture on macOS looks like. There are no Docker install steps here; those belong to the L5-Q02 tutorial.
Cloud Mac AI Stack · series slogan (fourth ring)
Claude Code produces Diff; GitHub Runner produces Fact; OpenHands produces Workflow.
MCP supplies Context; Ollama supplies optional Inference. Workflow consumes Context / Diff / Fact and calls the latter two repeatedly in a loop — not a one-way pipeline. See Stack language.
The “tool collection” trap: every piece works; the chain still runs on humans
A typical week on site (we have seen this shape in customer repos many times):
- Monday: Claude Code edits the API layer, MCP pulls the GitHub issue list — smooth inside the session.
- Tuesday: a teammate pushes from another machine and Runner goes red, because nobody aligned the Agent's edits with the CI scripts (this repeats for teams that skipped the L1 execution engine article).
- Wednesday: manual test runs, config tweaks, another Claude Code session to patch files.
- Thursday: checks are finally green, but docs, migration scripts, and sample tests are still missing, because “edit code” and “deliver the requirement” were treated as the same job.
A tool collection means: each step has a best tool, but no layer owns the whole requirement. An Agent platform adds a Workflow layer that can decompose tasks, execute, and retry from failure on its own — OpenHands is the open-source option in the Stack for that layer (evolved from the OpenDevin ecosystem).
Stack language: Workflow with Context / Diff / Fact
Series-wide notation: do not draw Workflow as a one-way downstream of Fact. Workflow (L5) is the orchestration layer that repeatedly consumes Context, produces Diff, and validates with Fact until it decides “requirement done”:
Cloud Mac AI Stack · output relationships (not call order)
Workflow (L5 · OpenHands)
├── Context (L4 · MCP) ← read repo / issue / API
├── Diff (L3 · Claude Code etc) ← edit code / write files
└── Fact (L1 · Runner / tests) ← run test / build / CI signal
Agent loop (inside Workflow · may iterate many times)
  Diff ↔ Fact
   ↑      ↓
  Observe → re-Plan → re-Execute …
Four outputs to remember: Context · Diff · Fact · Workflow (MCP · coding layer · Runner · OpenHands). Inference (L2 · Ollama) is optional and omitted above to avoid confusion with the Agent loop.
| Layer | Component | Output | Question answered |
|---|---|---|---|
| L4 | MCP | Context | What can the Agent see? |
| L3 | Claude Code | Diff | What is this change? |
| L1 | GitHub Runner | Fact | Will the org trust it? |
| L5 | OpenHands | Workflow | Is the whole requirement done? |
Workflow is not “another CI job” but a multi-step, interruptible and resumable task state machine: it calls Diff and Fact many times in a loop until the PR is deliverable. Claude Code excels at single-round Diff; OpenHands excels at running the whole loop unattended — provided Context and Fact are already in place.
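To make “interruptible and resumable task state machine” concrete, here is a minimal Python sketch. The `TaskState` class, its field names, and the phase strings are hypothetical illustrations, not OpenHands' real schema; the only point is that a task which serializes its phase and trajectory can be stopped and picked up again.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TaskState:
    """Hypothetical snapshot of one Workflow task (not the OpenHands schema)."""
    goal: str
    phase: str = "plan"  # plan | execute | observe | debug | done
    trajectory: list = field(default_factory=list)

    def record(self, phase: str, note: str) -> None:
        # Move to the given phase and append one step to the trajectory.
        self.phase = phase
        self.trajectory.append({"phase": phase, "note": note})

    def to_json(self) -> str:
        # Serialize the whole task so it can survive an interruption.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, raw: str) -> "TaskState":
        # Rebuild the task from a saved snapshot.
        return cls(**json.loads(raw))


state = TaskState(goal="fix issue #142 and open a PR")
state.record("execute", "edited 2 files")
state.record("observe", "tests failed")

saved = state.to_json()               # interruption point: persist the snapshot
resumed = TaskState.from_json(saved)  # later: resume from the same phase
```

A CI job, by contrast, starts from zero on every run; the saved trajectory is what lets the next Plan build on earlier Observe results.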
OpenHands in one minute (not an encyclopedia)
OpenHands is an open-source autonomous software engineering Agent platform: in a sandbox (often Docker) it accepts natural-language goals and automatically plans → writes code / runs commands → reads output → debugs, with GitHub integration (issues, PRs, CI status). In the Cloud Mac AI Stack it does not replace Claude Code’s paired experience or Runner’s objective build proof — it orchestrates multi-step delivery on top of both.
Different from “install another MCP Server”
MCP extends the context boundary (read repo, call APIs); OpenHands extends task depth (decide the next tool call, whether to retry). Without L4, OpenHands edits blind; without L1, OpenHands “done” cannot be accepted by the org.
OpenHands vs Claude Code: why they are not competitors
People searching OpenHands vs Claude Code or Claude Code alternative often ask: can one replace the other? In the Cloud Mac AI Stack the answer is no, and it should not — they sit on different layers with different outputs:
- Claude Code (L3) → produces Diff: paired coding, you are present.
- OpenHands (L5) → produces Workflow: autonomous agent, you set the goal.
Treating OpenHands as “another Claude Code” fails fast: OpenHands does not align the fuzzy product intuition in your head, and Claude Code does not run an eight-step issue unattended. The right pattern is stacked use: Claude Code for hard problems by day, OpenHands clearing the issue queue at night.
| Dimension | Claude Code (L3 · Diff) | OpenHands (L5 · Workflow) |
|---|---|---|
| Interaction | Human present, step-by-step confirm | Goal-driven, multi-step autonomy |
| Typical duration | 5–30 minute session | 30 minutes to hours per task |
| Strength | Complex single-point refactor, align intent | Scripted requirements, batch small changes, templated features |
| Risk | Session ends → partial work | Runaway edits, too many files, excess permissions |
| Output | Diff | Workflow (PR, logs, step trajectory) |
| Replaces the other? | ❌ No, not an OpenHands alternative | ❌ No, not a Claude Code replacement |
Rule of thumb (not a contract): if you can state the change intent in one PR sentence → Claude Code; if you say “finish the issue” → consider OpenHands. In large repos with CodeGraph indexing, the paired layer often stays Claude Code; OpenHands fits templatized backend tasks. Both share MCP Context but not the same responsibility.
What tasks can OpenHands do? (the first search question)
Many people search OpenHands agent or OpenHands github to ask: what work is reliable to hand off? Below are task types we suggest trying first with tests + CI — also the example pool for the L5-Q02 tutorial:
| Task type | Typical input | Expected delivery | Fit |
|---|---|---|---|
| Fix bug | GitHub issue + repro steps | Patch + tests + PR | ⭐⭐⭐⭐ |
| Dependency upgrade | “Upgrade React 18→19” | Lockfile + breaking-change fixes | ⭐⭐⭐⭐ |
| Lint cleanup | ESLint / SwiftLint report | Batch fix warnings, no behavior change | ⭐⭐⭐⭐⭐ |
| Generate tests | Uncovered module list | Unit test PR | ⭐⭐⭐ |
| Documentation sync | API change diff | README / OpenAPI sync | ⭐⭐⭐⭐ |
| Scaffold / boilerplate | “Add REST endpoint” template | Routes + test skeleton | ⭐⭐⭐⭐ |
Poor first tasks for OpenHands: untested large refactors, major UX redesigns, schema migrations that need business approval, and anything touching production secrets. Keep those in Claude Code paired sessions, and let Runner produce Fact after a human gate.
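The hand-off rule above can be sketched as a small triage function. This is an illustration under assumptions: the label names in `GOOD_FIRST` and `KEEP_PAIRED` are hypothetical stand-ins for the task types in the table and the “poor first tasks” list, and `triage` is not part of any OpenHands API.

```python
# Hypothetical issue labels mirroring the task table and the exclusions above.
GOOD_FIRST = {"bug", "dependency-upgrade", "lint", "tests", "docs", "scaffold"}
KEEP_PAIRED = {"large-refactor", "ux-redesign", "schema-migration", "prod-secrets"}


def triage(labels, has_tests: bool, has_ci: bool) -> str:
    """Return 'openhands' for hand-off candidates, otherwise 'paired'."""
    labels = set(labels)
    if labels & KEEP_PAIRED:
        return "paired"      # excluded task types stay with a human present
    if not (has_tests and has_ci):
        return "paired"      # without tests + CI there is no Fact to validate against
    if labels & GOOD_FIRST:
        return "openhands"
    return "paired"          # unknown task types default to the safer path


print(triage(["lint"], has_tests=True, has_ci=True))
print(triage(["bug", "schema-migration"], has_tests=True, has_ci=True))
```

The default branch is deliberately conservative: anything the table does not name goes to a paired session first.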
How OpenHands works
People searching How OpenHands works, OpenHands architecture, or OpenHands agent loop want to know: how does an autonomous agent turn one sentence into a mergeable PR? OpenHands centers on a four-step loop — often written Plan → Execute → Observe → Debug:
| Phase | What it does | Consumes / produces |
|---|---|---|
| Plan | Read issue, split subtasks, file list | Context (MCP, GitHub, repo tree) |
| Execute | Write patches, run shell, call tools | Produces Diff |
| Observe | Read test output, lint, build logs | Consumes Fact (local test or Runner) |
| Debug | Revise plan or code from Observe | Back to Execute; loop until pass |
OpenHands agent loop (concept · not a single pipeline)
┌──────────┐
│   Plan   │ ← Context
└────┬─────┘
     ▼
┌──────────┐
│ Execute  │ → Diff
└────┬─────┘
     ▼
┌──────────┐
│ Observe  │ ← Fact (test / build / CI)
└────┬─────┘
     │
fail │ pass
     ▼
┌──────────┐        ┌───────────────┐
│  Debug   │ ──────▶│ Workflow done │ → PR / delivery
└────┬─────┘        └───────────────┘
     │
     └──── back to Plan or Execute (next round)
This is not “install a stronger Chat.” The OpenHands architecture hinges on a stateful task machine: each Observe result is written into the trajectory for the next Plan. Fix bug, lint cleanup, and similar tasks run the same loop with different Plan entry sentences. The real task replay below walks one issue through all four steps; Docker and UI config land in L5-Q02.
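As a rough illustration of the Plan → Execute → Observe → Debug loop, the Python sketch below wires four caller-supplied functions together. Everything here is an assumption for teaching purposes: `run_agent_loop`, its callbacks, and the stubbed test results are hypothetical and say nothing about how OpenHands is implemented internally. The default of 40 iterations simply echoes the iteration cap used in this page's trigger pseudocode.

```python
def run_agent_loop(plan, execute, observe, debug, max_iterations=40):
    """Loop until Observe reports a pass or the iteration budget runs out."""
    trajectory = []
    steps = plan()                     # Plan: consumes Context
    for iteration in range(1, max_iterations + 1):
        diff = execute(steps)          # Execute: produces Diff
        passed, log = observe(diff)    # Observe: consumes Fact
        trajectory.append(
            {"iteration": iteration, "diff": diff, "passed": passed, "log": log}
        )
        if passed:
            return "done", trajectory
        steps = debug(steps, log)      # Debug: revise the plan from the log
    return "gave_up", trajectory


# Stubbed run: the first test run fails, the second passes.
attempts = {"n": 0}


def fake_observe(diff):
    attempts["n"] += 1
    if attempts["n"] < 2:
        return False, "snapshot mismatch"
    return True, "green"


status, trajectory = run_agent_loop(
    plan=lambda: ["add UTF-8 BOM"],
    execute=lambda steps: f"patch covering {len(steps)} step(s)",
    observe=fake_observe,
    debug=lambda steps, log: steps + [f"fix: {log}"],
)
print(status, len(trajectory))
```

The iteration cap matters as much as the loop itself: without it, a task with no stable test signal would spin indefinitely.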
Stack L0–L4 before L5 — or the Agent performs in a sandbox alone
We oppose “install OpenHands on day one” tool stacking. Recommended order matches the L1 rollout sequence, with L5 after MCP:
- L0 — Always-on Cloud Mac macOS node.
- L1 — Runner: repeatable push → green/red.
- L2–L3 — Optional Ollama + Claude Code coding on top of Fact.
- L4 — MCP Hub + permission model: auditable read/write for Agents.
- L5 — OpenHands: multi-step Workflow.
Without L1, OpenHands can technically run and open PRs, but the team cannot judge merge risk — the same org incident as “Claude Code SSH all green, Actions all red.” Without L4 permissions, autonomous Agent token exposure grows; see the MCP security spec.
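The rollout order can be expressed as a simple precondition check. In this sketch the layer names come from the list above, L2–L3 are treated as optional as the list says, and `missing_layers` is a hypothetical helper rather than an existing tool.

```python
# Layers that must stand before L5; L2–L3 are optional per the rollout list.
REQUIRED_BEFORE_L5 = ["L0", "L1", "L4"]


def missing_layers(ready_layers):
    """Return required layers that are not yet in place, in rollout order."""
    ready = set(ready_layers)
    return [layer for layer in REQUIRED_BEFORE_L5 if layer not in ready]


print(missing_layers({"L0", "L3", "L4"}))
```

An empty result means L5 can be considered; anything else names the layer to shore up first.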
Real task replay: one full Plan → Execute → Observe → Debug round
Below is a task shape we replay on an OpenHands sandbox fork (numbers illustrative). Match each step to the agent loop:
Goal: fix issue #218 "CSV export missing UTF-8 BOM"
① Read issue + related src/export/*.ts     ~2 min · Context (MCP/Git)
② Generate 6-step plan                     ~1 min · Plan
③ Edit 4 files + add 1 test                ~8 min · Execute
④ Run pnpm test → fail (snapshot mismatch) ~3 min · Observe · Fact
⑤ Read logs → edit 2 more files            ~5 min · Debug → re-Execute
⑥ Re-run tests → green                     ~3 min
⑦ Open PR, link issue                      ~1 min · Workflow delivery
~23 min wall time · human: approve goal + final merge review only
Note step ④ (Observe): test failure is not disaster — it is agent loop input. In paired tools you fix on the spot; OpenHands feeds Fact back through Observe → Debug into the next Execute. Without a stable test command (L1 not solid), Observe has no signal and the loop spins — another reason Runner comes before OpenHands.
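The wall-time figure in the replay can be checked with a few lines of Python. The durations are the illustrative numbers from the replay itself; labelling step ⑥ as a second Observe is an assumption, since the replay leaves that step untagged.

```python
# (step, minutes, phase) taken from the illustrative replay above.
steps = [
    ("read issue + sources", 2, "Context"),
    ("generate plan", 1, "Plan"),
    ("edit 4 files + add 1 test", 8, "Execute"),
    ("pnpm test fails", 3, "Observe"),
    ("read logs, edit 2 more files", 5, "Debug"),
    ("re-run tests", 3, "Observe"),   # phase label assumed
    ("open PR, link issue", 1, "Delivery"),
]

total = sum(minutes for _, minutes, _ in steps)
# Steps 4 to 6 are the failure round trip: observe, debug, re-observe.
rework = sum(minutes for _, minutes, _ in steps[3:6])

print(f"total {total} min, of which {rework} min is the failure round trip")
```

Nearly half of the run is the Observe → Debug round trip, which is why a fast, stable test command shortens agent tasks more than a faster model does.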
Trigger-side concept (not an install tutorial — only how it connects to GitHub):
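# Concept: issue label triggers an autonomous task (pseudocode, not a tested workflow)
on:
  issues:
    types: [labeled]
jobs:
  openhands:
    if: github.event.label.name == 'agent:openhands'
    runs-on: self-hosted
    steps:
      # Placeholder command: the flags below sketch intent and are not a documented CLI
      - run: |
          openhands run \
            --repo "${{ github.repository }}" \
            --issue "${{ github.event.issue.number }}" \
            --max-iterations 40 \
            --sandbox docker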
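The same trigger condition can be sketched on the receiving side of a GitHub `issues` webhook. The payload fields used here (`action`, `label.name`, `issue.number`, `repository.full_name`) are part of GitHub's webhook payload; the `should_dispatch` function and the shape of the returned task are hypothetical, and how the task is then handed to OpenHands is left open.

```python
def should_dispatch(event: dict, trigger_label: str = "agent:openhands"):
    """Return a task descriptor when an issue receives the trigger label, else None."""
    if event.get("action") != "labeled":
        return None
    if event.get("label", {}).get("name") != trigger_label:
        return None
    return {
        "repo": event["repository"]["full_name"],
        "issue": event["issue"]["number"],
    }


sample_event = {
    "action": "labeled",
    "label": {"name": "agent:openhands"},
    "issue": {"number": 218},
    "repository": {"full_name": "example-org/example-repo"},  # made-up repo name
}
print(should_dispatch(sample_event))
```

Gating on an explicit label keeps the human decision (“this issue is fit for an Agent”) in the loop before any autonomous run starts.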
Runner · OpenClaw · OpenHands: three names, three roles
The triangle we get asked about most on L5 articles:
| Component | Stack layer | Metaphor | Typical action |
|---|---|---|---|
| GitHub Runner | L1 · Fact | Legs | xcodebuild, pnpm test, archive |
| OpenClaw | Orchestration (not main Stack tier) | Dispatch desk | Trigger order, receipts, audit, ACK |
| OpenHands | L5 · Workflow | Autonomous engineer | Read requirement, edit code, iterate to PR |
OpenClaw does not make architecture decisions — it answers “when to run, how to notify when done.” OpenHands does not sign iOS packages for Runner — it produces reviewable PRs and step logs. All three can stack on one Cloud Mac, but do not merge their duties into one runbook.
Typical OpenHands architecture on Cloud Mac (self-hosted · Docker · macOS)
Engineers searching OpenHands Mac, OpenHands macOS, OpenHands self-hosted, or OpenHands Docker want a deployable topology — not install steps, but “where components live.” Our recommended minimum production shape on Apple Silicon Cloud Mac:
OpenHands self-hosted on macOS (Cloud Mac · L0 base)
GitHub (issues / webhooks / PR)
│
▼
┌─────────────────────────────────────┐
│ Cloud Mac · macOS · Apple Silicon │
│ ┌─────────────┐ ┌───────────────┐ │
│ │ OpenHands │ │ Claude Code │ │ L5 Workflow + L3 Diff (same host OK)
│ │ (Docker) │ │ (SSH/terminal)│ │
│ └──────┬──────┘ └───────────────┘ │
│ │ sandbox workspace │
│ ▼ │
│ ┌─────────────┐ ┌───────────────┐ │
│ │ MCP Servers │ │ Ollama (opt) │ │ L4 Context · L2 Inference
│ └─────────────┘ └───────────────┘ │
│ │ git push │
│ ▼ │
│ ┌─────────────────────────────────┐ │
│ │ GitHub Runner (self-hosted) │ │ L1 Fact
│ └─────────────────────────────────┘ │
└─────────────────────────────────────┘
Why Workflow on Cloud Mac, not a laptop?
- Duration: OpenHands tasks often run 30–90 minutes, and closing a laptop lid interrupts them.
- OpenHands Docker: the sandbox needs a stable daemon, which a 24/7 Cloud Mac provides.
- Same stack as Runner: Agent edits and Runner validation happen on the same macOS node, so there are fewer “SSH green, Actions red” cases.
- ABI alignment: repos targeting iOS / macOS are better served on Apple Silicon than by forcing Docker onto a Linux VPS.
Minimal OpenHands Docker start shape (concept snippet; full docker compose and env in L5-Q02 tutorial):
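# Cloud Mac · OpenHands self-hosted (illustrative; pin the image tag you have validated)
IMAGE=docker.all-hands.dev/all-hands-ai/openhands:0.9
docker pull "$IMAGE"
# Mounting docker.sock lets OpenHands start sandbox containers on the host daemon;
# it also gives the container control of that daemon, so use a dedicated node.
docker run -d --name openhands \
  -e SANDBOX_USER_ID="$(id -u)" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "$HOME/.openhands:/.openhands" \
  -p 3000:3000 \
  "$IMAGE"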
Dedicated node ≠ automatic safety
Cloud Mac solves compute and macOS ABI; OpenHands still needs repo-level least privilege (bot branches, no prod secrets). Future L6 Agent Ops / Governance will cover audit, policy, and human gates — this page establishes Workflow; governance is the next ring in the series schedule.
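Repo-level least privilege can be approximated with a pre-push check like the sketch below. The branch prefix `bot/openhands/` and the secret-looking file patterns are assumptions chosen for illustration; a real policy would live in branch protection rules and token scopes, and this check is only a second line of defence.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# Hypothetical policy values; adjust to the repository's own conventions.
BOT_BRANCH_PREFIX = "bot/openhands/"
SECRET_PATTERNS = [".env", ".env.*", "*.pem", "*.p12", "secrets/*"]


def push_violations(branch: str, changed_files) -> list:
    """List the policy problems for an Agent push; an empty list means allowed."""
    problems = []
    if not branch.startswith(BOT_BRANCH_PREFIX):
        problems.append(f"branch '{branch}' is outside {BOT_BRANCH_PREFIX}")
    for path in changed_files:
        name = PurePosixPath(path).name
        # Match against both the full path and the bare file name.
        if any(fnmatchcase(path, p) or fnmatchcase(name, p) for p in SECRET_PATTERNS):
            problems.append(f"'{path}' looks like a secret")
    return problems


print(push_violations("bot/openhands/issue-218", ["src/export/csv.ts"]))
print(push_violations("main", ["config/.env", "secrets/prod.yaml"]))
```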
L5 Agent Stack sizing: from Workflow to which Cloud Mac to rent
After architecture, the natural question is which machine suits the Workflow layer. The sizing below comes from real stacked workloads; it is a shortcut for decisions, not a contractual SLA:
| Scenario | Suggested config | Notes |
|---|---|---|
| OpenHands Only (light issues, no local inference) | M4 · 16GB | Docker sandbox + API LLM; good to trial agent workflows |
| OpenHands + Claude Code (paired + autonomous same host) | M4 · 24GB | Diff by day, Workflow by night; avoid CI memory fights |
| OpenHands + Ollama 7B | M4 · 24GB | Private Inference + Agent; see off-peak scheduling |
| OpenHands + Ollama 14B + Runner | M4 Pro · 48GB | 14B resident + sandbox + daily macOS CI; lowest Swap risk |
| iOS team (OpenHands issues + xcodebuild CI) | M4 · 24GB+ | Agent and Runner co-located; reserve 8GB+ for archive peaks |
The order is Workflow → Cloud Mac → sizing: confirm you need L5 capability first, then pick a node that holds Docker + (optional) Ollama + Runner, rather than renting hardware and stacking tools onto it afterwards.
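The sizing table can be read as a lookup from workload components to a suggested node. The sketch below encodes only the rows that appear in the table; the component names are hypothetical identifiers, and combinations the table does not list return `None` instead of a guess.

```python
# One entry per row of the sizing table above.
SIZING = {
    frozenset({"openhands"}): "M4 · 16GB",
    frozenset({"openhands", "claude-code"}): "M4 · 24GB",
    frozenset({"openhands", "ollama-7b"}): "M4 · 24GB",
    frozenset({"openhands", "ollama-14b", "runner"}): "M4 Pro · 48GB",
    frozenset({"openhands", "xcodebuild-ci"}): "M4 · 24GB+",
}


def suggest_config(components):
    """Return the suggested config for a listed workload, or None if unlisted."""
    return SIZING.get(frozenset(components))


print(suggest_config(["openhands", "ollama-7b"]))
print(suggest_config(["openhands", "ollama-14b", "runner"]))
```

Returning `None` for unlisted mixes is intentional: a combination outside the table deserves a measured trial rather than an extrapolated number.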
Fit and misfit: boundaries matter more than Agent hype
| Better for OpenHands (L5) | Poor fit / use caution |
|---|---|
| Internal tools, scaffolding, docs sites, test backfill | Regulated finance / healthcare core paths without human gate |
| Repos with clear issue templates and decent test coverage | Repos with no tests, no CI — “ship first” |
| Repeatable migrations (dependency upgrades, lint batch fixes) | Major UX needing strong product intuition |
| Existing L1 Runner + L4 MCP permission policy | Secrets scattered in repo, no token rotation |
| Team accepts “Agent PR + human merge” | Agent must push main / auto-release prod |
Scripts and small services: yes; unguarded compliant prod writes: no. OpenHands is an engineering accelerator, not a liability-free “auto DevOps.”
Decision: should your Cloud Mac upgrade to an Agent platform?
Self-check below — hit ≥3 left-column rows before investing in L5; otherwise shore up L1/L4 first.
| Ready for OpenHands | Not yet |
|---|---|
| ≥5 “small but complete” issues queued per week | Main pain is “no macOS CI” |
| Runner green but lots of manual step stringing | Claude Code sessions still unstable |
| MCP permissions and bot accounts tiered | GitHub PAT with full repo admin |
| Willing to maintain sandbox and task logs | No one does merge review |
| Cloud Mac 24GB or Ollama off-peak scheduled | 16GB running 14B + Agent + Xcode together |
Decision (not a summary): OpenHands value is not “smarter Chat” but Cloud Mac becoming a platform accountable to requirements — provided Fact (Runner) and Context/permissions (MCP) already stand. Otherwise you only automate manual stringing; the org still will not merge.
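The self-check rule (at least three left-column rows) is easy to encode. In this sketch the signal keys are hypothetical short names for the five “Ready for OpenHands” rows, and the threshold of 3 comes from the rule stated with the table.

```python
# Short keys for the five "Ready for OpenHands" rows of the self-check table.
READY_SIGNALS = [
    "five_small_issues_per_week",
    "runner_green_but_manual_stringing",
    "mcp_permissions_and_bot_accounts_tiered",
    "willing_to_maintain_sandbox_and_logs",
    "cloud_mac_24gb_or_ollama_off_peak",
]


def ready_for_l5(answers: dict, threshold: int = 3) -> bool:
    """True when at least `threshold` ready signals are answered yes."""
    hits = sum(1 for key in READY_SIGNALS if answers.get(key))
    return hits >= threshold


team = {
    "five_small_issues_per_week": True,
    "runner_green_but_manual_stringing": True,
    "mcp_permissions_and_bot_accounts_tiered": False,
    "willing_to_maintain_sandbox_and_logs": True,
}
print(ready_for_l5(team))
```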
L5 series: from decision to first autonomous task
| Part | qid | Topic | Status |
|---|---|---|---|
| ① · this page | L5-Q01 | Tool collection → Agent platform (decision R1) | Published |
| ② | L5-Q02 | Install OpenHands on Cloud Mac + first autonomous task | Next |
| ③ | L5-Q03 | OpenHands vs OpenClaw division in depth | Planned |
| ④ | L5-Q04 | Runner + OpenHands: auto PR after CI failure? | Planned |
| ⑤ · L6 extension | L6-Q05 | Agent Ops / Governance (Context→Workflow→policy) | 📅 6/16 |
Before the L6 loop closes, finish at least ②: without a reproducible OpenHands tutorial, the Hub lacks a landing page. Full stack map at L6-Q01; the series evolves from “AI Tool Stack” to “AI Engineering Platform,” with L6-Q05 Agent Governance as the final ring.
FAQ
How does OpenHands work / what is the agent loop?
Plan → Execute → Observe → Debug; consumes Context, produces Diff, validates Fact — see how it works.
OpenHands vs Claude Code — which to pick?
Not either/or. Diff with Claude Code, multi-step issues with OpenHands — see OpenHands vs Claude Code.
OpenHands tutorial / how to install?
This Hub has no step-by-step install. Docker + first issue→PR in L5-Q02 (planned 6/14).
Can OpenHands self-host on Mac?
Yes. Always-on Cloud Mac + Docker sandbox; architecture at typical architecture.
OpenHands without Runner?
It runs; we do not recommend it. Establish L1 first.
What about OpenClaw?
Orchestration vs autonomous engineering — see triangle roles and OpenClaw notes.
What Cloud Mac size?
See L5 Agent Stack sizing; OpenHands only M4 16GB; with Claude Code or Ollama 7B prefer 24GB.
L5 Agent Stack · sizing
Pick Cloud Mac specs for your workload
OpenHands Only → M4 16GB · + Claude Code → M4 24GB · + Ollama 14B + Runner → M4 Pro 48GB. Clarify the Workflow layer in this Hub, then run your first autonomous task in the L5-Q02 tutorial.