OpenHands: from tool collection to Agent platform on Cloud Mac

When Claude Code can edit a file and Runner can prove one build, who finishes the whole requirement?
Series slogan: Claude Code produces Diff; GitHub Runner produces Fact; OpenHands produces Workflow.

Cloud Mac AI Stack · L5 Hub  ·  2026.06.06  ·  ~14 min read  ·  architecture hub · OpenHands tutorial in L5-Q02


Over the past two weeks in this Stack series we stood up L1 Runner (Fact), L2 Ollama (Inference), and L4 MCP (Context) layer by layer. Reader feedback keeps repeating one line: “Every tool is connected, but I still manually string the workflow every day.” Claude Code can produce a diff, MCP can pull GitHub context, Runner can go green after push — yet “fix issue #142 and open a PR” still means someone staring at a terminal for forty minutes.

That is what L5 · OpenHands answers: not another CLI purchase, but an upgrade of Cloud Mac from a tool collection into an Agent platform that can autonomously finish multi-step engineering tasks. This page is L5-Q01 · R1, the series Hub. It moves readers from “coding tools” thinking to “Agent platform” thinking: where Workflow sits in the Stack, why OpenHands vs Claude Code is not a replacement story, which tasks are typical, and what a self-hosted OpenHands architecture on macOS looks like. There are no Docker install steps here; those belong to the L5-Q02 tutorial.

L5: Workflow layer  ·  4-step agent loop  ·  24GB suggested RAM with Ollama

Cloud Mac AI Stack · series slogan (fourth ring)

Claude Code produces Diff; GitHub Runner produces Fact; OpenHands produces Workflow.

MCP supplies Context; Ollama supplies optional Inference. Workflow consumes Context / Diff / Fact and calls the latter two repeatedly in a loop — not a one-way pipeline. See Stack language.

The “tool collection” trap: every piece works; the chain still runs on humans

A typical week on site (we have seen this shape in customer repos many times):

  1. Monday: Claude Code edits the API layer, MCP pulls the GitHub issue list — smooth inside the session.
  2. Tuesday: a teammate pushes from another machine; Runner goes red because nobody aligned Agent edits with CI scripts (this repeats until the team has read up on the L1 execution engine).
  3. Wednesday: manual test runs, config tweaks, another Claude Code session to patch files.
  4. Thursday: checks finally green, but docs, migration scripts, sample tests are still missing — because “edit code” and “deliver the requirement” were treated as the same job.

A tool collection means: each step has a best tool, but no layer owns the whole requirement. An Agent platform adds a Workflow layer that can decompose tasks, execute, and retry from failure on its own — OpenHands is the open-source option in the Stack for that layer (evolved from the OpenDevin ecosystem).

Stack language: Workflow with Context / Diff / Fact

Series-wide notation: do not draw Workflow as a one-way downstream of Fact. Workflow (L5) is the orchestration layer that repeatedly consumes Context, produces Diff, and validates with Fact until it decides “requirement done”:

Cloud Mac AI Stack · output relationships (not call order)

  Workflow (L5 · OpenHands)
  ├── Context (L4 · MCP)          ← read repo / issue / API
  ├── Diff (L3 · Claude Code etc) ← edit code / write files
  └── Fact (L1 · Runner / tests)  ← run test / build / CI signal

Agent loop (inside Workflow · may iterate many times)
       Diff  ↔  Fact
         ↑       ↓
      Observe → re-Plan → re-Execute …

Four outputs to remember: Context · Diff · Fact · Workflow (MCP · coding layer · Runner · OpenHands). Inference (L2 · Ollama) is optional and omitted above to avoid confusion with the Agent loop.

Layer | Component     | Output   | Question answered
L4    | MCP           | Context  | What can the Agent see?
L3    | Claude Code   | Diff     | What is this change?
L1    | GitHub Runner | Fact     | Will the org trust it?
L5    | OpenHands     | Workflow | Is the whole requirement done?

Workflow is not “another CI job” but a multi-step, interruptible and resumable task state machine: it calls Diff and Fact many times in a loop until the PR is deliverable. Claude Code excels at single-round Diff; OpenHands excels at running the whole loop unattended — provided Context and Fact are already in place.
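To make “interruptible and resumable task state machine” concrete, here is a minimal Python sketch. The state names, the `WorkflowTask` class, and the JSON checkpoint format are all assumptions made for illustration; they are not OpenHands internals.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical states and legal transitions (not OpenHands internals).
TRANSITIONS = {
    "planned": {"executing"},
    "executing": {"observing"},
    "observing": {"executing", "done"},  # red Fact -> new Diff, green Fact -> done
    "done": set(),
}


@dataclass
class WorkflowTask:
    goal: str
    state: str = "planned"
    history: list = field(default_factory=list)

    def advance(self, new_state: str) -> None:
        """Move to new_state, rejecting transitions the table does not allow."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.history.append(self.state)
        self.state = new_state

    def checkpoint(self) -> str:
        """Serialize the task so an interrupted run can be resumed later."""
        return json.dumps(asdict(self))

    @classmethod
    def resume(cls, blob: str) -> "WorkflowTask":
        """Rebuild a task from a checkpoint produced by checkpoint()."""
        return cls(**json.loads(blob))
```

A task that is checkpointed after Observe can be restored on another process and continue with the next Execute. That property is what separates a Workflow from a single CI job.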

OpenHands in one minute (not an encyclopedia)

OpenHands is an open-source autonomous software engineering Agent platform: in a sandbox (often Docker) it accepts natural-language goals and automatically plans → writes code / runs commands → reads output → debugs, with GitHub integration (issues, PRs, CI status). In the Cloud Mac AI Stack it does not replace Claude Code’s paired experience or Runner’s objective build proof — it orchestrates multi-step delivery on top of both.

Different from “install another MCP Server”

MCP extends the context boundary (read repo, call APIs); OpenHands extends task depth (decide the next tool call, whether to retry). Without L4, OpenHands edits blind; without L1, OpenHands “done” cannot be accepted by the org.

OpenHands vs Claude Code: why they are not competitors

People searching OpenHands vs Claude Code or Claude Code alternative often ask: can one replace the other? In the Cloud Mac AI Stack the answer is no, and it should not — they sit on different layers with different outputs:

  • Claude Code (L3) → produces Diff: paired coding, you are present.
  • OpenHands (L5) → produces Workflow: autonomous agent, you set the goal.

Treating OpenHands as “another Claude Code” fails fast: OpenHands cannot align with the fuzzy product intuition in your head, and Claude Code will not run an eight-step issue unattended. The right pattern is stacked use: Claude Code for hard problems by day, OpenHands clearing the issue queue at night.

Dimension               | Claude Code (L3 · Diff)                      | OpenHands (L5 · Workflow)
Interaction             | Human present, step-by-step confirm          | Goal-driven, multi-step autonomy
Typical duration        | 5–30 minute session                          | 30 minutes to hours per task
Strength                | Complex single-point refactor, align intent  | Scripted requirements, batch small changes, templated features
Risk                    | Session ends → partial work                  | Runaway edits, too many files, excess permissions
Output                  | Diff                                         | Workflow (PR, logs, step trajectory)
Replaces the other one? | ❌ No                                        | ❌ Not a Claude Code replacement

Rule of thumb (not a contract): if you can state the change intent in one PR sentence → Claude Code; if you say “finish the issue” → consider OpenHands. In large repos with CodeGraph indexing, the paired layer often stays Claude Code; OpenHands fits templatized backend tasks. Both share MCP Context but not the same responsibility.

What tasks can OpenHands do? (the first search question)

Many people search OpenHands agent or OpenHands github to ask: what work is reliable to hand off? Below are task types we suggest trying first with tests + CI — also the example pool for the L5-Q02 tutorial:

Task type              | Typical input                 | Expected delivery                     | Fit
Fix bug                | GitHub issue + repro steps    | Patch + tests + PR                    | ⭐⭐⭐⭐
Dependency upgrade     | “Upgrade React 18→19”         | Lockfile + breaking-change fixes      | ⭐⭐⭐⭐
Lint cleanup           | ESLint / SwiftLint report     | Batch fix warnings, no behavior change | ⭐⭐⭐⭐⭐
Generate tests         | Uncovered module list         | Unit test PR                          | ⭐⭐⭐
Documentation sync     | API change diff               | README / OpenAPI sync                 | ⭐⭐⭐⭐
Scaffold / boilerplate | “Add REST endpoint” template  | Routes + test skeleton                | ⭐⭐⭐⭐

Poor first OpenHands tasks: untested large refactors, major UX redesigns, schema migrations that need business approval, and anything touching production secrets. Keep those in Claude Code paired sessions, and let Runner produce Fact after a human gate.

How OpenHands works

People searching How OpenHands works, OpenHands architecture, or OpenHands agent loop want to know: how does an autonomous agent turn one sentence into a mergeable PR? OpenHands centers on a four-step loop — often written Plan → Execute → Observe → Debug:

Phase   | What it does                           | Consumes / produces
Plan    | Read issue, split subtasks, file list  | Consumes Context (MCP, GitHub, repo tree)
Execute | Write patches, run shell, call tools   | Produces Diff
Observe | Read test output, lint, build logs     | Consumes Fact (local test or Runner)
Debug   | Revise plan or code from Observe       | Back to Execute; loop until pass

OpenHands agent loop (concept · not a single pipeline)

        ┌──────────┐
        │   Plan   │  ← Context
        └────┬─────┘
             ▼
        ┌──────────┐
        │ Execute  │  → Diff
        └────┬─────┘
             ▼
        ┌──────────┐  pass   ┌───────────────┐
        │ Observe  │ ───────▶│ Workflow done │ → PR / delivery
        └────┬─────┘         └───────────────┘
             │ fail     (Observe reads Fact: test / build / CI)
             ▼
        ┌──────────┐
        │  Debug   │
        └────┬─────┘
             │
             └──── back to Plan or Execute (next round)

This is not “install a stronger Chat.” The OpenHands architecture hinges on a stateful task machine: each Observe result is written into the trajectory for the next Plan. Bug fixes, lint cleanup, and similar tasks run the same loop with different Plan entry sentences. The real task replay below walks one issue through all four steps; Docker and UI config land in L5-Q02.
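The four-step loop can be sketched in a few lines of Python. This is a conceptual illustration only: `run_agent_loop`, `Observation`, and `Trajectory` are hypothetical names, the `plan` / `execute` / `observe` callables are supplied by the caller, and nothing here is OpenHands' actual implementation. The default budget of 40 iterations simply mirrors the number used in the trigger concept later on this page.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Observation:
    passed: bool   # Fact: did tests / build go green?
    log: str       # output handed to the Debug step


@dataclass
class Trajectory:
    steps: List[str] = field(default_factory=list)


def run_agent_loop(
    goal: str,
    plan: Callable[[str, Trajectory], List[str]],
    execute: Callable[[List[str], Trajectory], str],
    observe: Callable[[str], Observation],
    max_iterations: int = 40,
) -> Tuple[bool, Trajectory]:
    """Plan -> Execute -> Observe -> Debug until Fact is green or the budget runs out."""
    trajectory = Trajectory()
    subtasks = plan(goal, trajectory)              # Plan: consumes Context
    for i in range(1, max_iterations + 1):
        diff = execute(subtasks, trajectory)       # Execute: produces Diff
        trajectory.steps.append(f"execute#{i}: {diff}")
        obs = observe(diff)                        # Observe: consumes Fact
        status = "pass" if obs.passed else "fail"
        trajectory.steps.append(f"observe#{i}: {status} {obs.log}")
        if obs.passed:
            return True, trajectory                # Workflow done -> PR / delivery
        subtasks = plan(goal, trajectory)          # Debug: re-plan from the failure
    return False, trajectory                       # budget exhausted, needs a human
```

The design point is that every Observe result is appended to the trajectory before the next Plan call, so a failed test is input to the loop and does not end it.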

Stack L0–L4 before L5 — or the Agent performs in a sandbox alone

We oppose “install OpenHands on day one” tool stacking. Recommended order matches the L1 rollout sequence, with L5 after MCP:

  1. L0 — Always-on Cloud Mac macOS node.
  2. L1 — Runner: repeatable push → green/red.
  3. L2–L3 — Optional Ollama + Claude Code coding on top of Fact.
  4. L4 — MCP Hub + permission model: auditable read/write for Agents.
  5. L5 — OpenHands: multi-step Workflow.

Without L1, OpenHands can technically run and open PRs, but the team cannot judge merge risk — the same org incident as “Claude Code SSH all green, Actions all red.” Without L4 permissions, autonomous Agent token exposure grows; see the MCP security spec.

Real task replay: one full Plan → Execute → Observe → Debug round

Below is a task shape we replay on an OpenHands sandbox fork (numbers illustrative). Match each step to the agent loop:

Goal: fix issue #218 "CSV export missing UTF-8 BOM"

  ① Read issue + related src/export/*.ts     ~2 min · Context (MCP/Git)
  ② Generate 6-step plan                     ~1 min · Plan
  ③ Edit 4 files + add 1 test                ~8 min · Execute
  ④ Run pnpm test → fail (snapshot mismatch) ~3 min · Observe · Fact
  ⑤ Read logs → edit 2 more files            ~5 min · Debug → re-Execute
  ⑥ Re-run tests → green                     ~3 min
  ⑦ Open PR, link issue                      ~1 min · Workflow delivery

~23 min wall time · human: approve goal + final merge review only

Note step ④ (Observe): test failure is not disaster — it is agent loop input. In paired tools you fix on the spot; OpenHands feeds Fact back through Observe → Debug into the next Execute. Without a stable test command (L1 not solid), Observe has no signal and the loop spins — another reason Runner comes before OpenHands.
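A stable test command is what turns Observe into a usable signal. The sketch below wraps an arbitrary command with Python's standard `subprocess` module and reduces the result to a small `Fact` record. The `Fact` dataclass and the `observe` helper are hypothetical names for illustration, and the command itself (for example `["pnpm", "test"]` from the replay) is whatever your repo already uses.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class Fact:
    passed: bool     # True only when the command exits with code 0
    exit_code: int   # -1 when the command could not run or timed out
    tail: str        # last lines of output, handed to the Debug step


def observe(command: List[str], timeout_s: int = 600, tail_lines: int = 20) -> Fact:
    """Run a test/build command and turn the outcome into a Fact."""
    try:
        proc = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        return Fact(False, -1, "timed out")
    except FileNotFoundError:
        # No stable test command means no signal: the loop has nothing to read.
        return Fact(False, -1, f"command not found: {command[0]}")
    lines = (proc.stdout + proc.stderr).splitlines()
    return Fact(proc.returncode == 0, proc.returncode, "\n".join(lines[-tail_lines:]))


if __name__ == "__main__":
    # Stand-in for a real test command so the sketch runs anywhere.
    print(observe([sys.executable, "-c", "print('1 passed')"]))
```

A missing or flaky command yields `passed=False` with no useful log, which is exactly the “loop spins” situation described above.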

Trigger-side concept (not an install tutorial — only how it connects to GitHub):

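# Concept: an issue label triggers an autonomous task.
# The workflow structure is valid GitHub Actions YAML; the `openhands run`
# command and its flags remain pseudocode placeholders, not a confirmed CLI.
name: openhands-on-label
on:
  issues:
    types: [labeled]
jobs:
  agent:
    # Only react to the dedicated trigger label.
    if: github.event.label.name == 'agent:openhands'
    runs-on: self-hosted
    steps:
      - name: Start autonomous task (placeholder command)
        run: |
          openhands run \
            --repo "${{ github.repository }}" \
            --issue "${{ github.event.issue.number }}" \
            --max-iterations 40 \
            --sandbox docker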
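The same trigger decision can be made in a small webhook handler instead of a workflow file. The sketch below assumes the payload shape of GitHub's `issues` webhook event (`action`, `label.name`, `issue.number`, `repository.full_name`); the function name and the returned dictionary are hypothetical, and the label and iteration budget are taken from the concept snippet above.

```python
from typing import Optional

TRIGGER_LABEL = "agent:openhands"   # label name used in the concept snippet
MAX_ITERATIONS = 40                 # budget used in the concept snippet


def task_from_issue_event(payload: dict) -> Optional[dict]:
    """Return a task description if the event should start an agent run, else None."""
    if payload.get("action") != "labeled":
        return None
    label_name = (payload.get("label") or {}).get("name")
    if label_name != TRIGGER_LABEL:
        return None
    return {
        "repo": payload["repository"]["full_name"],
        "issue": payload["issue"]["number"],
        "max_iterations": MAX_ITERATIONS,
    }
```

Filtering on one dedicated label keeps the human in charge of which issues enter the queue; every other event is ignored.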

Runner · OpenClaw · OpenHands: three names, three roles

The triangle we get asked about most on L5 articles:

Component     | Stack layer                          | Metaphor            | Typical action
GitHub Runner | L1 · Fact                            | Legs                | xcodebuild, pnpm test, archive
OpenClaw      | Orchestration (not main Stack tier)  | Dispatch desk       | Trigger order, receipts, audit, ACK
OpenHands     | L5 · Workflow                        | Autonomous engineer | Read requirement, edit code, iterate to PR

OpenClaw does not make architecture decisions — it answers “when to run, how to notify when done.” OpenHands does not sign iOS packages for Runner — it produces reviewable PRs and step logs. All three can stack on one Cloud Mac, but do not merge their duties into one runbook.

Typical OpenHands architecture on Cloud Mac (self-hosted · Docker · macOS)

Engineers searching OpenHands Mac, OpenHands macOS, OpenHands self-hosted, or OpenHands Docker want a deployable topology — not install steps, but “where components live.” Our recommended minimum production shape on Apple Silicon Cloud Mac:

OpenHands self-hosted on macOS (Cloud Mac · L0 base)

  GitHub (issues / webhooks / PR)
           │
           ▼
  ┌─────────────────────────────────────┐
  │  Cloud Mac · macOS · Apple Silicon   │
  │  ┌─────────────┐  ┌───────────────┐  │
  │  │ OpenHands   │  │ Claude Code   │  │  L5 Workflow + L3 Diff (same host OK)
  │  │ (Docker)    │  │ (SSH/terminal)│  │
  │  └──────┬──────┘  └───────────────┘  │
  │         │ sandbox workspace           │
  │         ▼                             │
  │  ┌─────────────┐  ┌───────────────┐  │
  │  │ MCP Servers │  │ Ollama (opt)  │  │  L4 Context · L2 Inference
  │  └─────────────┘  └───────────────┘  │
  │         │ git push                    │
  │         ▼                             │
  │  ┌─────────────────────────────────┐  │
  │  │ GitHub Runner (self-hosted)     │  │  L1 Fact
  │  └─────────────────────────────────┘  │
  └─────────────────────────────────────┘

Why Workflow on Cloud Mac, not a laptop?

  • Duration — OpenHands tasks often run 30–90 minutes; lid-closed laptop breaks them.
  • OpenHands Docker — sandbox needs a stable daemon; 24/7 Cloud Mac fits better.
  • Same stack as Runner — Agent edits → Runner validates on the same macOS node, fewer “SSH green, Actions red” cases.
  • ABI alignment — iOS / macOS target repos on Apple Silicon beat forcing Docker on a Linux VPS.

Minimal OpenHands Docker start shape (concept snippet; full docker compose and env in L5-Q02 tutorial):

# Cloud Mac · OpenHands self-hosted (illustrative)
IMAGE=docker.all-hands.dev/all-hands-ai/openhands:0.9
docker pull "$IMAGE"
# Mounting docker.sock lets OpenHands start sandbox containers on the host
# daemon; it also gives this container control over that daemon.
docker run -d --name openhands \
  -e SANDBOX_USER_ID="$(id -u)" \
  -v /var/run/docker.sock:/var/run/docker.sock \
  -v "$HOME/.openhands:/.openhands" \
  -p 3000:3000 \
  "$IMAGE"

Dedicated node ≠ automatic safety

Cloud Mac solves compute and macOS ABI; OpenHands still needs repo-level least privilege (bot branches, no prod secrets). Future L6 Agent Ops / Governance will cover audit, policy, and human gates — this page establishes Workflow; governance is the next ring in the series schedule.
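Repo-level least privilege can start as a simple pre-push check. The sketch below is one possible shape, not a feature of OpenHands: the `agent/` branch prefix and the protected path patterns are assumptions chosen for illustration and should be replaced with your own policy.

```python
from fnmatch import fnmatch
from typing import List

BOT_BRANCH_PREFIX = "agent/"   # hypothetical naming convention for bot branches
PROTECTED_PATTERNS = [         # hypothetical examples of secret-bearing paths
    ".env*",
    "secrets/*",
    "*.p12",
    "*.mobileprovision",
]


def push_violations(branch: str, changed_paths: List[str]) -> List[str]:
    """List every policy violation for a push the agent is about to make."""
    violations = []
    if not branch.startswith(BOT_BRANCH_PREFIX):
        violations.append(f"branch '{branch}' is outside {BOT_BRANCH_PREFIX}*")
    for path in changed_paths:
        if any(fnmatch(path, pattern) for pattern in PROTECTED_PATTERNS):
            violations.append(f"protected path touched: {path}")
    return violations
```

An empty list means the push may proceed to a PR; anything else should stop the run and wait for a human.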

L5 Agent Stack sizing: from Workflow to which Cloud Mac to rent

After architecture, the natural question is which machine to rent for the Workflow layer. The sizing below comes from real stacked workloads; it is meant to shorten decisions and is not a contractual SLA:

Scenario                                                 | Suggested config | Notes
OpenHands Only (light issues, no local inference)        | M4 · 16GB        | Docker sandbox + API LLM; good to trial agent workflows
OpenHands + Claude Code (paired + autonomous same host)  | M4 · 24GB        | Diff by day, Workflow by night; avoid CI memory fights
OpenHands + Ollama 7B                                    | M4 · 24GB        | Private Inference + Agent; see off-peak scheduling
OpenHands + Ollama 14B + Runner                          | M4 Pro · 48GB    | 14B resident + sandbox + daily macOS CI; lowest Swap risk
iOS team (OpenHands issues + xcodebuild CI)              | M4 · 24GB+       | Agent and Runner co-located; reserve 8GB+ for archive peaks

This is the commercial narrative Workflow → Cloud Mac → sizing: confirm you need L5 capability first, then pick a node that holds Docker + (optional) Ollama + Runner — not rent hardware then stack tools backward.
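The sizing table reads naturally as a lookup. The sketch below encodes only the rows given above; the scenario keys and the `pick_node` helper are hypothetical, and combining several scenarios by taking the largest RAM figure is an assumption of this sketch, not a rule stated in the table.

```python
from typing import Iterable, Tuple

# (chip, RAM in GB) per scenario, copied from the sizing table above.
SIZING = {
    "openhands_only": ("M4", 16),
    "openhands_claude_code": ("M4", 24),
    "openhands_ollama_7b": ("M4", 24),
    "openhands_ollama_14b_runner": ("M4 Pro", 48),
    "ios_team": ("M4", 24),  # the table says 24GB+, so treat this as a floor
}


def pick_node(scenarios: Iterable[str]) -> Tuple[str, int]:
    """Return the largest suggested config among the scenarios you expect to run."""
    wanted = list(scenarios)
    if not wanted:
        raise ValueError("name at least one scenario")
    unknown = [s for s in wanted if s not in SIZING]
    if unknown:
        raise KeyError(f"no sizing row for: {unknown}")
    # Assumption: the most demanding scenario decides the node.
    return max((SIZING[s] for s in wanted), key=lambda config: config[1])
```

For example, `pick_node(["openhands_only", "openhands_ollama_7b"])` resolves to the 24GB row, which matches the order of decisions this section recommends: name the workloads first, then choose the node.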

Fit and misfit: boundaries matter more than Agent hype

Better for OpenHands (L5)                                      | Poor fit / use caution
Internal tools, scaffolding, docs sites, test backfill         | Regulated finance / healthcare core paths without human gate
Repos with clear issue templates and decent test coverage      | Repos with no tests, no CI (“ship first”)
Repeatable migrations (dependency upgrades, lint batch fixes)  | Major UX needing strong product intuition
Existing L1 Runner + L4 MCP permission policy                  | Secrets scattered in repo, no token rotation
Team accepts “Agent PR + human merge”                          | Agent must push main / auto-release prod

Scripts and small services: yes. Unguarded writes to compliance-regulated production: no. OpenHands is an engineering accelerator, not a liability-free “auto DevOps.”

Decision: should your Cloud Mac upgrade to an Agent platform?

Self-check below — hit ≥3 left-column rows before investing in L5; otherwise shore up L1/L4 first.

Ready for OpenHands                               | Not yet
≥5 “small but complete” issues queued per week    | Main pain is “no macOS CI”
Runner green but lots of manual step stringing    | Claude Code sessions still unstable
MCP permissions and bot accounts tiered           | GitHub PAT with full repo admin
Willing to maintain sandbox and task logs         | No one does merge review
Cloud Mac 24GB or Ollama off-peak scheduled       | 16GB running 14B + Agent + Xcode together
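The “hit ≥3 left-column rows” rule is simple arithmetic, sketched here as a checklist. The signal keys are shorthand invented for this example; the threshold of 3 comes from the self-check above.

```python
from typing import Dict

# Shorthand keys for the five "Ready for OpenHands" rows (names are hypothetical).
READY_SIGNALS = [
    "issue_queue_5_per_week",
    "runner_green_manual_stringing",
    "mcp_permissions_tiered",
    "maintains_sandbox_and_logs",
    "ram_24gb_or_offpeak_ollama",
]


def ready_for_l5(answers: Dict[str, bool], threshold: int = 3) -> bool:
    """True when at least `threshold` readiness rows are satisfied."""
    hits = sum(1 for signal in READY_SIGNALS if answers.get(signal, False))
    return hits >= threshold
```

A team with a weekly issue queue, tiered MCP permissions, and a 24GB node scores 3 and clears the bar; a team with only the hardware does not.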

Decision (not a summary): OpenHands value is not “smarter Chat” but Cloud Mac becoming a platform accountable to requirements — provided Fact (Runner) and Context/permissions (MCP) already stand. Otherwise you only automate manual stringing; the org still will not merge.

L5 series: from decision to first autonomous task

Part             | qid    | Topic                                                   | Status
① · this page    | L5-Q01 | Tool collection → Agent platform (decision R1)          | Published
②                | L5-Q02 | Install OpenHands on Cloud Mac + first autonomous task  | Next
③                | L5-Q03 | OpenHands vs OpenClaw division in depth                 | Planned
④                | L5-Q04 | Runner + OpenHands: auto PR after CI failure?           | Planned
⑤ · L6 extension | L6-Q05 | Agent Ops / Governance (Context→Workflow→policy)        | 📅 6/16

Before the L6 loop closes, finish at least ②: without a reproducible OpenHands tutorial, the Hub lacks a landing page. Full stack map at L6-Q01; the series evolves from “AI Tool Stack” to “AI Engineering Platform,” with L6-Q05 Agent Governance as the final ring.

FAQ

How does OpenHands work / what is the agent loop?
Plan → Execute → Observe → Debug; consumes Context, produces Diff, validates Fact — see how it works.

OpenHands vs Claude Code — which to pick?
Not either/or. Diff with Claude Code, multi-step issues with OpenHands — see OpenHands vs Claude Code.

OpenHands tutorial / how to install?
This Hub has no step-by-step install. Docker + first issue→PR in L5-Q02 (planned 6/14).

Can OpenHands self-host on Mac?
Yes. Always-on Cloud Mac + Docker sandbox; architecture at typical architecture.

OpenHands without Runner?
It runs; we do not recommend it. Establish L1 first.

What about OpenClaw?
Orchestration vs autonomous engineering — see triangle roles and OpenClaw notes.

What Cloud Mac size?
See L5 Agent Stack sizing; OpenHands only M4 16GB; with Claude Code or Ollama 7B prefer 24GB.

L5 Agent Stack · sizing

Pick Cloud Mac specs for your workload

OpenHands Only → M4 16GB · + Claude Code → M4 24GB · + Ollama 14B + Runner → M4 Pro 48GB. Clarify the Workflow layer in this Hub, then run your first autonomous task in the L5-Q02 tutorial.
