
Loop Factory

1) Executive Summary

A software factory is an engineered system that converts requirements into reviewed pull requests through repeatable agentic loops. The key shift is from “AI as autocomplete” to AI as a production process: scoped work intake, execution loops, quality gates, and explicit handoff to humans.

For small teams (1–3 people), this matters because factories convert scarce founder time into throughput. The leverage does not come from one model prompt; it comes from:

  • strict task scoping,
  • fresh context per iteration,
  • file-based state that survives context resets,
  • and hard backpressure (tests/types/lints/CI).

Without these, teams get fast output but low reliability. With these, teams can run sustained autonomous work while preserving quality and control.

flowchart LR
  A[Specs / Issues] --> B[Planner]
  B --> C[One-task execution loop]
  C --> D[Backpressure: tests/lint/types]
  D -->|pass| E[PR creation]
  D -->|fail| C
  E --> F[Human review]
  F --> G[Merge / deploy]
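The execute/verify cycle in the diagram above can be sketched in a few lines. This is a minimal illustration, not any particular system's implementation: `run_task_loop`, `LoopResult`, and the `max_attempts` cap are hypothetical names, `execute` stands in for a fresh-context agent run, and `verify` stands in for the tests/lint/types gate.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopResult:
    task: str
    attempts: int
    passed: bool


def run_task_loop(
    task: str,
    execute: Callable[[str, Optional[str]], str],
    verify: Callable[[str], Optional[str]],
    max_attempts: int = 3,  # assumed cap; the source does not prescribe one here
) -> LoopResult:
    """Run one scoped task until the gates pass or attempts run out.

    `execute` receives the task and the previous failure message (None on
    the first attempt) and returns a candidate change.
    `verify` returns None on success, or a failure message that is fed
    back into the next attempt (the "fail" edge in the diagram).
    """
    feedback: Optional[str] = None
    for attempt in range(1, max_attempts + 1):
        candidate = execute(task, feedback)
        feedback = verify(candidate)
        if feedback is None:
            return LoopResult(task, attempt, True)  # ready for PR creation
    return LoopResult(task, max_attempts, False)  # hand off to a human
```

The point of the shape is that the only way out of the loop with `passed=True` is through `verify`; the agent never gets to declare its own work done.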

2) The Landscape

Ralph Loop (Geoffrey Huntley)

The canonical minimal pattern: bash loop, fresh context each run, file-based memory, one-task-per-loop. Huntley’s framing is intentionally monolithic: avoid premature multi-agent orchestration complexity; maximize determinism via simple control flow and persistent artifacts (specs/*, IMPLEMENTATION_PLAN.md, AGENTS.md).

Core contribution: the pattern is simple enough to implement (~300 LOC class of systems), but robust if you enforce backpressure and specs-first execution.
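To make "file-based memory, one-task-per-loop" concrete, here is a small sketch of a plan file used as the loop's durable state. It assumes the plan uses Markdown checkboxes (`- [ ]` / `- [x]`), which is an illustrative convention and not something the Ralph material mandates; `next_task`, `mark_done`, and `run_once` are hypothetical helper names, and `agent` stands in for a fresh-context model run.

```python
import re
from pathlib import Path
from typing import Callable, Optional

# Assumed plan format: one unchecked Markdown checkbox per pending task.
TASK_RE = re.compile(r"^- \[ \] (.+)$", re.MULTILINE)


def next_task(plan_text: str) -> Optional[str]:
    """Return the first unchecked task, or None when the plan is finished."""
    match = TASK_RE.search(plan_text)
    return match.group(1).strip() if match else None


def mark_done(plan_text: str, task: str) -> str:
    """Check off the first unchecked entry that starts with `task`."""
    return plan_text.replace(f"- [ ] {task}", f"- [x] {task}", 1)


def run_once(plan_path: Path, agent: Callable[[str], bool]) -> Optional[str]:
    """One loop iteration: read state from disk, do one task, write state back.

    Nothing is carried over in memory between iterations; the plan file is
    the only thing that survives.
    """
    text = plan_path.read_text(encoding="utf-8")
    task = next_task(text)
    if task is None:
        return None
    if agent(task):  # True means the task passed its gates
        plan_path.write_text(mark_done(text, task), encoding="utf-8")
    return task
```

A driver would call `run_once` repeatedly until it returns `None`; a failed task stays unchecked and is picked up again on the next iteration.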

Loom (Geoffrey Huntley)

Loom is infrastructure for evolutionary software at organizational scale: Rust server, Kubernetes weavers, ephemeral environments, integrated git hosting, security/audit sidecars, analytics/flags/auth stack.

Core contribution: transforms Ralph-style loop logic into platform primitives for large-scale orchestration and governance.

Tradeoff: significant infra and operational weight; not a first system for small teams.

Gastown (Steve Yegge)

Gastown is “K8s for agents”: role-separated orchestration (Mayor/Deacon/Witness/Refinery/Polecats), persistent work identity via Beads, convoy workflows, and GUPP (“if work is on your hook, run it”).

Core contribution: high-throughput multi-rig, multi-agent coordination with recovery semantics and operational roles.

Tradeoff: designed for advanced users running many concurrent agents; overhead is substantial for small teams.

GSD (glittercowboy)

GSD is a context-engineering workflow system for Claude Code centered on .planning/ artifacts (PROJECT.md, ROADMAP.md, STATE.md) and atomic plans with fresh subagent contexts.

Core contribution: practical structure for preventing context rot without heavy infra.

Tradeoff: excellent planning discipline does not guarantee execution quality if the implementation model is weak or under-constrained.

Stripe Minions

Production enterprise system running at very high throughput (1300+ PRs/week class). Built around:

  • isolated devboxes,
  • a hardened Goose-based harness,
  • Toolshed MCP (400+ internal tools, curated subsets),
  • Blueprints combining deterministic and agentic nodes,
  • one-shot runs with bounded retry policy (max 2 CI rounds).

Core contribution: demonstrates reliable autonomous coding in production when orchestration and backpressure are engineered as first-class concerns.
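The "curated subsets" idea is worth picturing: an agent sees only a small, task-relevant slice of a large tool catalog. The sketch below is one way that could look, and is purely illustrative. How Stripe actually curates Toolshed subsets is not described here; the `Tool` type, the tag-overlap ranking, and the tool names are all assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Tool:
    name: str
    tags: FrozenSet[str]


def curated_subset(
    catalog: List[Tool], wanted_tags: FrozenSet[str], limit: int
) -> List[str]:
    """Pick at most `limit` tools whose tags overlap the task's tags.

    Tools with more overlapping tags rank first; ties break by name so
    the selection is deterministic from run to run.
    """
    matches = [t for t in catalog if t.tags & wanted_tags]
    matches.sort(key=lambda t: (-len(t.tags & wanted_tags), t.name))
    return [t.name for t in matches[:limit]]


# Hypothetical catalog entries, for illustration only.
catalog = [
    Tool("ci_status", frozenset({"ci"})),
    Tool("db_migrate", frozenset({"db"})),
    Tool("ci_logs", frozenset({"ci", "logs"})),
    Tool("code_search", frozenset({"code"})),
]
subset = curated_subset(catalog, frozenset({"ci", "logs"}), limit=2)
```

Keeping the exposed tool list short is a context-budget decision as much as a safety one: every tool description the agent sees costs tokens on every turn.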

3) Common Principles Across Successful Systems

Across Ralph, GSD, Gastown, Loom, and Minions, the winning invariants are consistent:

  1. Specs-first: requirements before code generation.
  2. Fresh context per iteration: avoid long-lived context drift.
  3. File-based persistent state: plans, memory, runbooks on disk.
  4. Backpressure: tests/lints/types/CI as rejection filters.
  5. One task per iteration: constrain scope for reliability.
  6. Human review: autonomy in execution, human authority at merge.
  7. Isolation/sandboxing: bounded blast radius for agent actions.

graph TD
  P[Persistent specs/state files] --> L[Loop run in fresh context]
  L --> T[Single scoped task]
  T --> V[Verification gates]
  V -->|fail| L
  V -->|pass| R[Reviewable PR]
  R --> H[Human decision]
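Backpressure as a rejection filter can be as plain as running a list of commands and refusing to proceed if any of them exits non-zero. A minimal sketch, assuming gates are ordinary CLI commands; `run_gates` is a hypothetical helper, and the commands in the comment are examples for a Python project, not a recommendation from the source systems.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple


def run_gates(gates: Sequence[Tuple[str, List[str]]]) -> List[str]:
    """Run each (name, command) gate and return the names that failed.

    An empty result means the change may move on to PR creation. A missing
    executable counts as a failure rather than being silently skipped.
    """
    failed: List[str] = []
    for name, cmd in gates:
        try:
            proc = subprocess.run(cmd, capture_output=True, text=True)
        except FileNotFoundError:
            failed.append(name)
            continue
        if proc.returncode != 0:
            failed.append(name)
    return failed


# Example gate list (assumed tooling; substitute your project's commands):
#   [("lint", ["ruff", "check", "."]),
#    ("types", ["mypy", "."]),
#    ("tests", ["pytest", "-q"])]
demo = run_gates([("always-ok", [sys.executable, "-c", "raise SystemExit(0)"])])
```

Treating a missing tool as a failure is deliberate: a gate that cannot run should block the loop, not wave the change through.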

4) Comparison Table

| Approach | Complexity | Strengths | Weaknesses | Ideal Team Size | Primary Language/Stack |
| --- | --- | --- | --- | --- | --- |
| Ralph Loop | Low | Minimal, fast to adopt, clear mental model, strong with good specs/backpressure | Limited orchestration features; manual scaling patterns | 1–5 | Bash + any coding model/tooling |
| Loom | Very High | Enterprise platformization, ephemeral envs, auditability, integrated control plane | Heavy infra/ops burden | 20+ agents / platform teams | Rust + K8s + web stack |
| Gastown | Very High | Rich multi-agent orchestration, role separation, resilient work routing | Steep learning curve, high operational overhead | 20+ agents / advanced operators | Go + tmux + Beads ecosystem |
| GSD | Medium | Strong planning/state discipline, context-rot mitigation, wave parallelization | Planning can outpace execution quality if implementation loop is weak | 1–10 | Claude Code command framework |
| Stripe Minions | Very High | Proven production throughput, blueprint orchestration, robust unattended runs | Requires mature internal platform and tool ecosystem | Large enterprises | Custom harness + devboxes + MCP tooling |

5) What Works for Small Teams (1–3 people)

Practical recommendations:

  1. Use Ralph loop as the foundation
     • Keep loop driver simple.
     • Enforce one-task-per-iteration.
     • Store all durable memory in files.

  2. Add GSD-style planning structure
     • Adopt .planning/ artifacts to improve continuity and handoff quality.
     • Keep plans atomic.

  3. Adopt Stripe’s Blueprint idea early
     • Interleave deterministic and agentic steps.
     • Deterministic nodes: setup, lint, typecheck, tests, PR formatting.
     • Agentic nodes: design/implementation/fix strategy.

  4. Avoid Loom/Gastown complexity until scale demands it
     • They are powerful but generally overkill before sustained 20+ concurrent agent operations.
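The Blueprint recommendation, interleaving deterministic and agentic steps, can be pictured as an ordered list of typed nodes. This is a sketch of the idea only; Stripe's actual Blueprint format is not shown in the source, and `Node`, `run_blueprint`, and the trace strings are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

State = Dict[str, object]


@dataclass
class Node:
    name: str
    kind: str  # "deterministic" or "agentic"
    run: Callable[[State], bool]  # returns True on success


def run_blueprint(nodes: List[Node], state: State) -> List[str]:
    """Run nodes in order, stopping at the first failure.

    Deterministic nodes (setup, lint, typecheck, tests) are plain code;
    agentic nodes (design, implementation, fix strategy) would call a
    model. Both share one mutable `state` dict.
    """
    trace: List[str] = []
    for node in nodes:
        ok = node.run(state)
        trace.append(f"{node.name}:{node.kind}:{'ok' if ok else 'fail'}")
        if not ok:
            break
    return trace


def implement(state: State) -> bool:
    # Stand-in for an agentic step; a real node would invoke a model here.
    state["patch"] = "diff --git ..."
    return True


blueprint = [
    Node("setup", "deterministic", lambda s: True),
    Node("implement", "agentic", implement),
    Node("lint", "deterministic", lambda s: "patch" in s),
]
trace = run_blueprint(blueprint, {})
```

The useful property is that everything cheap and checkable stays out of the model's hands: the agent is only asked to do the steps that genuinely need judgment.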

6) Our Approach: The Agentmaker Factory

Target architecture for Agentmaker:

  • Codex 5.3 for implementation (focus, code quality, speed).
  • Claude Opus 4.6 for testing/review and browser-heavy validation.
  • OpenClaw sessions_spawn as dispatch/orchestration mechanism.
  • GitHub Issues as the work queue and source of truth for scope.
  • Nightly autonomous runs for throughput.
  • Daytime human review for merge governance.

flowchart TB
  I[GitHub Issues] --> D[OpenClaw dispatcher\n(sessions_spawn)]
  D --> C[Codex 5.3 implementer loops]
  C --> G[Deterministic gates\nformat/lint/type/test]
  G --> O[Claude Opus 4.6\nreview + browser testing]
  O --> PR[PRs ready for human review]
  PR --> M[Human merge decisions (daytime)]
  M --> I
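The top of this pipeline, from issue queue to dispatcher, might look roughly like the sketch below. It makes several assumptions that are not in the design above: that issues are marked ready with a label (here `agent-ready`), that nightly runs are capped, and that `spawn` is a stand-in callable. The real `sessions_spawn` interface is not documented here, so nothing in this block should be read as its signature.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable, List


@dataclass(frozen=True)
class Issue:
    number: int
    title: str
    labels: FrozenSet[str]


def nightly_batch(
    issues: Iterable[Issue],
    ready_label: str = "agent-ready",  # assumed label name
    max_runs: int = 5,                 # assumed nightly cap
) -> List[Issue]:
    """Select the issues to work on tonight, oldest issue number first."""
    ready = [i for i in issues if ready_label in i.labels]
    ready.sort(key=lambda i: i.number)
    return ready[:max_runs]


def dispatch(batch: List[Issue], spawn: Callable[[str], str]) -> Dict[int, str]:
    """Start one implementer session per issue; map issue number to session id.

    `spawn` takes a task prompt and returns a session id (hypothetical).
    """
    return {
        issue.number: spawn(f"Implement issue #{issue.number}: {issue.title}")
        for issue in batch
    }
```

Because GitHub Issues stay the source of truth for scope, the dispatcher only selects and hands off; it never rewrites what an issue asks for.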

Implementation guideline:

  • Build the first version as a Ralph+Blueprint hybrid with minimal infra.
  • Scale out orchestration only after queue pressure, not before.

7) Key Lessons from Steve’s GSD Experience

Observed lesson pattern:

  • GSD-style systems can excel at planning and decomposition,
  • but execution can degrade when agents lose implementation focus,
  • resulting in diffuse edits, random low-quality code, and missed intent.

Practical takeaway:

  • Use a focused implementation model (Codex) for code generation.
  • Use Claude primarily where it is strongest in this workflow: test design, adversarial review, browser validation, and issue finding.

In other words: keep planning scaffolds, but separate planning capability from coding reliability.

8) Pi — The Minimal Agentic Harness

While the systems above focus on orchestration (how to schedule, loop, and review), there's a separate question: what is the actual agent harness? What executes the tools, manages the context window, and talks to the LLM?

What is Pi?

Pi (by Mario Zechner / @badlogic) is a minimal, opinionated coding agent harness. Its philosophy: primitives, not features. Everything that other agents bake in (sub-agents, plan mode, MCP, permissions) is either an extension you build or a package you install. The core stays small.

Pi is structured as four packages:

| Package | Purpose |
| --- | --- |
| pi-ai | Unified LLM API: 15+ providers, streaming, tool calling, cross-provider context handoff, abort support |
| pi-agent-core | The agent loop: tool execution, validation, event streaming |
| pi-coding-agent | The CLI harness: sessions, AGENTS.md, skills, extensions, themes |
| pi-tui | Terminal UI framework: differential rendering, flicker-free |

Pi's Design Decisions

| Decision | Rationale |
| --- | --- |
| No MCP | CLI tools with READMEs (skills) are simpler and don't bust prompt cache |
| No sub-agents | Spawn pi instances via tmux, or build your own with extensions |
| No plan mode | Write plans to files. Install a package if you want structure |
| No permission popups | Run in a container. YOLO by default |
| No background bash | Use tmux. Full observability, direct interaction |
| Minimal system prompt | You control what goes into context. No injection behind your back |

Key Innovation: Context Engineering as First-Class Citizen

Pi's context management is what separates it from Claude Code and Cursor:

  • AGENTS.md — project instructions loaded at startup (identical to Ralph/GSD pattern)
  • SYSTEM.md — replace or append to the system prompt per-project
  • Compaction — auto-summarizes older messages; fully customizable via extensions
  • Skills — on-demand capability packages with progressive disclosure
  • Dynamic context — extensions can inject messages before each turn, filter history, implement RAG

This maps directly to Huntley's "context is ephemeral, state is persistent" principle. Pi just makes it programmable.
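Compaction is the easiest of these mechanisms to picture in code. The sketch below is not Pi's API or its actual algorithm; it is a generic illustration that assumes a character count as a crude budget proxy and a caller-supplied `summarize` function. `compact` and its parameters are hypothetical names.

```python
from typing import Callable, List


def compact(
    messages: List[str],
    budget: int,
    keep_recent: int,
    summarize: Callable[[List[str]], str],
) -> List[str]:
    """Replace older messages with one summary once the budget is exceeded.

    `budget` is measured in characters here (an assumption; a real harness
    would count tokens). The most recent `keep_recent` messages are kept
    verbatim so the model retains its immediate working context.
    """
    total = sum(len(m) for m in messages)
    split = len(messages) - keep_recent
    if total <= budget or split <= 0:
        return list(messages)
    older, recent = messages[:split], messages[split:]
    return [summarize(older)] + recent


history = ["a" * 10, "b" * 10, "c" * 10]
compacted = compact(
    history,
    budget=15,
    keep_recent=1,
    summarize=lambda ms: f"summary of {len(ms)} messages",
)
```

Making `summarize` a parameter mirrors the point in the list above: the policy for what to keep is something you own and can swap out, not a fixed behavior of the harness.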

oh-my-pi (can1357's Fork)

oh-my-pi extends pi with features that directly address factory needs:

  • LSP integration — format-on-write + diagnostics on every file change = backpressure built into the harness
  • TTSR (Time-Traveling Streamed Rules) — zero-context-cost rules that inject only when the model's output matches a regex pattern. One-shot per session.
  • Task tool — parallel subagent system with isolated git worktrees and real-time artifact streaming
  • Model roles (default, smol, slow, plan, commit) for automatic cost-based routing
  • Python tool — persistent IPython kernel for data analysis within agent sessions
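The TTSR behavior described above, a rule that costs no context until the model's output matches a pattern and then fires once per session, can be approximated as follows. This sketch captures only that trigger-once logic; it does not reproduce how oh-my-pi hooks into the stream or re-runs generation, and `Rule`, `RuleInjector`, and the example rule are hypothetical.

```python
import re
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Rule:
    name: str
    pattern: str  # regex tested against the model's streamed output
    text: str     # guidance to inject when the pattern matches


@dataclass
class RuleInjector:
    rules: List[Rule]
    fired: Set[str] = field(default_factory=set)

    def check(self, streamed_output: str) -> List[str]:
        """Return rule texts to inject for this chunk of output.

        Each rule fires at most once per injector instance (one instance
        per session), so a rule adds nothing to context until triggered.
        """
        injected: List[str] = []
        for rule in self.rules:
            if rule.name in self.fired:
                continue
            if re.search(rule.pattern, streamed_output):
                self.fired.add(rule.name)
                injected.append(rule.text)
        return injected


injector = RuleInjector(
    [Rule("no-any", r"\bas any\b", "Avoid `as any`; narrow the type instead.")]
)
first = injector.check("const x = y as any;")
second = injector.check("const z = w as any;")
```

Here `first` contains the rule text and `second` is empty, because the rule has already fired in this session.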

Pi as Factory Foundation

Pi is the agent harness underneath OpenClaw (which powers our entire operation). OpenClaw depends on all four pi packages. When we spawn sub-agents via sessions_spawn, it's pi's agent loop executing. The tools (Read, Edit, exec) are pi tools. AGENTS.md is pi's pattern.

OpenClaw (messaging, cron, channels, gateway)
  └── pi-coding-agent (harness, sessions, extensions, skills)
       └── pi-agent-core (agent loop, tool execution)
            └── pi-ai (multi-provider LLM API, context handoff)

This means our factory doesn't need a separate harness. We already have one. What we need is:

  1. Better AGENTS.md files in each repo (operational, with build/test commands)
  2. Specs as the work queue (already exist)
  3. A dispatch layer that picks spec → hydrates context → spawns agent
  4. Tests as backpressure (the missing piece)
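Step 3, "picks spec → hydrates context → spawns agent", is mostly file plumbing. The sketch below covers the first two parts under stated assumptions: specs are Markdown files in one directory, completed specs are tracked by filename, and the prompt layout is invented for illustration. `pick_spec` and `hydrate_prompt` are hypothetical helpers; spawning is left out because the `sessions_spawn` interface is not specified here.

```python
from pathlib import Path
from typing import Optional, Set


def pick_spec(specs_dir: Path, done: Set[str]) -> Optional[Path]:
    """Return the first spec (by filename order) that is not yet done."""
    for path in sorted(specs_dir.glob("*.md")):
        if path.name not in done:
            return path
    return None


def hydrate_prompt(repo: Path, spec: Path) -> str:
    """Build the agent's starting context from files on disk.

    AGENTS.md supplies the operational instructions (build/test commands);
    the spec supplies the scope. The section headings are an assumed layout.
    """
    parts = []
    agents = repo / "AGENTS.md"
    if agents.exists():
        parts.append("# Project instructions\n" + agents.read_text(encoding="utf-8"))
    parts.append(f"# Spec: {spec.name}\n" + spec.read_text(encoding="utf-8"))
    parts.append("# Task\nImplement this spec. Run the build and test commands before finishing.")
    return "\n\n".join(parts)
```

The resulting string is what a dispatcher would hand to a freshly spawned agent, so each run starts from the same on-disk state rather than from a previous session's memory.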

Pi vs Other Harnesses

| Feature | Pi | Claude Code | Codex CLI | Goose (Stripe) |
| --- | --- | --- | --- | --- |
| Context control | Full (you own it) | Limited (injects behind back) | Moderate | Custom fork |
| Extensions | TypeScript modules | Slash commands | N/A | Internal |
| Multi-provider | 15+ with handoff | Anthropic only | OpenAI only | Custom |
| Sub-agents | Extension/tmux | Built-in | N/A | N/A |
| LSP (oh-my-pi) | Built-in backpressure | N/A | N/A | N/A |
| SDK/RPC | Yes (embed in apps) | N/A | N/A | Internal |
| Session format | Tree-structured, shareable | Opaque | Opaque | N/A |


9) References

Primary source material analyzed:

  • docs/content/research/agentic-loop-synthesis.md
  • docs/content/ghuntley/deepwiki/loom.md
  • docs/content/ghuntley/deepwiki/how-to-ralph-wiggum.md
  • docs/content/ghuntley/blog/2026-01-17-everything-is-a-ralph-loop.md
  • docs/content/ghuntley/blog/2026-01-17-backpressure.md
  • docs/content/ghuntley/blog/2025-08-24-how-to-build-coding-agent.md
  • docs/content/ghuntley/blog/2026-01-22-specs-groundhog.md
  • docs/content/steve-yegge/deepwiki/gastown.md
  • docs/content/steve-yegge/blog/2026-01-01-welcome-to-gas-town.md
  • docs/content/glittercowboy/deepwiki/get-shit-done.md
  • docs/content/stripe/blog/2026-02-09-minions-part-1.md
  • docs/content/stripe/blog/2026-02-19-minions-part-2.md
  • docs/content/indydevdan/deepwiki/infinite-agentic-loop.md
  • docs/content/README.md

Repository roots:

  • Loop research corpus: /root/clawd/repos/loop/docs/content/
  • Site docs target: /root/clawd/repos/friday-workspace/site-docs/loop-factory/index.md