Loop Factory

1) Executive Summary

A software factory is an engineered system that converts requirements into reviewed pull requests through repeatable agentic loops. The key shift is from “AI as autocomplete” to AI as a production process: scoped work intake, execution loops, quality gates, and explicit handoff to humans.

For small teams (1–3 people), this matters because factories convert scarce founder time into throughput. The leverage does not come from one model prompt; it comes from:

strict task scoping,
fresh context per iteration,
file-based state that survives context resets,
and hard backpressure (tests/types/lints/CI).

Without these, teams get fast output but low reliability. With these, teams can run sustained autonomous work while preserving quality and control.

flowchart LR
  A[Specs / Issues] --> B[Planner]
  B --> C[One-task execution loop]
  C --> D[Backpressure: tests/lint/types]
  D -->|pass| E[PR creation]
  D -->|fail| C
  E --> F[Human review]
  F --> G[Merge / deploy]

2) The Landscape

Ralph Loop (Geoffrey Huntley)

The canonical minimal pattern: bash loop, fresh context each run, file-based memory, one-task-per-loop. Huntley’s framing is intentionally monolithic: avoid premature multi-agent orchestration complexity; maximize determinism via simple control flow and persistent artifacts (specs/*, IMPLEMENTATION_PLAN.md, AGENTS.md).

Core contribution: - The pattern is simple enough to implement (~300 LOC class of systems), - but robust if you enforce backpressure and specs-first execution.

Loom (Geoffrey Huntley)

Loom is infrastructure for evolutionary software at organizational scale: Rust server, Kubernetes weavers, ephemeral environments, integrated git hosting, security/audit sidecars, analytics/flags/auth stack.

Core contribution: - transforms Ralph-style loop logic into platform primitives for large-scale orchestration and governance.

Tradeoff: - significant infra and operational weight; not a first system for small teams.

Gastown (Steve Yegge)

Gastown is “K8s for agents”: role-separated orchestration (Mayor/Deacon/Witness/Refinery/Polecats), persistent work identity via Beads, convoy workflows, and GUPP (“if work is on your hook, run it”).

Core contribution: - high-throughput multi-rig, multi-agent coordination with recovery semantics and operational roles.

Tradeoff: - designed for advanced users running many concurrent agents; overhead is substantial for small teams.

GSD (glittercowboy)

GSD is a context-engineering workflow system for Claude Code centered on .planning/ artifacts (PROJECT.md, ROADMAP.md, STATE.md) and atomic plans with fresh subagent contexts.

Core contribution: - practical structure for preventing context rot without heavy infra.

Tradeoff: - excellent planning discipline does not guarantee execution quality if the implementation model is weak or under-constrained.

Stripe Minions

Production enterprise system running at very high throughput (1300+ PRs/week class). Built around: - isolated devboxes, - a hardened Goose-based harness, - Toolshed MCP (400+ internal tools, curated subsets), - Blueprints combining deterministic and agentic nodes, - one-shot runs with bounded retry policy (max 2 CI rounds).

Core contribution: - demonstrates reliable autonomous coding in production when orchestration and backpressure are engineered as first-class concerns.

3) Common Principles Across Successful Systems

Across Ralph, GSD, Gastown, Loom, and Minions, the winning invariants are consistent:

Specs-first: requirements before code generation.
Fresh context per iteration: avoid long-lived context drift.
File-based persistent state: plans, memory, runbooks on disk.
Backpressure: tests/lints/types/CI as rejection filters.
One task per iteration: constrain scope for reliability.
Human review: autonomy in execution, human authority at merge.
Isolation/sandboxing: bounded blast radius for agent actions.

graph TD
  P[Persistent specs/state files] --> L[Loop run in fresh context]
  L --> T[Single scoped task]
  T --> V[Verification gates]
  V -->|fail| L
  V -->|pass| R[Reviewable PR]
  R --> H[Human decision]

4) Comparison Table

Approach	Complexity	Strengths	Weaknesses	Ideal Team Size	Primary Language/Stack
Ralph Loop	Low	Minimal, fast to adopt, clear mental model, strong with good specs/backpressure	Limited orchestration features; manual scaling patterns	1–5	Bash + any coding model/tooling
Loom	Very High	Enterprise platformization, ephemeral envs, auditability, integrated control plane	Heavy infra/ops burden	20+ agents / platform teams	Rust + K8s + web stack
Gastown	Very High	Rich multi-agent orchestration, role separation, resilient work routing	Steep learning curve, high operational overhead	20+ agents / advanced operators	Go + tmux + Beads ecosystem
GSD	Medium	Strong planning/state discipline, context-rot mitigation, wave parallelization	Planning can outpace execution quality if implementation loop is weak	1–10	Claude Code command framework
Stripe Minions	Very High	Proven production throughput, blueprint orchestration, robust unattended runs	Requires mature internal platform and tool ecosystem	Large enterprises	Custom harness + devboxes + MCP tooling

5) What Works for Small Teams (1–3 people)

Practical recommendations:

Use Ralph loop as the foundation
Keep loop driver simple.
Enforce one-task-per-iteration.
Store all durable memory in files.
Add GSD-style planning structure
Adopt .planning/ artifacts to improve continuity and handoff quality.
Keep plans atomic.
Adopt Stripe’s Blueprint idea early
Interleave deterministic and agentic steps.
Deterministic nodes: setup, lint, typecheck, tests, PR formatting.
Agentic nodes: design/implementation/fix strategy.
Avoid Loom/Gastown complexity until scale demands it
They are powerful but generally overkill before sustained 20+ concurrent agent operations.

6) Our Approach: The Agentmaker Factory

Target architecture for Agentmaker:

Codex 5.3 for implementation (focus, code quality, speed).
Claude Opus 4.6 for testing/review and browser-heavy validation.
OpenClaw sessions_spawn as dispatch/orchestration mechanism.
GitHub Issues as the work queue and source of truth for scope.
Nightly autonomous runs for throughput.
Daytime human review for merge governance.

flowchart TB
  I[GitHub Issues] --> D[OpenClaw dispatcher\n(sessions_spawn)]
  D --> C[Codex 5.3 implementer loops]
  C --> G[Deterministic gates\nformat/lint/type/test]
  G --> O[Claude Opus 4.6\nreview + browser testing]
  O --> PR[PRs ready for human review]
  PR --> M[Human merge decisions (daytime)]
  M --> I

Implementation guideline: - Build the first version as a Ralph+Blueprint hybrid with minimal infra. - Scale out orchestration only after queue pressure, not before.

7) Key Lessons from Steve’s GSD Experience

Observed lesson pattern:

GSD-style systems can excel at planning and decomposition,
but execution can degrade when agents lose implementation focus,
resulting in diffuse edits, random low-quality code, and missed intent.

Practical takeaway: - Use a focused implementation model (Codex) for code generation. - Use Claude primarily where it is strongest in this workflow: test design, adversarial review, browser validation, and issue finding.

In other words: keep planning scaffolds, but separate planning capability from coding reliability.

8) Pi — The Minimal Agentic Harness

While the systems above focus on orchestration (how to schedule, loop, and review), there's a separate question: what is the actual agent harness? What executes the tools, manages the context window, and talks to the LLM?

What is Pi?

Pi (by Mario Zechner / @badlogic) is a minimal, opinionated coding agent harness. Its philosophy: primitives, not features. Everything that other agents bake in (sub-agents, plan mode, MCP, permissions) is either an extension you build or a package you install. The core stays small.

Pi is structured as four packages:

Package	Purpose
pi-ai	Unified LLM API — 15+ providers, streaming, tool calling, cross-provider context handoff, abort support
pi-agent-core	The agent loop — tool execution, validation, event streaming
pi-coding-agent	The CLI harness — sessions, AGENTS.md, skills, extensions, themes
pi-tui	Terminal UI framework — differential rendering, flicker-free

Pi's Design Decisions

Decision	Rationale
No MCP	CLI tools with READMEs (skills) are simpler and don't bust prompt cache
No sub-agents	Spawn pi instances via tmux, or build your own with extensions
No plan mode	Write plans to files. Install a package if you want structure
No permission popups	Run in a container. YOLO by default
No background bash	Use tmux. Full observability, direct interaction
Minimal system prompt	You control what goes into context. No injection behind your back

Key Innovation: Context Engineering as First-Class Citizen

Pi's context management is what separates it from Claude Code and Cursor:

AGENTS.md — project instructions loaded at startup (identical to Ralph/GSD pattern)
SYSTEM.md — replace or append to the system prompt per-project
Compaction — auto-summarizes older messages; fully customizable via extensions
Skills — on-demand capability packages with progressive disclosure
Dynamic context — extensions can inject messages before each turn, filter history, implement RAG

This maps directly to Huntley's "context is ephemeral, state is persistent" principle. Pi just makes it programmable.

oh-my-pi (can1357's Fork)

oh-my-pi extends pi with features that directly address factory needs:

LSP integration — format-on-write + diagnostics on every file change = backpressure built into the harness
TTSR (Time-Traveling Streamed Rules) — zero-context-cost rules that inject only when the model's output matches a regex pattern. One-shot per session.
Task tool — parallel subagent system with isolated git worktrees and real-time artifact streaming
Model roles — default, smol, slow, plan, commit for automatic cost-based routing
Python tool — persistent IPython kernel for data analysis within agent sessions

Pi as Factory Foundation

Pi is the agent harness underneath OpenClaw (which powers our entire operation). OpenClaw depends on all four pi packages. When we spawn sub-agents via sessions_spawn, it's pi's agent loop executing. The tools (Read, Edit, exec) are pi tools. AGENTS.md is pi's pattern.

OpenClaw (messaging, cron, channels, gateway)
  └── pi-coding-agent (harness, sessions, extensions, skills)
       └── pi-agent-core (agent loop, tool execution)
            └── pi-ai (multi-provider LLM API, context handoff)

This means our factory doesn't need a separate harness. We already have one. What we need is:

Better AGENTS.md files in each repo (operational, with build/test commands)
Specs as the work queue (already exist)
A dispatch layer that picks spec → hydrates context → spawns agent
Tests as backpressure (the missing piece)

Pi vs Other Harnesses

Feature	Pi	Claude Code	Codex CLI	Goose (Stripe)
Context control	Full (you own it)	Limited (injects behind back)	Moderate	Custom fork
Extensions	TypeScript modules	Slash commands	N/A	Internal
Multi-provider	15+ with handoff	Anthropic only	OpenAI only	Custom
Sub-agents	Extension/tmux	Built-in	N/A	N/A
LSP (oh-my-pi)	Built-in backpressure	N/A	N/A	N/A
SDK/RPC	Yes (embed in apps)	N/A	N/A	Internal
Session format	Tree-structured, shareable	Opaque	Opaque	N/A

References

Pi website — docs and package browser
Mario Zechner's blog post — full rationale for building pi
pi-mono repo — source code
oh-my-pi repo — can1357's fork with LSP, TTSR, tasks
OpenClaw — the messaging/gateway layer built on pi

9) References

Primary source material analyzed:

docs/content/research/agentic-loop-synthesis.md
docs/content/ghuntley/deepwiki/loom.md
docs/content/ghuntley/deepwiki/how-to-ralph-wiggum.md
docs/content/ghuntley/blog/2026-01-17-everything-is-a-ralph-loop.md
docs/content/ghuntley/blog/2026-01-17-backpressure.md
docs/content/ghuntley/blog/2025-08-24-how-to-build-coding-agent.md
docs/content/ghuntley/blog/2026-01-22-specs-groundhog.md
docs/content/steve-yegge/deepwiki/gastown.md
docs/content/steve-yegge/blog/2026-01-01-welcome-to-gas-town.md
docs/content/glittercowboy/deepwiki/get-shit-done.md
docs/content/stripe/blog/2026-02-09-minions-part-1.md
docs/content/stripe/blog/2026-02-19-minions-part-2.md
docs/content/indydevdan/deepwiki/infinite-agentic-loop.md
docs/content/README.md

Repository roots:

Loop research corpus: /root/clawd/repos/loop/docs/content/
Site docs target: /root/clawd/repos/friday-workspace/site-docs/loop-factory/index.md