The Agent-Driven Development Wars: OpenAI vs StrongDM
Two companies are racing to eliminate human coding. Their approaches couldn't be more different.
In the span of six months, two major players have emerged with radically different visions for how AI agents will write software. OpenAI's approach keeps humans firmly in the driver's seat, steering autonomous executors. StrongDM's approach removes humans from the code entirely, declaring "code must not be written by humans." The results? Both claim transformative success—but through fundamentally incompatible philosophies.
Origins & Timeline
OpenAI Harness Engineering
- Started: Late August 2025
- Duration: 5 months of active development
- Team: 3 engineers → 7 engineers
- Status: Internal beta with hundreds of daily users
- Tech: Codex CLI powered by GPT-5
StrongDM Factory
- Founded: July 14, 2025
- Founders: Justin McCarthy (CTO), Jay Taylor, Navan Chauhan
- Catalyst: Claude 3.5 Sonnet (Oct 2024)
- Status: Open-sourced specs (506 GitHub stars)
- Published: February 6, 2026
Both projects emerged within weeks of each other in mid-2025, responding to the same inflection point: AI models had finally crossed the threshold where they could compound correctness rather than compound errors over long coding sessions.
Core Philosophy
"Humans steer. Agents execute." — OpenAI
vs
"Code must not be written by humans." — StrongDM
OpenAI: The Partnership Model
OpenAI's philosophy centers on collaboration. Their mantra—"Humans steer. Agents execute"—positions engineers as architects of environments rather than writers of code. The human role transforms into designing scaffolding, specifying intent, and building feedback loops that enable Codex agents to work reliably.
Engineers at OpenAI still review pull requests (though it's optional), validate outcomes, and translate user feedback into acceptance criteria. They maintain strategic control while delegating tactical execution to autonomous agents.
StrongDM: The Autonomy Model
StrongDM's approach is more radical. They operate under three founding rules:
StrongDM's Three Rules
1. Code must not be written by humans
2. Code must not be reviewed by humans
3. Spend minimum $1,000/day per engineer on tokens
Their guiding question when approaching any task: "Why am I doing this? (implied: the model should be doing this instead)." This principle of "deliberate naivete" aims to remove outdated "Software 1.0" conventions entirely.
StrongDM's vision replaces code review with validation. Once specifications are complete, agents execute end-to-end without human iteration. The system validates against live-like environments and self-corrects dynamically—validation becomes superior to traditional code review for quality assurance.
Scale & Metrics
| Metric | OpenAI | StrongDM |
|---|---|---|
| Lines of Code | ~1 million | Not disclosed |
| Pull Requests | ~1,500 merged | Not disclosed |
| Throughput | 3.5 PRs per engineer per day | Thousands of scenarios per hour |
| Speed Improvement | 1/10th the time vs manual coding | Not quantified |
| Agent Runtime | 6+ hours continuous work (overnight) | Batch-oriented shift work |
| Cost Model | Not disclosed | $1,000/day per engineer minimum |
OpenAI emphasizes practical velocity gains—building in one-tenth the time of traditional development. StrongDM emphasizes economic transformation—tasks previously impossible are now routine, justifying heavy token investment.
Testing & Validation Philosophy
Perhaps nowhere is the philosophical divide more apparent than in testing approaches.
OpenAI: Application Legibility
OpenAI's bottleneck became human QA capacity. Their solution: make everything—UI, logs, metrics—directly legible to Codex.
OpenAI's Observability Stack (Per Git Worktree)
- Chrome DevTools Protocol: DOM snapshots, screenshots, navigation
- Victoria Logs: LogQL queries for log analysis
- Victoria Metrics: PromQL queries for performance metrics
- Victoria Traces: TraceQL queries for distributed tracing
- Vector: Telemetry collection and routing
The app is bootable per git worktree, so Codex can launch and drive one instance per change. Agents can reproduce bugs, validate fixes, and reason about UI behavior directly. Prompts like "ensure service startup completes in under 800ms" become tractable because agents query the observability stack themselves.
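The payoff is that acceptance criteria become machine-checkable. As a minimal sketch, assuming a Prometheus-compatible /api/v1/query endpoint (which VictoriaMetrics exposes) and a hypothetical service_startup_duration_seconds metric, an agent could verify the 800ms budget directly:

```python
import requests

# Hypothetical per-worktree metrics endpoint and metric name; both are
# illustrative assumptions, not taken from OpenAI's writeup.
VMETRICS_URL = "http://localhost:8428/api/v1/query"
STARTUP_QUERY = 'max(service_startup_duration_seconds{instance="worktree-42"})'

def startup_under_budget(budget_seconds: float = 0.8) -> bool:
    """Ask the metrics store whether the measured startup time fits the budget."""
    resp = requests.get(VMETRICS_URL, params={"query": STARTUP_QUERY}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    if not result:
        return False  # no samples yet: treat as a failed check rather than a pass
    latest_value = float(result[0]["value"][1])  # value is [timestamp, "stringified number"]
    return latest_value <= budget_seconds

if __name__ == "__main__":
    print("startup OK" if startup_under_budget() else "startup too slow")
```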
StrongDM: Scenarios Over Tests
StrongDM identified a fundamental problem with traditional testing: agents learn to game the tests. Their solution: scenarios—end-to-end user stories stored outside the codebase, functioning as "holdout datasets."
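The mechanics matter less than the placement: scenarios live outside the repository the agent edits, so passing them cannot be achieved by tuning code to the tests. A minimal sketch of that idea, with an assumed JSON scenario format and directory layout rather than StrongDM's actual schema:

```python
import json
from pathlib import Path

# The scenario format and directory are assumptions. The property that matters
# is that scenarios live outside the repository the agent works on, so the
# agent cannot overfit the code to the "test set".
SCENARIO_DIR = Path.home() / "holdout-scenarios"

def run_scenario(scenario: dict, drive_app) -> bool:
    """Replay one end-to-end user story against a running system."""
    observed = drive_app(scenario["steps"])  # e.g. API calls, UI actions
    return all(observed.get(k) == v for k, v in scenario["expected"].items())

def evaluate(drive_app) -> float:
    """Return the fraction of holdout scenarios that pass."""
    scenarios = [json.loads(p.read_text()) for p in sorted(SCENARIO_DIR.glob("*.json"))]
    if not scenarios:
        return 0.0
    return sum(run_scenario(s, drive_app) for s in scenarios) / len(scenarios)
```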
The Digital Twin Universe (DTU)
StrongDM's most striking innovation is the Digital Twin Universe: behavioral clones of third-party services including Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets.
The DTU enables:
- Testing at volumes "far exceeding production limits"
- Safely testing dangerous failure modes
- Thousands of scenarios per hour with no rate limits
- No API costs or abuse detection triggers
- Fully reproducible, deterministic test conditions
Building high-fidelity service clones was previously economically infeasible. StrongDM argues that automation has made it a routine operation, exemplifying their "deliberate naivete" principle.
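To make the idea concrete, here is a deliberately tiny sketch of what a behavioral clone looks like in spirit: a deterministic, in-process stand-in for a chat service's message API. The class and its methods are illustrative assumptions, not part of StrongDM's DTU, but they show the properties the list above relies on: no rate limits, no API cost, and reproducible behavior.

```python
import hashlib

class FakeChatService:
    """Tiny, deterministic stand-in for a chat service's message API.

    Illustrative only: a real behavioral clone models far more of the service,
    but the essential properties are the same -- no rate limits, no API cost,
    and fully reproducible behavior for a given sequence of calls.
    """

    def __init__(self):
        self.channels: dict[str, list[dict]] = {}

    def post_message(self, channel: str, text: str) -> dict:
        # IDs are derived from content and position, so a replayed scenario
        # always sees the same identifiers.
        history = self.channels.setdefault(channel, [])
        msg_id = hashlib.sha1(f"{channel}:{len(history)}:{text}".encode()).hexdigest()[:12]
        message = {"id": msg_id, "channel": channel, "text": text}
        history.append(message)
        return message

    def history(self, channel: str) -> list[dict]:
        return list(self.channels.get(channel, []))
```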
Technical Architecture
OpenAI: Layered Domain Architecture
OpenAI enforces a rigid architectural model where each business domain divides into fixed layers with strictly validated dependency directions.
Code can only depend "forward" through layers. Cross-cutting concerns enter through a single explicit interface: Providers. Custom linters (written by Codex) enforce these rules mechanically, with error messages containing remediation instructions for agents.
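The linters themselves are not public, but the shape of such a check is straightforward. A minimal sketch, assuming a hypothetical four-layer order and a package naming convention in which the layer name is the second path segment; the remediation text mirrors the idea of error messages written for agents:

```python
import ast
from pathlib import Path

# Hypothetical layer order and naming convention. OpenAI's actual layers and
# linter are not public; only the "depend forward only" rule is described.
LAYER_ORDER = ["api", "service", "domain", "storage"]

def layer_of(module: str) -> int | None:
    parts = module.split(".")
    return LAYER_ORDER.index(parts[1]) if len(parts) > 1 and parts[1] in LAYER_ORDER else None

def check_file(path: Path, own_layer: str) -> list[str]:
    """Return agent-readable violations for imports that point 'backward'."""
    errors = []
    tree = ast.parse(path.read_text())
    for node in ast.walk(tree):
        if isinstance(node, ast.ImportFrom) and node.module:
            dep = layer_of(node.module)
            if dep is not None and dep < LAYER_ORDER.index(own_layer):
                errors.append(
                    f"{path}:{node.lineno}: layer '{own_layer}' may not import "
                    f"from '{LAYER_ORDER[dep]}'. Remediation: move the shared logic "
                    f"forward or expose it through a Provider interface."
                )
    return errors
```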
OpenAI sometimes reimplements dependencies for agent legibility. Rather than pulling in a generic concurrency package, they built their own map-with-concurrency helper: tightly integrated with OpenTelemetry instrumentation, 100% test coverage, and exactly the behavior their runtime expects.
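The pattern is easy to reproduce in any language: a tiny, fully owned helper whose behavior is exactly what the callers need and nothing more. A sketch of the same idea in Python (OpenAI's helper is tied into their own runtime and OpenTelemetry instrumentation, which this omits):

```python
from concurrent.futures import ThreadPoolExecutor

def concurrent_map(fn, items, max_workers: int = 8) -> list:
    """Apply fn to each item concurrently, preserving input order.

    Deliberately small surface area: one function, one knob, results returned
    in input order, exceptions propagated to the caller.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fn, items))
```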
StrongDM: Graph-Based Pipeline (Attractor)
StrongDM's architecture centers on Attractor, a DOT-based pipeline orchestration system that represents multi-stage AI workflows as directed graphs.
Workflows are defined in Graphviz DOT syntax, providing immediate visualization and version control compatibility. Node types map to handler implementations via shape attributes:
Attractor Node Types
- Mdiamond (start) — Entry point
- Msquare (exit) — Terminal with goal-gate enforcement
- box (default) — LLM task invocation
- hexagon — Human-in-the-loop gate
- diamond — Conditional routing
- component — Parallel fan-out
- tripleoctagon — Parallel fan-in
- parallelogram — External tool execution
- house — Supervisor loop
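In an implementation, this mapping is little more than a dispatch table keyed on the node's shape attribute. A minimal sketch; the handler names are hypothetical, and only the shape-to-role mapping comes from the spec:

```python
# Illustrative dispatch from Graphviz node shapes to handler callables.
def run_llm_task(node, context): ...
def ask_human(node, context): ...
def route_condition(node, context): ...
def run_external_tool(node, context): ...

HANDLERS = {
    "Mdiamond": lambda node, context: context,   # start: nothing to execute
    "box": run_llm_task,                         # default LLM invocation
    "hexagon": ask_human,                        # human-in-the-loop gate
    "diamond": route_condition,                  # conditional routing
    "parallelogram": run_external_tool,          # external tool execution
}

def execute_node(node, context):
    handler = HANDLERS.get(node.get("shape", "box"), run_llm_task)
    return handler(node, context)
```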
Deterministic Edge Selection
After each node completes, the engine selects the next edge using a 5-step priority hierarchy:
1. Condition-matching edges (evaluate boolean expressions)
2. Preferred label match (outcome suggests specific edge)
3. Suggested next IDs (outcome recommends target nodes)
4. Highest weight (numeric priority attribute)
5. Lexical tiebreak (alphabetical node ID ordering)
This ensures reproducible routing without runtime randomness—critical for debugging and audit trails.
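A sketch of that selection logic, assuming edges are plain dicts with optional condition, label, weight, and target fields (the field names and condition grammar are illustrative, not the spec's):

```python
def eval_condition(expr: str, outcome: dict) -> bool:
    """Evaluate a simple 'field == value' expression (illustrative grammar)."""
    key, _, expected = expr.partition("==")
    return str(outcome.get(key.strip())) == expected.strip()

def select_edge(edges: list[dict], outcome: dict) -> dict:
    """Pick the next edge using the five-step priority order."""
    # 1. Condition-matching edges
    for e in edges:
        if e.get("condition") and eval_condition(e["condition"], outcome):
            return e
    # 2. Preferred label suggested by the outcome
    for e in edges:
        if outcome.get("preferred_label") and e.get("label") == outcome["preferred_label"]:
            return e
    # 3. Target node IDs suggested by the outcome
    for target in outcome.get("suggested_next", []):
        for e in edges:
            if e["target"] == target:
                return e
    # 4. Highest weight, then 5. alphabetical target ID as the final tiebreak
    return sorted(edges, key=lambda e: (-float(e.get("weight", 0)), e["target"]))[0]
```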
State Management
Attractor maintains three levels of state:
- Context: Thread-safe key-value store shared across pipeline stages
- Checkpoints: Serialized execution state enabling crash recovery and exact resume
- Artifacts: Large outputs (>100KB) are file-backed for efficiency
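A compact sketch of the first and third of these, assuming a 100KB spill threshold and illustrative file locations; the checkpoint method hints at how exact resume becomes possible once all state is serializable:

```python
import json
import threading
import uuid
from pathlib import Path

ARTIFACT_THRESHOLD = 100 * 1024   # 100KB, per the description above
ARTIFACT_DIR = Path("artifacts")  # hypothetical location

class Context:
    """Thread-safe key-value store; large values are spilled to disk as artifacts."""

    def __init__(self):
        self._lock = threading.Lock()
        self._data: dict[str, object] = {}

    def set(self, key: str, value: str) -> None:
        stored: object = value
        if len(value.encode()) > ARTIFACT_THRESHOLD:
            ARTIFACT_DIR.mkdir(exist_ok=True)
            path = ARTIFACT_DIR / f"{uuid.uuid4().hex}.txt"
            path.write_text(value)
            stored = {"artifact_path": str(path)}  # keep a reference, not the blob
        with self._lock:
            self._data[key] = stored

    def checkpoint(self, path: str = "checkpoint.json") -> None:
        """Serialize current state so a crashed run can resume from here."""
        with self._lock:
            Path(path).write_text(json.dumps(self._data, default=str))
```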
Multi-Provider Flexibility
StrongDM embraces provider-specific tooling rather than forcing normalization:
- OpenAI models use apply_patch format
- Anthropic models use edit_file with exact-string matching
- Gemini models use their specific conventions
A CSS-like "Model Stylesheet" centralizes LLM configuration:
* { llm_model: claude-sonnet-4-5; }
.critical { llm_model: gpt-5.2; reasoning_effort: high; }
#review_gate { llm_provider: openai; }
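Resolution presumably follows CSS's cascade: broader selectors apply first and more specific ones override them. A sketch of that resolution for the three rules above; the matching and specificity logic here is an assumption about how such a stylesheet would be interpreted, not StrongDM's implementation:

```python
# Rule data mirrors the stylesheet snippet above; resolution order is * < .class < #id.
RULES = [
    ("*",            {"llm_model": "claude-sonnet-4-5"}),
    (".critical",    {"llm_model": "gpt-5.2", "reasoning_effort": "high"}),
    ("#review_gate", {"llm_provider": "openai"}),
]

def specificity(selector: str) -> int:
    return 0 if selector == "*" else (1 if selector.startswith(".") else 2)

def matches(selector: str, node_id: str, classes: set[str]) -> bool:
    if selector == "*":
        return True
    if selector.startswith("."):
        return selector[1:] in classes
    return selector == f"#{node_id}"

def resolve(node_id: str, classes: set[str]) -> dict:
    config: dict[str, str] = {}
    for selector, props in sorted(RULES, key=lambda r: specificity(r[0])):
        if matches(selector, node_id, classes):
            config.update(props)  # more specific selectors applied last win
    return config

# e.g. resolve("review_gate", {"critical"}) ->
# {'llm_model': 'gpt-5.2', 'reasoning_effort': 'high', 'llm_provider': 'openai'}
```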
Knowledge Management
OpenAI: "Table of Contents, Not Encyclopedia"
OpenAI learned early that massive AGENTS.md files fail predictably:
- Context is scarce—giant files crowd out task and code
- Too much guidance becomes non-guidance
- Monolithic manuals rot instantly
- Single blobs resist mechanical verification
Their solution: a short AGENTS.md (~100 lines) serves as a map to a structured docs/ directory treated as the system of record.
The philosophy: "What Codex can't see doesn't exist." Knowledge living in Google Docs, Slack threads, or people's heads is invisible to agents. Repository-local, versioned artifacts are the only accessible context.
StrongDM: Three Core Principles
StrongDM's knowledge framework centers on three interconnected principles:
1. Seed → Validation → Feedback Loop
Start with a PRD, spec, screenshot, or existing code. The loop runs until holdout scenarios pass and stay passing.
2. End-to-End Validation Harness
"As close to real environment as possible: customers, integrations, economics."
3. Tokens as Fuel for Problem-Solving
Convert obstacles into model-consumable representations. "Creative, frontier engineering" transforms traces, screenshots, transcripts, incident replays, surveys, and customer interviews into formats agents understand.
Six Techniques for Agent-Driven Development
StrongDM documents six repeatable patterns:
- Digital Twin Universe (DTU): Clone external dependency behaviors
- Gene Transfusion: Move working patterns between codebases via concrete exemplars
- The Filesystem: Use filesystem as agent working memory
- Shift Work: Separate interactive work from fully specified execution
- Semport: Semantic-aware automated ports (cross-language/framework)
- Pyramid Summaries: Reversible hierarchical summarization at multiple zoom levels
Autonomy Levels & Human Role
OpenAI: End-to-End Feature Delivery
OpenAI recently crossed a threshold where Codex can drive features end-to-end:
- Validate current codebase state
- Reproduce bug + record video demonstration
- Implement fix
- Validate fix by driving app + record video
- Open pull request
- Respond to agent and human feedback
- Detect and remediate build failures
- Escalate to human only when judgment required
- Merge the change
Important caveat: "This behavior depends heavily on the specific structure and tooling of this repository and should not be assumed to generalize without similar investment."
Human Role at OpenAI
Engineers prioritize work, translate user feedback to acceptance criteria, and validate outcomes. When agents struggle, humans identify missing tools, guardrails, or documentation. The job transforms into designing environments, feedback loops, and control systems.
StrongDM: Non-Interactive from Specification
StrongDM's "Shift Work" model separates interactive work (writing specs and scenarios) from fully specified work (agent execution). Once specifications are complete, Attractor runs end-to-end without human iteration.
Human-in-the-Loop Gates
When human judgment is required, hexagon nodes in the Attractor graph trigger multiple-choice questions. StrongDM provides four interviewer implementations:
- AutoApproveInterviewer: Always selects first option (for testing)
- ConsoleInterviewer: Terminal-based prompts
- CallbackInterviewer: Delegates to external APIs/webhooks
- QueueInterviewer: Pre-filled answers for deterministic testing
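All four share one small interface: present a multiple-choice question, return an answer. A sketch of three of them in Python; the method name and signatures are assumptions, while the behaviors follow the descriptions above:

```python
from abc import ABC, abstractmethod

class Interviewer(ABC):
    """Minimal shape of a human-in-the-loop gate; method name is illustrative."""

    @abstractmethod
    def ask(self, question: str, options: list[str]) -> str: ...

class AutoApproveInterviewer(Interviewer):
    """Always selects the first option, useful for exercising pipelines in tests."""
    def ask(self, question: str, options: list[str]) -> str:
        return options[0]

class QueueInterviewer(Interviewer):
    """Replays pre-filled answers so a pipeline run is fully deterministic."""
    def __init__(self, answers: list[str]):
        self._answers = list(answers)
    def ask(self, question: str, options: list[str]) -> str:
        return self._answers.pop(0)

class ConsoleInterviewer(Interviewer):
    """Prompts a person at the terminal with a numbered multiple-choice question."""
    def ask(self, question: str, options: list[str]) -> str:
        print(question)
        for i, option in enumerate(options, 1):
            print(f"  {i}. {option}")
        return options[int(input("Choose an option: ")) - 1]
```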
Human Role at StrongDM
Engineers define intent, scenarios, and constraints. The focus becomes "creative, frontier engineering"—making diverse data types (traces, screenshots, incident replays, customer interviews) consumable by models. Engineers design fully autonomous systems, then step back.
Technical Debt Management
OpenAI: "Garbage Collection"
OpenAI initially spent every Friday—20% of the week—manually cleaning up "AI slop." This didn't scale.
Their solution: "Golden principles"—opinionated mechanical rules keeping the codebase legible and consistent:
- Prefer shared utility packages over hand-rolled helpers
- Don't probe data "YOLO-style"—validate boundaries or use typed SDKs
Background Codex tasks now scan for deviations on a regular cadence, update quality grades, and open targeted refactoring PRs. Most are reviewed in under a minute and automerged.
StrongDM: Continuous Validation
StrongDM's approach to technical debt is less explicitly documented but implied through their continuous validation feedback loops. Scenarios prevent accumulation of bad patterns by catching regressions immediately.
Economic Models
OpenAI: Velocity Multiplication
- 1/10th the time vs manual coding
- No specific cost metrics disclosed
- Practical production deployment focus
- Shipping agent harness as product (Codex App Server)
- Deployed in JetBrains, Xcode, Codex desktop
StrongDM: Economic Transformation
- $1,000/day per engineer on tokens (minimum)
- Tasks previously economically infeasible now routine
- Building full SaaS behavioral clones now viable
- Heavy compute investment model
- "Deliberate naivete" removes outdated constraints
OpenAI optimizes for practical velocity gains within existing software engineering budgets. StrongDM advocates for radical economic reframing—spend orders of magnitude more on compute to eliminate human involvement entirely.
Open Source & Community
OpenAI: Proprietary with Partner Integration
OpenAI's approach remains proprietary. They built an internal product and shared learnings via blog post, but haven't open-sourced the implementation. Instead, they're shipping the Codex App Server to partners—embedded in JetBrains, Xcode, and the Codex desktop app.
StrongDM: Open Specification Strategy
StrongDM took the opposite approach: open-source the entire specification as "NLSpec" (Natural Language Specifications).
The strongdm/attractor GitHub repository contains just three markdown files—complete specifications for building Attractor. The implementation approach? Prompt a coding agent:
Building Attractor
"Implement Attractor as described by https://factory.strongdm.ai/"
The repository has 506 stars, 65 forks, and community implementations are already emerging (including brynary/attractor in TypeScript).
What They're Still Learning
OpenAI's Explicit Unknowns
OpenAI acknowledges several open questions:
- How does architectural coherence evolve over years in a fully agent-generated system?
- Where does human judgment add the most leverage?
- How will the system evolve as models continue improving?
- How to encode judgment so it compounds over time?
StrongDM's Presentation
StrongDM's documentation presents their approach as "field-tested" with less discussion of remaining unknowns. They position the breakthrough as already achieved rather than ongoing exploration.
The Verdict: Which Approach Wins?
OpenAI's Approach: Immediately Practical
- Concrete metrics (1M LOC, 1,500 PRs, 3.5 PRs/engineer/day)
- Real daily users and production deployment
- Documented challenges and remaining unknowns
- Works with existing tools and workflows
- Explicit about environment-specific optimizations
StrongDM's Approach: Architecturally Innovative
- Open source specifications enabling community implementations
- Novel testing paradigms (scenarios, satisfaction metrics, DTU)
- Deterministic reproducibility designed in from the start
- Multi-provider flexibility built natively
- Graph-based orchestration enables complex workflows
The Philosophical Divide
| Dimension | OpenAI | StrongDM |
|---|---|---|
| Human Role | Partnership - humans and agents collaborate | Autonomy - agents operate independently |
| Success Factor | Environment design with legibility and constraints | Scenarios, validation harnesses, self-correction |
| Economic Model | Velocity multiplication (1/10th time) | Economic transformation (tasks impossible → routine) |
| Strategy | Evolution - adapt traditional engineering | Revolution - remove Software 1.0 conventions |
| Maturity | Pragmatic shipping in production | Radical vision with open specifications |
The Synthesis
These aren't competing approaches—they're complementary visions operating at different time horizons.
OpenAI optimizes for human+AI productivity within familiar software engineering paradigms. Their approach works today, ships to production, and delivers measurable velocity gains. The constraints they've built—layered architecture, mechanical linting, progressive disclosure documentation—show what's immediately practical for teams wanting to adopt agent-driven development.
StrongDM reimagines software engineering entirely around autonomous AI systems. Their innovations—Digital Twin Universe, satisfaction metrics, graph-based orchestration, deterministic reproducibility—point toward a future where humans design systems that agents operate without supervision. The economic reframing ($1k/day on tokens) signals that optimization criteria have fundamentally shifted.
OpenAI keeps humans in the loop because current models still require careful environment design to compound correctness. StrongDM removes humans from the loop because they believe we've crossed—or are about to cross—the threshold where agents can maintain quality autonomously.
The question isn't which approach is "right." The question is which constraints you're optimizing for: shipping products with proven patterns today (OpenAI) or building the infrastructure for fully autonomous development tomorrow (StrongDM).
Both answers are valid. Both are probably necessary. And both will continue evolving as models improve over the next 6-12 months.