Tech Report
February 2026

The Agent-Driven Development Wars: OpenAI vs StrongDM

Two companies are racing to eliminate human coding. Their approaches couldn't be more different.


In the span of six months, two major players have emerged with radically different visions for how AI agents will write software. OpenAI's approach keeps humans firmly in the driver's seat, steering autonomous executors. StrongDM's approach removes humans from the code entirely, declaring "code must not be written by humans." The results? Both claim transformative success—but through fundamentally incompatible philosophies.

Origins & Timeline

OpenAI Harness Engineering

  • Started: Late August 2025
  • Duration: 5 months of active development
  • Team: 3 engineers → 7 engineers
  • Status: Internal beta with hundreds of daily users
  • Tech: Codex CLI powered by GPT-5

StrongDM Factory

  • Founded: July 14, 2025
  • Founders: Justin McCarthy (CTO), Jay Taylor, Navan Chauhan
  • Catalyst: Claude 3.5 Sonnet (Oct 2024)
  • Status: Open-sourced specs (506 GitHub stars)
  • Published: February 6, 2026

Both projects emerged within weeks of each other in mid-2025, responding to the same inflection point: AI models had finally crossed the threshold where they could compound correctness rather than compound errors over long coding sessions.

Core Philosophy

"Humans steer. Agents execute." — OpenAI
vs
"Code must not be written by humans." — StrongDM

OpenAI: The Partnership Model

OpenAI's philosophy centers on collaboration. Their mantra—"Humans steer. Agents execute"—positions engineers as architects of environments rather than writers of code. The human role transforms into designing scaffolding, specifying intent, and building feedback loops that enable Codex agents to work reliably.

Key Insight: When an agent struggles at OpenAI, the fix is never "try harder." It's always "what capability is missing?" Engineers work depth-first, breaking goals into building blocks and prompting agents to construct them.

Engineers at OpenAI still review pull requests (though review is optional), validate outcomes, and translate user feedback into acceptance criteria. They maintain strategic control while delegating tactical execution to autonomous agents.

StrongDM: The Autonomy Model

StrongDM's approach is more radical. They operate under three founding rules:

StrongDM's Three Rules

1. Code must not be written by humans

2. Code must not be reviewed by humans

3. Spend minimum $1,000/day per engineer on tokens

Their guiding question when approaching any task is "Why am I doing this?", with the implication that the model should be doing it instead. This principle of "deliberate naivete" aims to remove outdated "Software 1.0" conventions entirely.

StrongDM's vision replaces code review with validation. Once specifications are complete, agents execute end-to-end without human iteration. The system validates against live-like environments and self-corrects dynamically; in StrongDM's view, this makes validation superior to traditional code review for quality assurance.


Scale & Metrics

Metric            | OpenAI                                | StrongDM
------------------|---------------------------------------|---------------------------------
Lines of Code     | ~1 million                            | Not disclosed
Pull Requests     | ~1,500 merged                         | Not disclosed
Throughput        | 3.5 PRs per engineer per day          | Thousands of scenarios per hour
Speed Improvement | 1/10th the time vs manual coding      | Not quantified
Agent Runtime     | 6+ hours continuous work (overnight)  | Batch-oriented shift work
Cost Model        | Not disclosed                         | $1,000/day per engineer minimum

OpenAI emphasizes practical velocity gains—building in one-tenth the time of traditional development. StrongDM emphasizes economic transformation—tasks previously impossible are now routine, justifying heavy token investment.


Testing & Validation Philosophy

Perhaps nowhere is the philosophical divide more apparent than in testing approaches.

OpenAI: Application Legibility

OpenAI's bottleneck became human QA capacity. Their solution: make everything—UI, logs, metrics—directly legible to Codex.

OpenAI's Observability Stack (Per Git Worktree)

Chrome DevTools Protocol: DOM snapshots, screenshots, navigation

VictoriaLogs: LogsQL queries for log analysis

VictoriaMetrics: PromQL queries for performance metrics

VictoriaTraces: TraceQL queries for distributed tracing

Vector: Telemetry collection and routing

The app is bootable per git worktree, so Codex can launch and drive one instance per change. Agents can reproduce bugs, validate fixes, and reason about UI behavior directly. Prompts like "ensure service startup completes in under 800ms" become tractable because agents query the observability stack themselves.
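
The source doesn't show what such a query looks like, but a minimal sketch is easy to imagine: an agent checking the 800ms startup budget against VictoriaMetrics' Prometheus-compatible query API. The /api/v1/query endpoint and default port are standard VictoriaMetrics; the metric name is a hypothetical placeholder.

import requests

# VictoriaMetrics exposes a Prometheus-compatible /api/v1/query endpoint.
# The base URL and the metric name below are illustrative placeholders.
VM_URL = "http://localhost:8428/api/v1/query"

def startup_p95_ms() -> float:
    """Query the p95 of a (hypothetical) service-startup histogram, in ms."""
    promql = (
        "histogram_quantile(0.95, "
        "sum(rate(service_startup_duration_seconds_bucket[5m])) by (le)) * 1000"
    )
    resp = requests.get(VM_URL, params={"query": promql}, timeout=10)
    resp.raise_for_status()
    result = resp.json()["data"]["result"]
    return float(result[0]["value"][1]) if result else float("nan")

if __name__ == "__main__":
    p95 = startup_p95_ms()
    # The 800ms budget mirrors the example prompt in the text.
    print(f"startup p95 = {p95:.0f}ms -> {'OK' if p95 < 800 else 'OVER BUDGET'}")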

StrongDM: Scenarios Over Tests

StrongDM identified a fundamental problem with traditional testing: agents learn to game the tests. Their solution: scenarios—end-to-end user stories stored outside the codebase, functioning as "holdout datasets."

Key Innovation: Instead of boolean pass/fail, StrongDM uses "satisfaction metrics"—probabilistic validation that asks "what fraction of observed trajectories through all scenarios likely satisfy the user?"
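
StrongDM doesn't publish the metric's exact form. Under the assumption that some judge model assigns each trajectory a satisfaction probability in [0, 1], the aggregate could be as simple as this sketch (the 0.8 threshold is also an assumption):

from dataclasses import dataclass

@dataclass
class Trajectory:
    scenario_id: str
    satisfaction: float  # judge-assigned probability the user is satisfied, in [0, 1]

def satisfaction_metric(trajectories: list[Trajectory], threshold: float = 0.8) -> float:
    """Fraction of observed trajectories judged likely to satisfy the user.

    Unlike boolean pass/fail, each trajectory carries a probabilistic score;
    the aggregate answers "what fraction of trajectories through all
    scenarios likely satisfy the user?"
    """
    if not trajectories:
        return 0.0
    likely_satisfied = sum(t.satisfaction >= threshold for t in trajectories)
    return likely_satisfied / len(trajectories)

# Example: 3 of 4 trajectories clear the bar.
runs = [Trajectory("login", 0.95), Trajectory("login", 0.55),
        Trajectory("export", 0.90), Trajectory("export", 0.85)]
print(satisfaction_metric(runs))  # 0.75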

The Digital Twin Universe (DTU)

StrongDM's most striking innovation is the Digital Twin Universe: behavioral clones of third-party services including Okta, Jira, Slack, Google Docs, Google Drive, and Google Sheets.

The DTU enables:

  • Testing at volumes "far exceeding production limits"
  • Safely testing dangerous failure modes
  • Thousands of scenarios per hour with no rate limits
  • No API costs or abuse detection triggers
  • Fully reproducible, deterministic test conditions

Building high-fidelity service clones was previously economically infeasible. StrongDM argues that automation has transformed it into a routine operation, exemplifying their "deliberate naivete" principle.
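
To make "behavioral clone" concrete, here is a minimal sketch of one faked endpoint: a deterministic stand-in for Slack's chat.postMessage method. The method name matches Slack's public Web API; the response fields are simplified, and the counter-based timestamps are an illustrative way to keep runs reproducible.

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class FakeSlack(BaseHTTPRequestHandler):
    """Deterministic stand-in for one Slack Web API method.

    A real clone would model auth, rate limits, and failure modes; this
    sketch only shows the shape: same wire format, fully reproducible.
    """
    counter = 0  # deterministic message timestamps instead of wall-clock time

    def do_POST(self):
        if self.path != "/api/chat.postMessage":
            self.send_error(404)
            return
        body = json.loads(self.rfile.read(int(self.headers["Content-Length"])))
        FakeSlack.counter += 1
        reply = {"ok": True,
                 "channel": body.get("channel", "C000000"),
                 "ts": f"{1700000000 + FakeSlack.counter}.000100"}
        payload = json.dumps(reply).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

if __name__ == "__main__":
    HTTPServer(("127.0.0.1", 8080), FakeSlack).serve_forever()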


Technical Architecture

OpenAI: Layered Domain Architecture

OpenAI enforces a rigid architectural model where each business domain divides into fixed layers with strictly validated dependency directions.

Types → Config → Repo → Service → Runtime → UI
          ↓
Providers (auth, telemetry, feature flags, etc.)
          ↓
App Wiring + UI

Code can only depend "forward" through layers. Cross-cutting concerns enter through a single explicit interface: Providers. Custom linters (written by Codex) enforce these rules mechanically, with error messages containing remediation instructions for agents.
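
The linters themselves aren't published, but the core check is small enough to sketch: rank each layer, then flag imports that point the wrong way. This assumes "forward" means each layer may depend only on layers earlier in the chain; the module paths and message wording are hypothetical.

# Layer order, per the diagram above.
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]
RANK = {name: i for i, name in enumerate(LAYERS)}

def layer_of(module: str) -> str | None:
    """Map a module path like 'billing/service/invoice' to its layer."""
    return next((p for p in module.split("/") if p in RANK), None)

def check_import(importer: str, imported: str) -> str | None:
    """Return a remediation message if the dependency points the wrong way."""
    src, dst = layer_of(importer), layer_of(imported)
    if src is None or dst is None or RANK[dst] <= RANK[src]:
        return None  # earlier-layer (or unlayered) dependencies are fine
    return (f"{importer}: illegal dependency on '{dst}' layer from '{src}' layer. "
            f"Depend only on earlier layers ({' -> '.join(LAYERS)}); "
            f"cross-cutting concerns must go through Providers.")

# Example: a repo-layer module importing from the service layer is flagged.
print(check_import("billing/repo/orders", "billing/service/invoice"))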

Philosophy: "Enforce boundaries centrally, allow autonomy locally." The constraints are what enable speed without architectural drift—you care deeply about boundaries but give agents significant freedom in how solutions are expressed.

OpenAI sometimes reimplements dependencies for agent legibility. Rather than pulling in a generic concurrency package, they built their own map-with-concurrency helper: tightly integrated with OpenTelemetry instrumentation, 100% test coverage, and exactly the behavior their runtime expects.

StrongDM: Graph-Based Pipeline (Attractor)

StrongDM's architecture centers on Attractor, a DOT-based pipeline orchestration system that represents multi-stage AI workflows as directed graphs.

PARSE → VALIDATE → INITIALIZE → EXECUTE → FINALIZE

Workflows are defined in Graphviz DOT syntax, providing immediate visualization and version-control compatibility. Node types map to handler implementations via shape attributes (a small DOT sketch follows the list):

Attractor Node Types

Mdiamond (start) — Entry point

Msquare (exit) — Terminal with goal-gate enforcement

box (default) — LLM task invocation

hexagon — Human-in-the-loop gate

diamond — Conditional routing

component — Parallel fan-out

tripleoctagon — Parallel fan-in

parallelogram — External tool execution

house — Supervisor loop
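
Putting the shapes together, a minimal workflow might look like the following sketch. The node names, prompts, and the condition attribute are illustrative assumptions, not taken from the spec:

digraph fix_bug {
    start     [shape=Mdiamond];
    implement [shape=box, label="Implement the fix"];
    check     [shape=diamond, label="Scenarios satisfied?"];
    approve   [shape=hexagon, label="Human gate: ship it?"];
    done      [shape=Msquare];

    start -> implement;
    implement -> check;
    check -> approve   [condition="pass"];
    check -> implement [condition="fail"];  // self-correct and retry
    approve -> done;
}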

Deterministic Edge Selection

After each node completes, the engine selects the next edge using a 5-step priority hierarchy:

  1. Condition-matching edges (evaluate boolean expressions)
  2. Preferred label match (outcome suggests specific edge)
  3. Suggested next IDs (outcome recommends target nodes)
  4. Highest weight (numeric priority attribute)
  5. Lexical tiebreak (alphabetical node ID ordering)

This ensures reproducible routing without runtime randomness—critical for debugging and audit trails.
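
The spec's exact data model isn't reproduced here, but the hierarchy itself is mechanical enough to sketch. Field names and the within-step tiebreaks (first declared match wins) are assumptions:

from dataclasses import dataclass, field

@dataclass
class Edge:
    target: str
    condition: str = ""   # boolean expression over outcome variables
    label: str = ""
    weight: float = 0.0

@dataclass
class Outcome:
    vars: dict = field(default_factory=dict)
    preferred_label: str = ""
    suggested_next_ids: list = field(default_factory=list)

def select_next_edge(edges: list[Edge], outcome: Outcome) -> Edge | None:
    # 1. Condition-matching edges (first declared match wins in this sketch).
    for e in edges:
        if e.condition and eval(e.condition, {}, dict(outcome.vars)):  # sketch only
            return e
    # 2. Edge whose label matches the outcome's preferred label.
    for e in edges:
        if e.label and e.label == outcome.preferred_label:
            return e
    # 3. Targets the outcome explicitly suggested, in suggestion order.
    for node_id in outcome.suggested_next_ids:
        for e in edges:
            if e.target == node_id:
                return e
    # 4 + 5. Highest weight wins; ties break alphabetically on target ID.
    return min(edges, key=lambda e: (-e.weight, e.target), default=None)

# With no condition, label, or suggestion matches, weight then lexical order decide:
edges = [Edge("review", weight=1), Edge("deploy", weight=1), Edge("halt")]
print(select_next_edge(edges, Outcome()).target)  # "deploy"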

State Management

Attractor maintains three levels of state, sketched in code after the list:

  • Context: Thread-safe key-value store shared across pipeline stages
  • Checkpoints: Serialized execution state enabling crash recovery and exact resume
  • Artifacts: Large outputs (>100KB) are file-backed for efficiency
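
A compact sketch of those three levels, using the 100KB threshold from the list; the file layout and reference format are assumptions:

import json, threading
from pathlib import Path

ARTIFACT_LIMIT = 100_000  # ~100KB file-backing threshold, per the list above

class Context:
    """Thread-safe key-value store; large values spill to artifact files."""

    def __init__(self, workdir: Path):
        self._lock = threading.Lock()
        self._data: dict[str, str] = {}
        self._artifacts = workdir / "artifacts"
        self._artifacts.mkdir(parents=True, exist_ok=True)
        self._checkpoint = workdir / "checkpoint.json"

    def set(self, key: str, value: str) -> None:
        with self._lock:
            if len(value) > ARTIFACT_LIMIT:
                path = self._artifacts / f"{key}.txt"
                path.write_text(value)
                self._data[key] = f"@artifact:{path}"  # store a reference, not the blob
            else:
                self._data[key] = value

    def get(self, key: str) -> str:
        with self._lock:
            value = self._data[key]
        if value.startswith("@artifact:"):
            return Path(value.removeprefix("@artifact:")).read_text()
        return value

    def checkpoint(self) -> None:
        """Serialize state so a crashed run can resume exactly where it stopped."""
        with self._lock:
            self._checkpoint.write_text(json.dumps(self._data))

    def resume(self) -> None:
        if self._checkpoint.exists():
            with self._lock:
                self._data = json.loads(self._checkpoint.read_text())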

Multi-Provider Flexibility

StrongDM embraces provider-specific tooling rather than forcing normalization:

  • OpenAI models use apply_patch format
  • Anthropic models use edit_file with exact-string matching
  • Gemini models use their specific conventions

A CSS-like "Model Stylesheet" centralizes LLM configuration:

* { llm_model: claude-sonnet-4-5; }
.critical { llm_model: gpt-5.2; reasoning_effort: high; }
#review_gate { llm_provider: openai; }
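
The source doesn't spell out the cascade rules, but the CSS analogy suggests the familiar specificity order: universal, then class, then id. A resolution sketch under that assumption, using the stylesheet above:

# Rules from the stylesheet above. Later, more specific selectors win.
RULES = [
    ("*",            {"llm_model": "claude-sonnet-4-5"}),
    (".critical",    {"llm_model": "gpt-5.2", "reasoning_effort": "high"}),
    ("#review_gate", {"llm_provider": "openai"}),
]

def specificity(selector: str) -> int:
    """CSS-like ordering: '*' < '.class' < '#id'."""
    return {"*": 0}.get(selector, 1 if selector.startswith(".") else 2)

def matches(selector: str, node_id: str, classes: set[str]) -> bool:
    if selector == "*":
        return True
    if selector.startswith("."):
        return selector[1:] in classes
    return selector == f"#{node_id}"

def resolve(node_id: str, classes: set[str]) -> dict:
    """Merge matching rules, lowest specificity first, so specific wins."""
    config: dict = {}
    for selector, props in sorted(RULES, key=lambda r: specificity(r[0])):
        if matches(selector, node_id, classes):
            config.update(props)
    return config

# A critical review gate gets the id's provider plus the class's model and effort:
print(resolve("review_gate", {"critical"}))
# {'llm_model': 'gpt-5.2', 'reasoning_effort': 'high', 'llm_provider': 'openai'}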

Knowledge Management

OpenAI: "Table of Contents, Not Encyclopedia"

OpenAI learned early that massive AGENTS.md files fail predictably:

  • Context is scarce—giant files crowd out task and code
  • Too much guidance becomes non-guidance
  • Monolithic manuals rot instantly
  • Single blobs resist mechanical verification

Their solution: a short AGENTS.md (~100 lines) serves as a map to a structured docs/ directory treated as the system of record.

AGENTS.md (100 lines) — the map
ARCHITECTURE.md
docs/
├── design-docs/ (index, core-beliefs)
├── exec-plans/ (active/, completed/, tech-debt)
├── generated/ (db-schema)
├── product-specs/ (with index)
├── references/ (design-system, nixpacks, uv)
└── DESIGN.md, FRONTEND.md, PLANS.md, etc.

The philosophy: "What Codex can't see doesn't exist." Knowledge living in Google Docs, Slack threads, or people's heads is invisible to agents. Repository-local, versioned artifacts are the only accessible context.

Progressive Disclosure: Agents start with a small, stable entry point and are taught where to look next, rather than being overwhelmed up front. Dedicated linters enforce that docs are up-to-date, cross-linked, and structured correctly. A recurring "doc-gardening" agent scans for staleness.
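
The doc linters themselves aren't published, but one of the checks ("cross-linked") reduces to something small: every repository-local link must resolve to a real file. A sketch, assuming markdown-style links and the directory layout mapped above:

import re
from pathlib import Path

LINK = re.compile(r"\[[^\]]*\]\(([^)#]+)")  # markdown link targets, sans anchors

def broken_links(repo: Path) -> list[str]:
    """Report relative links in AGENTS.md and docs/ that point nowhere."""
    problems = []
    for doc in [repo / "AGENTS.md", *sorted((repo / "docs").rglob("*.md"))]:
        if not doc.exists():
            continue
        for target in LINK.findall(doc.read_text()):
            if target.startswith(("http://", "https://")):
                continue  # only repository-local links are checked
            if not (doc.parent / target).exists():
                problems.append(f"{doc}: broken link -> {target}")
    return problems

if __name__ == "__main__":
    for line in broken_links(Path(".")):
        print(line)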

StrongDM: Three Core Principles

StrongDM's knowledge framework centers on three interconnected principles:

1. Seed → Validation → Feedback Loop

Start with a PRD, spec, screenshot, or existing code. The loop runs until holdout scenarios pass and stay passing.

2. End-to-End Validation Harness

"As close to real environment as possible: customers, integrations, economics."

3. Tokens as Fuel for Problem-Solving

Convert obstacles into model-consumable representations. "Creative, frontier engineering" transforms traces, screenshots, transcripts, incident replays, surveys, and customer interviews into formats agents understand.

Six Techniques for Agent-Driven Development

StrongDM documents six repeatable patterns:

  1. Digital Twin Universe (DTU): Clone external dependency behaviors
  2. Gene Transfusion: Move working patterns between codebases via concrete exemplars
  3. The Filesystem: Use filesystem as agent working memory
  4. Shift Work: Separate interactive work from fully specified execution
  5. Semport: Semantic-aware automated ports (cross-language/framework)
  6. Pyramid Summaries: Reversible hierarchical summarization at multiple zoom levels (see the sketch after this list)
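
The report doesn't define Pyramid Summaries precisely. Under the reading that each level condenses the level below while keeping links back down (which is what makes the summarization reversible), a sketch might look like this, with a stand-in for the actual LLM summarizer:

from dataclasses import dataclass

def summarize(texts: list[str]) -> str:
    """Stand-in for an LLM summarization call."""
    return " / ".join(t[:40] for t in texts)

@dataclass
class Node:
    summary: str
    children: list["Node"]  # empty at the base level; zoom in by descending

def build_pyramid(chunks: list[str], fanout: int = 4) -> Node:
    """Condense `fanout` siblings per level until one apex summary remains.

    Every node keeps its children, so any summary can be expanded back
    into the detail it was derived from -- the "reversible" property.
    """
    level = [Node(c, []) for c in chunks]
    while len(level) > 1:
        level = [Node(summarize([n.summary for n in group]), list(group))
                 for group in (level[i:i + fanout]
                               for i in range(0, len(level), fanout))]
    return level[0]

apex = build_pyramid([f"detail paragraph {i}" for i in range(16)])
print(apex.summary)        # coarsest zoom level
print(len(apex.children))  # zoom in one level: 4 mid-level summaries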

Autonomy Levels & Human Role

OpenAI: End-to-End Feature Delivery

OpenAI recently crossed a threshold where Codex can drive features end-to-end:

  1. Validate current codebase state
  2. Reproduce bug + record video demonstration
  3. Implement fix
  4. Validate fix by driving app + record video
  5. Open pull request
  6. Respond to agent and human feedback
  7. Detect and remediate build failures
  8. Escalate to human only when judgment required
  9. Merge the change

Important caveat: "This behavior depends heavily on the specific structure and tooling of this repository and should not be assumed to generalize without similar investment."

Human Role at OpenAI

Engineers prioritize work, translate user feedback to acceptance criteria, and validate outcomes. When agents struggle, humans identify missing tools, guardrails, or documentation. The job transforms into designing environments, feedback loops, and control systems.

StrongDM: Non-Interactive from Specification

StrongDM's "Shift Work" model separates interactive work (writing specs and scenarios) from fully specified work (agent execution). Once specifications are complete, Attractor runs end-to-end without human iteration.

Human-in-the-Loop Gates

When human judgment is required, hexagon nodes in the Attractor graph trigger multiple-choice questions. StrongDM provides four interviewer implementations, two of which are sketched after the list:

  • AutoApproveInterviewer: Always selects first option (for testing)
  • ConsoleInterviewer: Terminal-based prompts
  • CallbackInterviewer: Delegates to external APIs/webhooks
  • QueueInterviewer: Pre-filled answers for deterministic testing
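
A minimal sketch of the shared interface and two of the variants; the class names follow the list above, but the method signature is an assumption:

from abc import ABC, abstractmethod

class Interviewer(ABC):
    """Answers the multiple-choice questions raised by hexagon gate nodes."""

    @abstractmethod
    def ask(self, question: str, options: list[str]) -> str: ...

class AutoApproveInterviewer(Interviewer):
    def ask(self, question: str, options: list[str]) -> str:
        return options[0]  # always the first option, for unattended test runs

class QueueInterviewer(Interviewer):
    def __init__(self, answers: list[str]):
        self._answers = list(answers)  # pre-filled, for deterministic testing

    def ask(self, question: str, options: list[str]) -> str:
        answer = self._answers.pop(0)
        if answer not in options:
            raise ValueError(f"queued answer {answer!r} not among {options}")
        return answer

gate = QueueInterviewer(["ship it"])
print(gate.ask("Deploy the fix?", ["ship it", "hold"]))  # -> "ship it"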

Human Role at StrongDM

Engineers define intent, scenarios, and constraints. The focus becomes "creative, frontier engineering"—making diverse data types (traces, screenshots, incident replays, customer interviews) consumable by models. Engineers design fully autonomous systems, then step back.


Technical Debt Management

OpenAI: "Garbage Collection"

OpenAI initially spent every Friday—20% of the week—manually cleaning up "AI slop." This didn't scale.

Their solution: "Golden principles"—opinionated mechanical rules keeping the codebase legible and consistent:

  • Prefer shared utility packages over hand-rolled helpers
  • Don't probe data "YOLO-style"—validate boundaries or use typed SDKs

Background Codex tasks now scan for deviations on a regular cadence, update quality grades, and open targeted refactoring PRs. Most are reviewed in under a minute and automerged.

"Technical debt is like a high-interest loan: it's almost always better to pay it down continuously in small increments than to let it compound and tackle it in painful bursts."

StrongDM: Continuous Validation

StrongDM's approach to technical debt is less explicitly documented but implied through their continuous validation feedback loops. Scenarios prevent accumulation of bad patterns by catching regressions immediately.


Economic Models

OpenAI: Velocity Multiplication

  • 1/10th the time vs manual coding
  • No specific cost metrics disclosed
  • Practical production deployment focus
  • Shipping agent harness as product (Codex App Server)
  • Deployed in JetBrains, Xcode, Codex desktop

StrongDM: Economic Transformation

  • $1,000/day per engineer on tokens (minimum)
  • Tasks previously economically infeasible now routine
  • Building full SaaS behavioral clones now viable
  • Heavy compute investment model
  • "Deliberate naivete" removes outdated constraints

OpenAI optimizes for practical velocity gains within existing software engineering budgets. StrongDM advocates for radical economic reframing—spend orders of magnitude more on compute to eliminate human involvement entirely.


Open Source & Community

OpenAI: Proprietary with Partner Integration

OpenAI's approach remains proprietary. They built an internal product and shared learnings via blog post, but haven't open-sourced the implementation. Instead, they're shipping the Codex App Server to partners—embedded in JetBrains, Xcode, and the Codex desktop app.

StrongDM: Open Specification Strategy

StrongDM took the opposite approach: open-source the entire specification as "NLSpec" (Natural Language Specifications).

The strongdm/attractor GitHub repository contains just three markdown files—complete specifications for building Attractor. The implementation approach? Prompt a coding agent:

Building Attractor

"Implement Attractor as described by https://factory.strongdm.ai/"

The repository has 506 stars, 65 forks, and community implementations are already emerging (including brynary/attractor in TypeScript).


What They're Still Learning

OpenAI's Explicit Unknowns

OpenAI acknowledges several open questions:

  • How does architectural coherence evolve over years in a fully agent-generated system?
  • Where does human judgment add the most leverage?
  • How will the system evolve as models continue improving?
  • How to encode judgment so it compounds over time?

OpenAI's Conclusion: "Our most difficult challenges now center on designing environments, feedback loops, and control systems that help agents accomplish our goal: build and maintain complex, reliable software at scale."

StrongDM's Presentation

StrongDM's documentation presents their approach as "field-tested" with less discussion of remaining unknowns. They position the breakthrough as already achieved rather than ongoing exploration.


The Verdict: Which Approach Wins?

OpenAI's Approach: Immediately Practical

  • Concrete metrics (1M LOC, 1,500 PRs, 3.5 PRs/engineer/day)
  • Real daily users and production deployment
  • Documented challenges and remaining unknowns
  • Works with existing tools and workflows
  • Explicit about environment-specific optimizations

StrongDM's Approach: Architecturally Innovative

  • Open source specifications enabling community implementations
  • Novel testing paradigms (scenarios, satisfaction metrics, DTU)
  • Deterministic reproducibility designed in from the start
  • Multi-provider flexibility built natively
  • Graph-based orchestration enables complex workflows
"OpenAI shows what's working now. StrongDM shows what's possible next."

The Philosophical Divide

Dimension      | OpenAI                                             | StrongDM
---------------|----------------------------------------------------|--------------------------------------------------
Human Role     | Partnership: humans and agents collaborate         | Autonomy: agents operate independently
Success Factor | Environment design with legibility and constraints | Scenarios, validation harnesses, self-correction
Economic Model | Velocity multiplication (1/10th time)              | Economic transformation (impossible tasks become routine)
Strategy       | Evolution: adapt traditional engineering           | Revolution: remove Software 1.0 conventions
Maturity       | Pragmatic shipping in production                   | Radical vision with open specifications

The Synthesis

These aren't competing approaches—they're complementary visions operating at different time horizons.

OpenAI optimizes for human+AI productivity within familiar software engineering paradigms. Their approach works today, ships to production, and delivers measurable velocity gains. The constraints they've built—layered architecture, mechanical linting, progressive disclosure documentation—show what's immediately practical for teams wanting to adopt agent-driven development.

StrongDM reimagines software engineering entirely around autonomous AI systems. Their innovations—Digital Twin Universe, satisfaction metrics, graph-based orchestration, deterministic reproducibility—point toward a future where humans design systems that agents operate without supervision. The economic reframing ($1k/day on tokens) signals that optimization criteria have fundamentally shifted.

OpenAI keeps humans in the loop because current models still require careful environment design to compound correctness. StrongDM removes humans from the loop because they believe we've crossed—or are about to cross—the threshold where agents can maintain quality autonomously.

The Bottom Line: Both approaches validate that the agent-driven development era has arrived. OpenAI shows enterprises how to adopt it pragmatically within existing structures. StrongDM shows startups and innovators what becomes possible when you rebuild from first principles.

The question isn't which approach is "right." The question is which constraints you're optimizing for: shipping products with proven patterns today (OpenAI) or building the infrastructure for fully autonomous development tomorrow (StrongDM).

Both answers are valid. Both are probably necessary. And both will continue evolving as models improve over the next 6-12 months.

