“Give me six hours to chop down a tree and I will spend the first four sharpening the axe.” (attributed to Abraham Lincoln)

The Challenge of Scale in Compiler Fixing

In our previous post, we introduced the Verilator Gap Checker—an automated framework that hunts for semantic gaps between Verilator and the IEEE 1800-2023 standard. The framework proved highly effective, generating over 900 tests and surfacing 121 legitimate issues within two months.

However, discovering a bug is only the first step. Fixing a semantic gap in a complex C++ compiler like Verilator is resource-intensive. A typical manual repair cycle requires peeling back layers of a SystemVerilog codebase to isolate a minimal reproducible example, cross-referencing the IEEE standard, parsing massive AST dumps (via verilator --dump-tree-json) to locate the exact V3 compiler pass at fault, editing the C++ source, running local regressions, and finally navigating the upstream code review process.

For a single engineer, addressing 121 issues manually does not scale. To clear this queue efficiently without compromising the strict quality standards of the upstream repository, we needed a systematic approach. This post details the architecture of verilator_developer, a 9-stage agentic pipeline, automated up to a final human sign-off, designed to scale complex C++ compiler contributions.

The Core Principle: Goal-Oriented (GO) Execution

Before detailing the pipeline, it is necessary to establish the underlying operational framework. The architecture relies on a Goal-Oriented (GO) execution model, which safely delegates trial-and-error loops to the agent. This model requires three strict preconditions:

  1. Bounded Environment: The agent operates exclusively within scoped, isolated git worktrees. It cannot access or modify external host systems.
  2. Well-Defined Goals: Objectives are deterministic. A deterministic objective is what tells the agent when to exit the execution loop and deliver the result to the user.
  3. Machine-Verifiable Criteria: Success is not subjective. It is defined by binary outcomes: the code compiles, the test passes on both Verilator and standard simulators (e.g., Questa), and coverage metrics are met.

When these three conditions hold, the agent can autonomously grind through deep, narrow problem spaces, allowing human engineers to step back from the iterative development loop.
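The loop implied by these three preconditions can be sketched roughly as follows. This is an illustration under assumptions, not the pipeline's actual code: `Criterion`, `run_goal_loop`, and the `max_attempts` budget are hypothetical names introduced here, and the checks are stand-in callables rather than real build or simulator invocations.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One machine-verifiable check with a binary outcome (hypothetical type)."""
    name: str
    check: Callable[[], bool]


def run_goal_loop(attempt: Callable[[int], None],
                  criteria: List[Criterion],
                  max_attempts: int = 5) -> bool:
    """Repeat `attempt` until every criterion passes or the budget runs out."""
    for i in range(1, max_attempts + 1):
        attempt(i)  # e.g. edit the source and rebuild inside the worktree
        failed = [c.name for c in criteria if not c.check()]
        if not failed:
            return True  # goal reached: exit the loop and report
        print(f"attempt {i}: still failing {failed}")
    return False  # budget exhausted: hand the problem back to a human


# Toy usage: the simulated "fix" only lands on the third attempt.
state = {"attempts": 0}
criteria = [
    Criterion("compiles", lambda: True),
    Criterion("test_passes", lambda: state["attempts"] >= 3),
]
ok = run_goal_loop(lambda i: state.update(attempts=i), criteria)
```

The attempt budget is an assumption added for the sketch; the point is only that the exit condition is a conjunction of binary checks, so the agent never has to judge for itself whether it is "done".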

The 9-Stage QA Pipeline

The verilator_developer pipeline wraps the entire lifecycle of a bug fix, from initial issue assignment to a merged upstream Pull Request.

Phase 1: Context Investigation & Repro Isolation

The pipeline begins by isolating the semantic failure flagged by the Gap Checker into a minimal reproducible test case. Guided by a systematic-debugging discipline, the agent runs this failing test to capture the compiler’s behavioral anomaly. It then cross-references the official IEEE 1800-2023 specifications, parses the JSON AST dumps (via verilator --dump-tree-json), and produces a comprehensive tracking brief (brief_<ID>.md) that maps the isolated reproduction to the exact faulty compiler pass and node.
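As a rough illustration of the AST-inspection step, the sketch below walks a JSON tree and collects nodes of a given type. It assumes that each node is a JSON object carrying a "type" field and that child nodes are nested in lists under other keys; the sample tree is invented for the example and is not real Verilator output, and `iter_nodes` and `find_by_type` are hypothetical helper names.

```python
import json
from typing import Any, Iterator, List


def iter_nodes(node: Any) -> Iterator[dict]:
    """Yield every dict that looks like an AST node (has a "type" key)."""
    if isinstance(node, dict):
        if "type" in node:
            yield node
        for value in node.values():
            yield from iter_nodes(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_nodes(item)


def find_by_type(tree: dict, node_type: str) -> List[dict]:
    """Collect all nodes whose "type" equals `node_type`."""
    return [n for n in iter_nodes(tree) if n.get("type") == node_type]


# Invented sample tree; field names are assumptions for illustration.
SAMPLE = json.loads("""
{"type": "NETLIST", "name": "$root", "modulesp": [
  {"type": "MODULE", "name": "t", "stmtsp": [
    {"type": "VAR", "name": "a", "loc": "d,3:9,3:10"},
    {"type": "ASSIGNW", "name": "", "loc": "d,4:12,4:13"}
  ]}
]}
""")

hits = find_by_type(SAMPLE, "VAR")
```

Comparing the nodes found in dumps taken after successive passes is one way to narrow down which pass introduced the anomaly.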

Phase 2: Isolated Environment Provisioning

To enforce strict environment boundaries, the pipeline provisions a clean, isolated git worktree for the targeted issue. It configures the local build flags, maps the dependencies, and registers the session hooks. This stateful isolation ensures that multiple parallel agent executions never pollute the master repository or cross-contaminate code changes.
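A minimal sketch of the provisioning step, assuming one branch and one directory per issue. The `git worktree add -b <branch> <path> <base>` form is standard git; the branch naming scheme, directory layout, and function names are hypothetical, and the build-flag and session-hook setup described above is not shown.

```python
import subprocess
from pathlib import Path
from typing import List


def worktree_command(issue_id: str, root: Path, base: str = "master") -> List[str]:
    """Build the `git worktree add` invocation for one issue."""
    branch = f"fix/{issue_id}"        # hypothetical branch naming
    path = root / f"wt_{issue_id}"    # hypothetical directory layout
    return ["git", "worktree", "add", "-b", branch, str(path), base]


def provision(issue_id: str, repo: Path, root: Path) -> Path:
    """Create the worktree; raises CalledProcessError if git fails."""
    cmd = worktree_command(issue_id, root)
    subprocess.run(cmd, cwd=repo, check=True)
    return Path(cmd[5])


# Building the command has no side effects, so it can be inspected first.
cmd = worktree_command("1234", Path("/tmp/worktrees"))
```

Because every worktree has its own branch and working directory, parallel agents share the object database but never each other's uncommitted changes.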

Phase 3: Pre-Flight & Core Implementation

Before writing any code, a deterministic pre-flight check validates the baseline compiler state. Once cleared, the agent enters a test-driven development (TDD) implementation loop. It modifies the C++ source files (e.g., within V3Unknown.cpp) to correct the logic defect, running an iterative build-test loop until the minimal reproducible example compiles and passes against the golden reference simulator (Questa).

Phase 4: Matrix Testing Expansion

A minimal repro only proves that one specific case is fixed, whereas a compiler must handle edge cases robustly. In this phase, the agent autonomously expands the minimal repro into a comprehensive test matrix (single-field structs, nested arrays, null inputs, etc.). Each test must fail on the original Verilator master branch, pass on the patched Verilator branch, and pass on a golden reference simulator such as Questa.
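The three-way acceptance rule can be written as a small predicate. The sketch below is illustrative only: the `Outcome` type and the classification labels are hypothetical, and the pass/fail values would come from real runs on master, the patched branch, and the reference simulator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    """Pass/fail of one test on the three targets (True = pass)."""
    master: bool
    patched: bool
    golden: bool


def is_valid_matrix_test(o: Outcome) -> bool:
    """Fails before the fix, passes after it, and the reference agrees."""
    return (not o.master) and o.patched and o.golden


def classify(o: Outcome) -> str:
    """Explain why a candidate test is accepted or rejected."""
    if not o.golden:
        return "invalid-test"        # the reference rejects the test itself
    if o.master:
        return "not-discriminating"  # already passed before the patch
    if not o.patched:
        return "fix-incomplete"      # the patch does not cover this case
    return "accepted"


matrix = {
    "single_field_struct": Outcome(master=False, patched=True, golden=True),
    "nested_array": Outcome(master=True, patched=True, golden=True),
    "null_input": Outcome(master=False, patched=False, golden=True),
}
report = {name: classify(o) for name, o in matrix.items()}
```

A test that already passes on master proves nothing about the patch, which is why it is rejected here rather than silently kept.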

Phase 5: CI Regression Gating

The patched branch is pushed to a remote fork to trigger the full Verilator CI regression suite (over 4,000 tests). This is a hard gate: zero regressions are permitted. Any failure triggers a triage and a potential rollback to Phase 3.

Phase 6: Context-Isolated Code Review

To prevent AI confirmation bias, the pipeline spawns a completely isolated sub-agent. This reviewer agent is only provided with the unified diff and a maintainer checklist—it has no knowledge of the original agent’s rationale. It inspects the code for invariant breaks, dead code, and logical flaws.

Phase 7: Coverage Gating

A coverage build is triggered. The pipeline enforces strict 100% line coverage of the lines the patch modifies. While 100% branch coverage is not strictly required, any unreached branches must be explicitly justified or refactored.
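One way to express the line-coverage gate is to intersect the lines touched by the patch with per-line hit counts. The sketch below assumes both inputs have already been extracted (from the diff and from a coverage report, respectively) and that `changed` lists only executable lines; the function name and data shapes are hypothetical.

```python
from typing import Dict, Set


def uncovered_patch_lines(changed: Dict[str, Set[int]],
                          hit_counts: Dict[str, Dict[int, int]]) -> Dict[str, Set[int]]:
    """Return changed lines whose execution count is zero or missing."""
    missing: Dict[str, Set[int]] = {}
    for path, lines in changed.items():
        counts = hit_counts.get(path, {})
        bad = {ln for ln in lines if counts.get(ln, 0) == 0}
        if bad:
            missing[path] = bad
    return missing


# Toy data: line 11 of the patched file was never executed.
changed = {"src/V3Unknown.cpp": {10, 11, 12}}
hits = {"src/V3Unknown.cpp": {10: 4, 11: 0, 12: 1}}
missing = uncovered_patch_lines(changed, hits)
gate_passed = not missing
```

An empty result means the gate passes; anything else is a list of lines that need another test or a justification.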

Phase 8: Maintainer Simulation via Knowledge Distillation

To ensure the patch meets the upstream project’s specific conventions, the pipeline uses Knowledge Distillation. By ingesting the past 10 years of Verilator’s public PR history, we distilled a contextual guide of recurring technical concerns and coding styles (e.g., suffixing pointer names with p, prefixing member variables with m_). The Phase 8 agent acts as a simulated upstream maintainer, pushing back on formatting, commit boundaries, and PR descriptions before the patch is ever made public.

Phase 9: Human-In-The-Loop (HITL) Sign-off

The final phase is reserved for a human engineer. Instead of writing code, the engineer reviews the generated reports, inspects the commit boundaries (ensuring large patches are logically chunked), and approves the final PR description.

Asynchronous Orchestration: The Supervisor Architecture

As the throughput of verilator_developer increased, manual multi-tasking became a bottleneck. Running multiple concurrent git worktrees led to high context-switching overhead for the human engineer—checking CI statuses, unblocking reviewers, and managing branch states.

To resolve this, we introduced a Supervisor Agent. Its role is strictly orchestration:

  • It dynamically assigns issues to 3-4 isolated, parallel worktrees.
  • It monitors the progression of each PR through the 9 phases.
  • It minimizes the interruption rate. Through a mobile-dispatch interface, the Supervisor batches necessary decisions (e.g., “Approve commit split for PR #5455”) and routes them to a mobile device, so the engineer does not have to monitor a terminal.

This architectural decision is based on a core engineering principle: the effectiveness of an AI-automated system is inversely proportional to its human interruption rate. By batching micro-decisions, we maximize the engineer’s focus on high-level architectural tasks.
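The orchestration pattern can be sketched with a bounded pool of workers and a shared decision queue. This is a toy model under assumptions: the function names are hypothetical, `asyncio.sleep(0)` stands in for the nine pipeline phases, and the mobile-dispatch step is reduced to returning the batched list.

```python
import asyncio
from typing import List


async def run_issue(issue: str, slots: asyncio.Semaphore, pending: List[str]) -> str:
    """Run one issue in its own slot; queue any human decision instead of blocking."""
    async with slots:
        await asyncio.sleep(0)  # stand-in for the nine pipeline phases
        pending.append(f"Approve commit split for {issue}")
        return issue


async def supervise(issues: List[str], max_parallel: int = 4) -> List[str]:
    """Process all issues with at most `max_parallel` running at once."""
    slots = asyncio.Semaphore(max_parallel)
    pending: List[str] = []
    await asyncio.gather(*(run_issue(i, slots, pending) for i in issues))
    # One batched notification rather than one interruption per issue.
    return sorted(pending)


batch = asyncio.run(supervise(["issue-A", "issue-B", "issue-C"]))
```

The semaphore caps concurrency at the number of worktrees, while the `pending` list is what turns many small interruptions into a single batched request.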

Results and Metrics

From initial deployment to the time of writing (a 2.5-month window), this automated pipeline has yielded significant throughput improvements:

Metric                                Value
Generated tests (from Gap Checker)    ~920
Unique semantic gaps surfaced         121
Upstream merged PRs                   74
C++ code churn                        +18,000 / −3,500 lines
Average PR throughput multiplier      ~10× baseline
Peak monthly throughput               ~32 merged PRs / month (~16× baseline)
Compute cost                          €100 / month (Claude subscription)

Conclusion

The verilator_developer pipeline demonstrates that combining deterministic agentic frameworks with rigorous, machine-verifiable QA gates can fundamentally alter the speed of open-source compiler development. By shifting the human engineer’s role from “manual implementation” to “architectural oversight,” we scaled our contribution rate to the Verilator project without compromising code quality or overloading upstream maintainers.