The Problem

Verilator is the fastest open-source SystemVerilog simulator — 10–100× faster than commercial tools in many scenarios. But it has a well-known gap: incomplete support for the IEEE 1800-2017 SystemVerilog standard.

Your UVM testbench runs fine on QuestaSim, you migrate it to Verilator, and things break.
Which language features work?
Which don’t?
The only way to find out is to write feature tests one by one. For 269 features in a single IEEE chapter, that’s at least a week of manual work.

We built Verilator Gap Checker to automate this: AI generates the tests, runs them on Verilator and a commercial simulator (Questa in our case), detects the gaps, and files GitHub issues to our Verilator feature CI tests repository — end to end.

This project is part of DI-OSVISE, funded by BMBF, with the goal of enabling UVM testbenches to run on open-source simulators.

System Requirements

To run Verilator Gap Checker, you need:

  • QuestaSim (or another IEEE 1800-compliant commercial simulator) — serves as the golden reference
  • Verilator — the open-source simulator under test
  • Claude API access — powers the AI agents for code generation, analysis, and issue writing
  • GitHub CLI (gh) — for searching existing issues and publishing new ones

The main dependency is a commercial simulator license. We use QuestaSim because it fully implements IEEE 1800 and is widely recognized as a reference simulator, but in principle any compliant commercial tool could work.

System Architecture

The tool is a multi-agent system. A Supervisor Agent orchestrates three phases, each handled by specialized AI agents with distinct roles.

Fig.1 System Architecture

Core Principle: Dual-Simulator Comparison

The detection logic is simple: run the same test on both QuestaSim and Verilator, then compare.

QuestaSim | Verilator | Conclusion
----------|-----------|----------------------------------------
PASS      | FAIL      | Gap — Verilator doesn’t support this
PASS      | PASS      | Supported
FAIL      | FAIL      | Test issue — the test itself is broken

Why not verify against the IEEE standard directly? The standard is natural-language text — hard to validate automatically. And tests can have bugs too, so we need a known-good reference to catch them. QuestaSim fits the bill.
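
The comparison rule above can be sketched in a few lines of Python (names are illustrative, not the tool’s actual API):

```python
def classify(questa_passed: bool, verilator_passed: bool) -> str:
    """Map a (QuestaSim, Verilator) result pair to a verdict."""
    if questa_passed and not verilator_passed:
        return "GAP"          # Verilator doesn't support this feature
    if questa_passed and verilator_passed:
        return "SUPPORTED"
    return "TEST_ISSUE"       # QuestaSim rejects it too: the test itself is broken
```

Note that a QuestaSim failure always points at the test, never at Verilator, which is exactly why the golden reference is needed.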

Phase 1: Test Generation & Gap Detection

Fig.2 Phase 1: test generation & gap detection

The Code Gen Agent reads a YAML feature definition — a simple list describing which SystemVerilog features to test, with references to the IEEE standard sections (see Input: Just YAML below for an example). From this, it generates self-checking SystemVerilog test code. The test runs on QuestaSim first. If it fails, the Code Fix Agent reads the error log, patches the code, and retries — up to 3 times. If all retries fail, the test is skipped.

Once QuestaSim passes, the same test runs on Verilator. If Verilator fails, a Gap Analysis Agent classifies the failure (syntax error, unsupported feature, behavioral bug, etc.) and saves a structured report.
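
The Phase 1 control flow, sketched in Python with the agents and simulator runners stubbed as hypothetical callables (run_questa, run_verilator, fix_code, analyze_gap are illustrative names, not the tool’s real interface):

```python
MAX_RETRIES = 3  # Code Fix Agent gets up to 3 attempts

def phase1(test_code, run_questa, run_verilator, fix_code, analyze_gap):
    """One test through Phase 1: debug on the golden reference, then check Verilator."""
    # Each runner returns (passed, log).
    ok, log = run_questa(test_code)
    for _ in range(MAX_RETRIES):
        if ok:
            break
        test_code = fix_code(test_code, log)   # Code Fix Agent patches from the error log
        ok, log = run_questa(test_code)
    if not ok:
        return {"status": "SKIPPED"}           # all retries exhausted

    ok, log = run_verilator(test_code)
    if ok:
        return {"status": "SUPPORTED"}
    # Gap Analysis Agent classifies the failure (syntax, unsupported, behavioral, ...)
    return {"status": "GAP", "report": analyze_gap(log)}
```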

Phase 2: Gap Review & Deduplication

Fig.3 Phase 2: gap review & deduplication

Phase 1 is mechanical: if QuestaSim passes and Verilator fails, it’s flagged as a gap. But not every Verilator failure is a feature gap — it could be a misconfigured Makefile, an intentional design tradeoff documented by Verilator, or a warning promoted to an error. Phase 2 is where the AI actually thinks about whether this failure matters.

The Gap Review Agent searches the Verilator GitHub for existing issues and PRs, checks documentation, and makes a call:

  • APPROVED — genuine, unreported gap → goes to Phase 3
  • DUPLICATE — already reported or being worked on → skipped
  • REJECTED — not a real gap (test issue, intentional behavior) → discarded
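
A minimal sketch of the review verdict, with the duplicate search done through the GitHub CLI (gh). The query keywords, repo default, and the is_real_gap judgment stub are assumptions for illustration; the agent’s actual reasoning is far richer:

```python
import json
import subprocess

def find_existing_issues(keywords: str, repo: str = "verilator/verilator"):
    """Search GitHub issues mentioning the failing feature via the gh CLI."""
    out = subprocess.run(
        ["gh", "search", "issues", keywords, "--repo", repo,
         "--json", "number,title,url"],
        capture_output=True, text=True, check=True,
    ).stdout
    return json.loads(out)

def review(gap, find_issues, is_real_gap):
    """Phase 2 verdict: REJECTED / DUPLICATE / APPROVED."""
    if not is_real_gap(gap):          # test bug or documented intentional behavior
        return "REJECTED"
    if find_issues(gap["keywords"]):  # already reported upstream
        return "DUPLICATE"
    return "APPROVED"                 # genuine, unreported gap -> Phase 3
```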

Phase 3: Issue Creation & Publishing

Fig.4 Phase 3: issue creation & publishing

The Issue Creator Agent first merges gaps that share the same root cause — ten constraint tests failing because of missing solve...before support become one issue, not ten. Each issue includes runnable test code, error logs, and IEEE standard references. Before publishing, a dry-run preview is generated for human review. If something needs adjustment, the reviewer can request revisions and the agent regenerates. Only after manual approval is the issue actually published to GitHub.
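
The merge step boils down to grouping gap reports by a root-cause key. A dependency-free sketch (field names like root_cause are illustrative):

```python
from collections import defaultdict

def merge_by_root_cause(gaps):
    """Collapse gap reports so one GitHub issue covers one root cause."""
    buckets = defaultdict(list)
    for gap in gaps:
        buckets[gap["root_cause"]].append(gap["name"])
    return [{"root_cause": rc, "tests": names} for rc, names in buckets.items()]

gaps = [
    {"name": "constraint_solve_before_1", "root_cause": "solve-before-unsupported"},
    {"name": "constraint_solve_before_2", "root_cause": "solve-before-unsupported"},
    {"name": "soft_constraint_override",  "root_cause": "soft-constraint-ignored"},
]
# Three failing tests collapse into two candidate issues.
```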

Between Phase 2 (filtering out known community issues) and Phase 3 (merging related failures), the two-level deduplication keeps the final output clean.

Input: Just YAML

Users don’t need to write test code — they just describe the features to test in YAML. An example input file:

domain: "Randomization"
ieee_chapter: 18

features:
  - name: "rand_basic"
    description: "basic rand variable declaration and randomization"
    ieee_section: "18.4"

  - name: "constraint_inside"
    description: "inside operator in constraints (x inside {[1:10]})"
    ieee_section: "18.5.3"

  - name: "soft_constraint"
    description: "soft constraints with weighted priority"
    ieee_section: "18.5.13"
    note: "soft constraints should be overridable by hard constraints"

Every test traces back to a specific IEEE section, and files are organized by chapter.
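
After parsing, the pipeline sees each file as a plain feature list. A small, dependency-free sketch of the structure and a validation pass (shown as a Python dict to avoid a YAML library; the validator is illustrative, not the tool’s real schema check):

```python
REQUIRED = ("name", "description", "ieee_section")

def validate_feature_list(spec: dict) -> list[str]:
    """Return the names of well-formed features; reject malformed entries."""
    names = []
    for feat in spec["features"]:
        missing = [k for k in REQUIRED if k not in feat]
        if missing:
            raise ValueError(f"{feat.get('name', '?')}: missing {missing}")
        names.append(feat["name"])
    return names

spec = {
    "domain": "Randomization",
    "ieee_chapter": 18,
    "features": [
        {"name": "rand_basic",
         "description": "basic rand variable declaration and randomization",
         "ieee_section": "18.4"},
        {"name": "constraint_inside",
         "description": "inside operator in constraints",
         "ieee_section": "18.5.3"},
    ],
}
```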

In practice, we even had AI generate the YAML itself — it reads the relevant IEEE chapter and produces the feature list. So the entire pipeline from “reading the standard” to “filing issues” is AI-driven.

Results

We ran Gap Checker on IEEE 1800-2017 Chapter 18 (Randomization): 269 test cases covering sections 18.3–18.17.

What the traditional workflow looks like: read the standard, hand-write 2–3 tests per feature, debug them in QuestaSim, re-run in Verilator, analyze failures, write up reports, file issues manually. Half an hour to several hours per feature.

What the same job looks like now:

Input:
  "Test all Randomization features in IEEE Chapter 18.
   Generate YAML by section, cover multiple cases per feature,
   then start the Gap Checker pipeline."

Output:
269 tests generated and executed
85 gaps detected → deduplicated to 34 independent issues
Issues published to GitHub with full test reports

To be clear: we didn’t just take the AI’s word for it. Every one of the 34 issues went through manual review before publishing. At this stage of development, human oversight is essential — we verified that each reported gap was a genuine Verilator limitation, not a test bug or misconfiguration. The automation handles the bulk of the work; the human makes the final call.

Breakdown:

Status     | Count | %
-----------|-------|-----
Supported  | 178   | 66%
Gap        | 85    | 32%
Test Issue | 6     | 2%

Of the 85 gaps: 41% were unsupported features, 33% behavioral bugs (the code compiles but produces wrong output), 9% partial support, and 17% unclear.

The 60% dedup rate is notable — many seemingly different failures trace back to the same root cause.

Cost: roughly $20–$30 in AI usage for this entire Chapter 18 test campaign (Claude subscription, not API — so this is a rough estimate). The equivalent manual effort at $30/hr would be around $4,000. That’s about 200× cheaper. The tradeoff is that AI-generated tests aren’t always as precise as hand-crafted ones, but as a first-pass screening tool across hundreds of features, the ROI is hard to argue with. And in some cases, AI actually catches corner cases that a human would skip over.

Limitations and Future Work

What we’d improve:

  • Model allocation: Right now everything runs on Claude Opus 4.5. Simpler tasks (e.g., YAML generation, basic classification) could use lighter models like Sonnet or Haiku to cut costs further. We also want to explore whether the free tier or open-weight models (e.g., via http://Kilo.ai or similar platforms) can handle parts of the pipeline — this would lower the barrier to entry significantly.
  • Coverage: We’ve only done Chapter 18 so far. IEEE 1800 has a lot more ground to cover — Classes (Ch8), Interfaces (Ch25), Assertions (Ch16), and beyond.

What’s next:

  • Expand to more chapters
  • Automated regression on new Verilator releases to track which gaps get fixed over time
  • Test with open-source / open-weight LLMs to reduce dependence on commercial APIs

Beyond Verilator: The approach here isn’t specific to Verilator. Any tool that has a golden reference and a specification can be tested the same way — compare behavior against the reference, flag deviations, and generate reports. This could apply to other EDA tools, compilers, or any standards-compliant software.

Conclusion

Verilator Gap Checker is a straightforward idea — AI-generated tests, dual-simulator comparison, automated issue filing — but it works. 269 tests, 34 issues, a fraction of the manual cost.

It’s not a replacement for careful human verification. It’s a screening pass that gets the bulk of the work done so engineers can focus on the hard problems.

This project is developed by PlanV GmbH as part of DI-OSVISE. We’ve been contributing to Verilator for a while — bug fixes, feature work, UVM compatibility improvements — and this tool is part of that effort. We’re planning to open-source it so other teams can use it too.