MILESTONES

Sprint Log.

What shipped, when, and what it proved. Most recent first.

April 2026

ChipCraftBrain Paper Published on arXiv

Our research paper, ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration, is now published on arXiv. It formalizes the architecture behind our benchmark results: a hybrid symbolic-neural system that orchestrates six specialized agents over a 168-dimensional state space. The system combines pattern retrieval, hierarchical decomposition, and adaptive reinforcement learning to generate RTL at production accuracy.

HIGHLIGHTS

›Adaptive multi-agent orchestration over a 168-dimensional state space
›Hybrid architecture: algorithmic solvers for logic, neural models for timing and RTL
›321 pattern templates + 971 open-source implementations for targeted retrieval
›Hierarchical decomposition with synchronized module interfaces
›RISC-V case study: full validated hardware generation end-to-end
›Zero fine-tuning: a purely architectural advantage over competitors that rely on trained models

METRICS

›98.72% on VerilogEval-Human (154/156 problems)
›94.7% on NVIDIA CVDP (286/302 problems)
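The "validation-first" idea in the paper's title can be illustrated with a generic generate, simulate, retry loop. The loop stops at the first passing simulation and counts iterations, which is the unit behind the averages reported in this log. This is a hypothetical sketch, not ChipCraftBrain's orchestrator. The `generate` and `simulate` callables, the feedback-passing scheme, and the 5-iteration budget are all assumptions, and the real system coordinates six agents rather than a single generator.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions, not ChipCraftBrain's API):
#   generate(spec, feedback) -> RTL source text
#   simulate(rtl) -> (passed, simulation log)
Generator = Callable[[str, Optional[str]], str]
Simulator = Callable[[str], Tuple[bool, str]]


def validation_first_generate(
    spec: str,
    generate: Generator,
    simulate: Simulator,
    max_iterations: int = 5,  # assumed budget, for illustration only
) -> Tuple[Optional[str], int]:
    """Return (rtl, iterations_used); rtl is None if no candidate passed."""
    feedback: Optional[str] = None
    for iteration in range(1, max_iterations + 1):
        rtl = generate(spec, feedback)
        passed, log = simulate(rtl)
        if passed:
            return rtl, iteration  # accept only simulation-verified RTL
        feedback = log  # feed the failure log into the next attempt
    return None, max_iterations


def average_iterations(iterations_per_problem: List[int]) -> float:
    """Mean number of iterations per problem."""
    return sum(iterations_per_problem) / len(iterations_per_problem)


# Toy stand-ins: the first draft fails simulation, the revised one passes.
def fake_generate(spec: str, feedback: Optional[str]) -> str:
    return "module fixed;" if feedback else "module draft;"


def fake_simulate(rtl: str) -> Tuple[bool, str]:
    return rtl == "module fixed;", "mismatch at cycle 3"


rtl, used = validation_first_generate("2-input AND gate", fake_generate, fake_simulate)
print(rtl, used)  # module fixed; 2
```

The key design point is that nothing is returned unless `simulate` reports a pass. A failed attempt is never accepted; it only contributes its log as feedback for the next attempt.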

→ Read the Report

March 2026

Local Testbench Model: 100% semantic accuracy, zero API cost

We trained a dedicated testbench LLM that replaces all cloud API calls for testbench generation. Whenever it produces a testbench that compiles, that testbench passes simulation 100% of the time, with zero semantic failures. Compilation-only failures account for the remaining 7.3% of outputs and are recoverable through iteration. This drops the per-run testbench API cost to zero and unblocks unlimited-budget preference training for the RTL model. The model was fine-tuned on a curated dataset of ~150K verified entries. This verify-before-train discipline is why it achieves zero semantic failures on compiled outputs: it learned only from verified-correct examples, and it generalizes to novel specs.
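The verify-before-train discipline amounts to a hard filter: a candidate enters the training set only if its testbench compiles and passes simulation. The following is a minimal sketch under stated assumptions. `compiles` and `passes_simulation` are hypothetical checks that would wrap a real compiler and simulator in practice, and the `Entry` fields are illustrative rather than the actual dataset schema.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Entry:
    spec: str       # natural-language or structured design spec (illustrative)
    testbench: str  # candidate testbench source (illustrative)


def verify_before_train(
    entries: Iterable[Entry],
    compiles: Callable[[Entry], bool],           # hypothetical compile check
    passes_simulation: Callable[[Entry], bool],  # hypothetical simulation check
) -> List[Entry]:
    """Keep only entries that compile AND pass simulation."""
    kept: List[Entry] = []
    for entry in entries:
        if not compiles(entry):
            continue  # compile failure: never reaches training
        if not passes_simulation(entry):
            continue  # semantic failure: never reaches training
        kept.append(entry)
    return kept


# Toy example with string-matching stand-ins for the real tools.
raw = [
    Entry("counter", "tb ok"),
    Entry("fifo", "tb syntax error"),
    Entry("mux", "tb wrong expected value"),
]
clean = verify_before_train(
    raw,
    compiles=lambda e: "syntax error" not in e.testbench,
    passes_simulation=lambda e: "wrong" not in e.testbench,
)
print([e.spec for e in clean])  # ['counter']
```

Ordering the checks this way skips the expensive simulation step for entries that do not compile.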

HIGHLIGHTS

›100% simulation pass on compiled outputs
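›92.7% generalization rate: the share of generated testbenches that compile and pass, with compilation-only failures making up the remaining 7.3%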
›Replaces all cloud API calls for testbench generation: per-run testbench API cost drops to $0
›Runs entirely on local GPU infrastructure
›Enables unlimited-budget preference training for RTL model improvement
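Every testbench that compiles also passes simulation, so the overall success rate follows directly from the compile rate: P(pass) = P(compile) × P(pass | compile). The following short worked check uses the 7.3% compilation-failure figure quoted above. The function name is illustrative.

```python
def end_to_end_rate(compile_failure_rate: float, pass_given_compile: float) -> float:
    """P(pass) = P(compile) * P(pass | compile)."""
    compile_rate = 1.0 - compile_failure_rate
    return compile_rate * pass_given_compile


# 7.3% compilation-only failures, 100% simulation pass once compiled.
rate = end_to_end_rate(compile_failure_rate=0.073, pass_given_compile=1.0)
print(f"{rate:.1%}")  # 92.7%
```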

“Verification is the moat. Every piece we bring in-house makes the loop tighter.”

March 2026

CVDP: 54.24% pass rate improvement over SOTA baseline while requiring only 1.28 iterations on average

The previous SOTA relied on fine-tuned models, massive compute budgets, and 150 attempts. ChipCraftBrain solves the same problems in 1.28 iterations on average, lifting the pass rate by 54.24 percentage points through a fundamentally different approach.

HIGHLIGHTS

›94.7% on NVIDIA CVDP (302 problems) — the hardest public RTL benchmark
›Wins 2 and ties 1 of the 3 categories shared with ACE-RTL: Code Completion +12.75pp, Code Modification +5.49pp, Spec-to-RTL tied
›5 iterations vs 150 attempts: 30× fewer tries
›GPT-5 peaked at 60%. The previous SOTA's fine-tuned generator: 67%. ChipCraftBrain: 94.7%.

METRICS

›94.7% overall — #1 published result on CVDP
›96.4% Code Modification, 96.2% Spec-to-RTL, 93.6% Code Completion
›97.5% RTL Optimization — a category ACE-RTL didn't even attempt
›Zero fine-tuning, zero custom training data — pure architectural advantage
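This entry mixes absolute comparisons (percentage points, pp) with relative ones (%), and the two are easy to conflate. The small illustrative helpers below show the difference using only figures quoted in this entry: 286/302 overall, the GPT-5 peak of 60%, and 5 iterations vs 150 attempts.

```python
def percentage_point_gain(new_pct: float, old_pct: float) -> float:
    """Absolute difference between two pass rates, in percentage points."""
    return new_pct - old_pct


def relative_gain_pct(new_pct: float, old_pct: float) -> float:
    """Improvement relative to the old pass rate, in percent."""
    return (new_pct - old_pct) / old_pct * 100.0


overall = 286 / 302 * 100               # CVDP overall pass rate, in %
pp = percentage_point_gain(94.7, 60.0)  # vs the GPT-5 peak, absolute
rel = relative_gain_pct(94.7, 60.0)     # vs the GPT-5 peak, relative
attempt_ratio = 150 / 5                 # attempts vs iterations

print(f"{overall:.1f}% overall, +{pp:.1f} pp, +{rel:.1f}% relative, {attempt_ratio:.0f}x fewer tries")
# 94.7% overall, +34.7 pp, +57.8% relative, 30x fewer tries
```

The same gap of 34.7 pp corresponds to a relative gain of about 57.8%. Any headline improvement figure should therefore state which of the two it uses.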

→ Read the Report

“We invite all other competitors to publish their scores on CVDP and similar benchmarks for fair evaluation of the systems.”

February 2026

VerilogEval 98.7%, #1 on the Benchmark

ChipCraftBrain achieves 98.72% pass rate on the full 156-problem VerilogEval functional benchmark, surpassing every published and commercial system, including MAGE (95.9%), ChipAgents (97.4%, closed-source), VFlow (83.6%), and CodeV (59.2%).

HIGHLIGHTS

›154 of 156 problems passed via real testbench simulation
›Beats MAGE (95.9%), the best published academic result
›Beats ChipAgents (97.4%), the only commercial competitor, whose methodology is unpublished
›Average 1.14 iterations per problem, near first-try accuracy
›Full benchmark completed in 35.5 minutes

METRICS

›98.72% pass rate on VerilogEval (154/156 problems, verified by simulation)
›1.14 average iterations to solution
›1.32 percentage points above ChipAgents (97.4%)
›2.82 percentage points above MAGE (95.9%)
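The pass rate and the margins in this entry can be recomputed directly from the raw counts and the competitor scores listed above. The check below is illustrative, and the variable names are arbitrary.

```python
passed, total = 154, 156
pass_rate = round(passed / total * 100, 2)  # 98.72

# Competitor scores as quoted in this entry.
competitors = {"ChipAgents": 97.4, "MAGE": 95.9, "VFlow": 83.6, "CodeV": 59.2}

# Margin over each competitor, in percentage points.
margins = {name: round(pass_rate - score, 2) for name, score in competitors.items()}

for name, margin in margins.items():
    print(f"{name}: +{margin} pp")
```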