Sprint Log.
What shipped, when, and what it proved. Most recent first.
ChipCraftBrain Paper Published on arXiv
Our research paper — ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration — is now published on arXiv. It formalizes the architecture behind our benchmark results: a hybrid symbolic-neural system that orchestrates six specialized agents over a 168-dimensional state space, combining pattern retrieval, hierarchical decomposition, and adaptive reinforcement learning to solve RTL generation at production accuracy.
- Adaptive multi-agent orchestration over a 168-dimensional state space
- Hybrid architecture: algorithmic solvers for logic, neural for timing and RTL
- 321 pattern templates + 971 open-source implementations for targeted retrieval
- Hierarchical decomposition with synchronized module interfaces
- RISC-V case study: full validated hardware generation end-to-end
- Zero fine-tuning — pure architectural advantage over trained competitors
- 98.72% on VerilogEval-Human (154/156 problems)
- 94.7% on NVIDIA CVDP (286/302 problems)
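The core loop the paper formalizes (decompose the spec into sub-modules, retrieve matching patterns, assemble under synchronized interfaces, accept only validated output) can be sketched roughly as below. Everything here is an illustrative stand-in: the `DesignState` class, the keyword-based retrieval, and the toy decomposition are assumptions for exposition, not ChipCraftBrain's actual agents or state encoding.

```python
from dataclasses import dataclass, field

@dataclass
class DesignState:
    """Toy stand-in for the orchestrator's working state."""
    spec: str
    features: list = field(default_factory=lambda: [0.0] * 168)  # 168-dim state vector
    rtl: str = ""
    validated: bool = False

def retrieve_patterns(spec, templates):
    """Toy retrieval: keep templates whose keyword appears in the spec."""
    return [t for t in templates if t["keyword"] in spec]

def decompose(spec):
    """Toy hierarchical decomposition: one sub-module per 'and'-joined clause."""
    return [s.strip() for s in spec.split(" and ")]

def orchestrate(spec, templates):
    state = DesignState(spec=spec)
    modules = decompose(spec)
    patterns = retrieve_patterns(spec, templates)
    # Each sub-module is filled from the best-matching pattern, then the
    # pieces are merged; the design counts as validated only if every
    # sub-module resolved to a concrete implementation.
    bodies = []
    for module in modules:
        match = next((p for p in patterns if p["keyword"] in module), None)
        bodies.append(match["rtl"] if match else f"// unresolved: {module}")
    state.rtl = "\n".join(bodies)
    state.validated = all(not b.startswith("// unresolved") for b in bodies)
    return state

templates = [
    {"keyword": "counter", "rtl": "module counter(...); endmodule"},
    {"keyword": "fifo", "rtl": "module fifo(...); endmodule"},
]
result = orchestrate("a counter and a fifo", templates)
```

In the real system the retrieval step spans 321 templates plus 971 open-source implementations, and validation means testbench simulation rather than a string check; the sketch only shows the decompose-retrieve-validate shape of the loop.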
Local Testbench Model: 100% semantic accuracy, zero API cost
We trained a dedicated testbench LLM that replaces all cloud API calls for testbench generation. When it produces a testbench that compiles, that testbench passes simulation 100% of the time — zero semantic failures. Compilation failures account for the remaining 7.3% of generations and are recoverable through iteration. This drops per-run testbench cost to zero and unblocks unlimited-budget preference training for the RTL model. The model was fine-tuned on a curated dataset of ~150K verified entries. Verify-before-train discipline is why the model achieves zero semantic failures on compiled outputs: it learned exclusively from correct signals and can generalize to novel specs.
- 100% simulation pass rate on compiled outputs
- 92.7% of generated testbenches compile; the rest fail compilation only and are recoverable through iteration
- Replaces all cloud API calls for testbench generation — per-run testbench cost drops to $0
- Runs entirely on local GPU infrastructure
- Enables unlimited-budget preference training for RTL model improvement
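The verify-before-train curation described above amounts to a strict filter: an entry enters the training set only if its testbench both compiles and passes simulation. A minimal sketch, where `compile_fn` and `simulate_fn` are hypothetical stand-ins for real tool invocations (e.g. an iverilog compile and a simulation run), not our pipeline's API:

```python
def curate(entries, compile_fn, simulate_fn):
    """Keep only entries verified end-to-end; count the two failure modes."""
    verified, compile_fails, semantic_fails = [], 0, 0
    for entry in entries:
        if not compile_fn(entry):
            compile_fails += 1   # recoverable later through iteration
            continue
        if not simulate_fn(entry):
            semantic_fails += 1  # never allowed into the training set
            continue
        verified.append(entry)
    return verified, compile_fails, semantic_fails

# Toy data: each entry is (compiles, passes_simulation).
entries = [(True, True), (True, True), (False, False), (True, False)]
verified, compile_fails, semantic_fails = curate(
    entries,
    compile_fn=lambda e: e[0],
    simulate_fn=lambda e: e[1],
)
```

The design point is that a compile failure and a semantic failure are treated differently: the former is retried, the latter is discarded, so the surviving ~150K entries carry only correct signal.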
“Verification is the moat. Every piece we bring in-house makes the loop tighter.”
CVDP: 54.24-percentage-point pass rate improvement over the SOTA baseline, at only 1.28 iterations on average
The previous SOTA relied on fine-tuned models, massive compute budgets, and hundreds of retries. ChipCraftBrain solves the same problems in 1.28 iterations on average — lifting the pass rate by 54.24 percentage points through a fundamentally different approach.
- 94.7% on NVIDIA CVDP (302 problems) — the hardest public RTL benchmark
- Leads or ties ACE-RTL in 3 of 4 categories: Code Completion +12.75pp, Code Modification +5.49pp, Spec-to-RTL tied
- 5 iterations vs 150 attempts — 30× more efficient
- GPT-5 peaked at 60%. ACE-RTL's fine-tuned generator: 67%. ChipCraftBrain: 94.7%.
- 94.7% overall — #1 published result on CVDP
- 96.4% Code Modification, 96.2% Spec-to-RTL, 93.6% Code Completion
- 97.5% RTL Optimization — a category ACE-RTL didn't even attempt
- Zero fine-tuning, zero custom training data — pure architectural advantage
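The generate-validate-iterate loop behind these numbers can be sketched as follows. The generator and validator below are toy stand-ins (in the real system, validation means running a testbench simulation); the 5-iteration budget matches the bullet above, and the toy inputs are chosen only to show how an average near 1 iteration arises when most problems pass on the first try.

```python
def solve(problem, generate, validate, max_iters=5):
    """Retry generation under a fixed budget; return (passed, iterations_used)."""
    for i in range(1, max_iters + 1):
        candidate = generate(problem)
        if validate(problem, candidate):  # testbench simulation in practice
            return True, i
    return False, max_iters

# Toy harness: a stateful generator whose n-th attempt on a problem returns n,
# and a validator under which every 4th problem needs a second attempt.
attempts = {}
def generate(p):
    attempts[p] = attempts.get(p, 0) + 1
    return attempts[p]

def validate(p, candidate):
    return candidate >= (2 if p % 4 == 0 else 1)

results = [solve(p, generate, validate) for p in range(100)]
passed = sum(ok for ok, _ in results)
avg_iters = sum(n for _, n in results) / len(results)  # 1.25 on this toy set
```

With 75 of 100 toy problems passing first-try and 25 needing one retry, the average lands at 1.25 iterations — the same shape as the reported 1.28, where the validator does the heavy lifting and retries stay rare.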
“We invite all other competitors to publish their scores on CVDP and similar benchmarks for a fair evaluation of these systems.”
VerilogEval 98.7%, #1 on the Benchmark
ChipCraftBrain achieves 98.72% pass rate on the full 156-problem VerilogEval functional benchmark, surpassing every published and commercial system, including MAGE (95.9%), ChipAgents (97.4%, closed-source), VFlow (83.6%), and CodeV (59.2%).
- 154 of 156 problems passed via real testbench simulation
- Beats MAGE (95.9%), the best published academic result
- Beats ChipAgents (97.4%), the only commercial competitor, whose methodology is unpublished
- Average 1.14 iterations per problem, near first-try accuracy
- Full benchmark completed in 35.5 minutes
- 98.72% pass rate on VerilogEval (156 problems)
- 1.14 average iterations to solution
- 154/156 simulation pass rate (98.72%)
- 1.32 percentage points above ChipAgents (97.4%)
- 2.82 percentage points above MAGE (95.9%)