MILESTONES

Sprint Log.

What shipped, when, and what it proved. Most recent first.

April 2026

ChipCraftBrain Paper Published on arXiv

Our research paper — ChipCraftBrain: Validation-First RTL Generation via Multi-Agent Orchestration — is now published on arXiv. It formalizes the architecture behind our benchmark results: a hybrid symbolic-neural system that orchestrates six specialized agents over a 168-dimensional state space, combining pattern retrieval, hierarchical decomposition, and adaptive reinforcement learning to solve RTL generation at production accuracy.

HIGHLIGHTS
  • Adaptive multi-agent orchestration over a 168-dimensional state space
  • Hybrid architecture: algorithmic solvers for logic, neural for timing and RTL
  • 321 pattern templates + 971 open-source implementations for targeted retrieval
  • Hierarchical decomposition with synchronized module interfaces
  • RISC-V case study: full validated hardware generation end-to-end
  • Zero fine-tuning — pure architectural advantage over trained competitors
METRICS
  • 98.72% on VerilogEval-Human (154/156 problems)
  • 94.7% on NVIDIA CVDP (286/302 problems)
→ Read the Report
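The validation-first loop the paper formalizes can be sketched in a few lines: generate a candidate, simulate it against a real testbench, and retry only on failure, feeding the failure log back to the generator. Everything below is an illustrative stand-in, not ChipCraftBrain's actual API — the generator and simulator are toy functions, and the 5-iteration budget mirrors the cap quoted in the CVDP results.

```python
MAX_ITERATIONS = 5  # illustrative budget, matching the cap quoted below

def validation_first_loop(spec, testbench, generate_rtl, simulate):
    """Return (rtl, iterations_used), or (None, MAX_ITERATIONS) on failure."""
    feedback = None
    for attempt in range(1, MAX_ITERATIONS + 1):
        rtl = generate_rtl(spec, feedback)      # candidate RTL for the spec
        passed, log = simulate(rtl, testbench)  # real testbench simulation
        if passed:
            return rtl, attempt                 # iterations-to-solution metric
        feedback = log                          # failure log guides the retry
    return None, MAX_ITERATIONS

# Toy stand-ins: this "generator" succeeds once it has seen one failure log.
def toy_generate(spec, feedback):
    return "fixed_module" if feedback else "buggy_module"

def toy_simulate(rtl, testbench):
    return (rtl == "fixed_module", "assertion failed at t=10ns")

rtl, iters = validation_first_loop("adder spec", "adder_tb", toy_generate, toy_simulate)
print(rtl, iters)  # fixed_module 2
```

The per-problem iteration averages reported below (1.28 on CVDP, 1.14 on VerilogEval) are exactly this loop's `iterations_used`, averaged over the benchmark.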
March 2026

Local Testbench Model: 100% semantic accuracy, zero API cost

We trained a dedicated testbench LLM that replaces all cloud API calls for testbench generation. When it produces a testbench that compiles, it passes simulation 100% of the time — zero semantic failures. Compilation-only failures account for the remaining 7.3% and are recoverable through iteration. This drops per-run testbench cost to zero and unblocks unlimited-budget preference training for the RTL model. The model was fine-tuned on a curated dataset of ~150K verified entries, and that verify-before-train discipline is why it achieves zero semantic failures on compiled outputs: it learned exclusively from verified-correct examples, and that clean signal generalizes to novel specs.

HIGHLIGHTS
  • 100% simulation pass on compiled outputs
  • 92.7% first-pass compile rate; the remaining failures are recoverable through iteration
  • Replaces all cloud API calls for testbench generation — cost per run for testbenches drops to $0
  • Runs entirely on local GPU infrastructure
  • Enables unlimited-budget preference training for RTL model improvement

Verification is the moat. Every piece we bring in-house makes the loop tighter.
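The verify-before-train discipline reduces to a simple filter: a candidate pair enters the training set only if its testbench both compiles and passes simulation. A minimal sketch, assuming hypothetical `compiles` and `passes_sim` hooks in place of real EDA tooling:

```python
def verify_before_train(candidates, compiles, passes_sim):
    """Keep only (spec, testbench) pairs that survive both checks."""
    verified = []
    for spec, tb in candidates:
        if not compiles(tb):          # compile failure: drop, recoverable later
            continue
        if not passes_sim(spec, tb):  # semantic failure: never enters training
            continue
        verified.append((spec, tb))
    return verified

# Toy check functions standing in for a real compiler and simulator.
candidates = [("s1", "tb_ok"), ("s2", "tb_nocompile"), ("s3", "tb_wrong")]
compiles = lambda tb: tb != "tb_nocompile"
passes_sim = lambda spec, tb: tb == "tb_ok"
print(verify_before_train(candidates, compiles, passes_sim))  # [('s1', 'tb_ok')]
```

Because every surviving pair is simulation-verified, a model trained on this set never sees a semantically wrong testbench — which is the mechanism behind the 100% simulation-pass figure on compiled outputs.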

March 2026

CVDP: 54.24% pass rate improvement over SOTA baseline while requiring only 1.28 iterations on average

The previous SOTA relied on fine-tuned models, massive compute budgets, and hundreds of retries. ChipCraftBrain solves the same problems in 1.28 iterations on average, lifting the pass rate by 54.24% relative to the baseline through a fundamentally different approach.

HIGHLIGHTS
  • 94.7% on NVIDIA CVDP (302 problems) — the hardest public RTL benchmark
  • Beats ACE-RTL in 2 of 3 shared categories: Code Completion +12.75pp, Modification +5.49pp, Spec-to-RTL tied
  • A 5-iteration budget vs 150 attempts — 30× more efficient
  • GPT-5 peaked at 60%. Their fine-tuned generator: 67%. ChipCraftBrain: 94.7%.
METRICS
  • 94.7% overall — #1 published result on CVDP
  • 96.4% Code Modification, 96.2% Spec-to-RTL, 93.6% Code Completion
  • 97.5% RTL Optimization — a category ACE-RTL didn't even attempt
  • Zero fine-tuning, zero custom training data — pure architectural advantage
→ Read the Report

We invite other competitors to publish their scores on CVDP and similar benchmarks so the systems can be compared fairly.
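The headline CVDP arithmetic can be checked directly from the counts quoted above:

```python
# Pass rate from the raw CVDP counts reported above.
pass_rate = 100 * 286 / 302
print(round(pass_rate, 1))  # 94.7

# The iteration-efficiency figure: a 5-iteration budget vs 150 attempts.
print(150 // 5)  # 30
```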

February 2026

VerilogEval 98.7%, #1 on the Benchmark

ChipCraftBrain achieves 98.72% pass rate on the full 156-problem VerilogEval functional benchmark, surpassing every published and commercial system, including MAGE (95.9%), ChipAgents (97.4%, closed-source), VFlow (83.6%), and CodeV (59.2%).

HIGHLIGHTS
  • 154 of 156 problems passed via real testbench simulation
  • Beats MAGE (95.9%), the best published academic result
  • Beats ChipAgents (97.4%), the only commercial competitor, with unpublished methodology
  • Average 1.14 iterations per problem, near first-try accuracy
  • Full benchmark completed in 35.5 minutes
METRICS
  • 98.72% pass rate on VerilogEval (156 problems)
  • 1.14 average iterations to solution
  • 154/156 simulation pass rate (98.72%)
  • 1.32 percentage points above ChipAgents (97.4%)
  • 2.82 percentage points above MAGE (95.9%)
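The VerilogEval figures above follow directly from the problem counts and the competitor scores quoted in the text:

```python
# Pass rate from the raw VerilogEval counts.
pass_rate = 100 * 154 / 156
print(round(pass_rate, 2))  # 98.72

# Margins over the nearest published systems, in percentage points.
for name, score in [("ChipAgents", 97.4), ("MAGE", 95.9)]:
    print(name, round(pass_rate - score, 2))  # ChipAgents 1.32, MAGE 2.82
```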