Verified results across benchmarks, pilots, and live deployments.
464 tasks verified
100% pass@5 on HumanEval
354 claims in LVR pilot
27 bugs auto-fixed
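For readers unfamiliar with the pass@5 metric, here is a minimal sketch of how a pass@k score is commonly computed, using the standard unbiased estimator 1 - C(n-c, k) / C(n, k) for a problem with n samples of which c pass. The page does not say how its figure was computed, so the function names and the sample counts below are illustrative assumptions, not the evaluation harness behind the reported number.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: samples generated, c: samples that passed the tests, k: budget.
    """
    if not (0 <= c <= n and 1 <= k <= n):
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # With fewer than k failing samples, every k-subset contains a pass.
    if n - c < k:
        return 1.0
    # Probability that a random k-subset contains at least one pass.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; results holds (n, c) pairs."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


# Hypothetical counts: three problems, five samples each.
example = [(5, 1), (5, 3), (5, 5)]
score = mean_pass_at_k(example, k=5)
```

Under this definition, a 100% score with five samples per problem means every problem had at least one passing sample among its five.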
RLVF: more training data hurts. Verification must remain external.
LVR pilot: 126 files, 354 claims, 27 bugs fixed. Zero scaffolding.