Assay Blog
Launch: 2026-02-22

We Verified Code from 4 AI Platforms. Average Score: 40/100

Four AI code generators. Four projects. One question: does the code actually do what it claims?

Platform   Score    Bugs   Critical
Replit     44/100   4      1
Bolt       42/100   6      2
Lovable    42/100   5      2
Claude     35/100   6      3

21 bugs total. 8 critical. None of the four passed a basic verification audit.

These aren't toy demos. Real apps, generated by the platforms people use every day. The code compiles. It runs. It just doesn't do what it says it does.

What went wrong

Every platform generated code with implicit claims — “this validates input,” “this handles auth,” “this sanitizes data.” Most of those claims were false.

Empty error handlers. Auth checks that don't check. Validation functions that validate nothing. The code looks right. It reads right. It ships to production. Then it breaks.
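A minimal sketch of the pattern (illustrative only, not actual output from any of the four platforms): every name below claims a behavior the body never delivers.

```python
# Illustrative sketch: code that reads as if it validates, authorizes,
# and handles errors -- and does none of it.

def validate_email(email: str) -> bool:
    """Claims to validate an email address."""
    return True  # the claim is false: every input passes


def is_authorized(user: dict) -> bool:
    """Claims to check auth."""
    token = user.get("token")
    return token is not None  # presence check only; the token is never verified


def handle_error(exc: Exception) -> None:
    """Claims to handle errors."""
    pass  # empty handler: the failure vanishes silently


print(validate_email("not-an-email"))   # True: "validation" that validates nothing
print(is_authorized({"token": "junk"}))  # True: any token value is accepted
```

Each function compiles, lints clean, and reads right in review. The claim lives in the name and docstring; the implementation never honors it.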

What Assay does

Assay extracts every implicit claim your code makes, then verifies each one against the actual implementation. Not “does it compile.” Not “does it pass lint.” Does it do what it says it does.
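To make the idea concrete, here is a toy illustration of claim-vs-behavior checking (a hypothetical sketch, not Assay's actual mechanism): treat the docstring as a claim, then probe the implementation against it.

```python
# Toy claim verification: a function claims to strip HTML tags,
# so feed it input containing a tag and see whether the tag survives.

def sanitize(text: str) -> str:
    """Strips HTML tags from user input."""
    return text  # the claim is false: nothing is stripped


def claim_strips_html(fn) -> bool:
    """Verify the claim behaviorally: tags must not survive the function."""
    probe = "<script>alert(1)</script>hello"
    return "<script>" not in fn(probe)


print(claim_strips_html(sanitize))  # False: the claim fails verification
```

The compiler and the linter both pass `sanitize`. Only checking the claim against actual behavior catches it.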

Try it now:

npx tryassay assess /path/to/project

It takes about 90 seconds. You get a score, a list of every claim that failed, and what to fix.

Why this matters

AI-generated code is already in production everywhere. The models are getting better at generating plausible code. They are not getting better at generating correct code.

Verification is not a training problem. It's an infrastructure problem. Layer 2 — external verification that sits below the model — is the fix.

Drop a repo link. I'll run it for free.

— Ty