A few hours after the marketing site went live, we pointed a standard-tier NSPEC run at nspec.dev. This is the part of the post where a founder is supposed to say the tool passed with flying colours. It didn’t. It found four real bugs, and every one of them was a thing I thought I had already checked.
Here’s what came back.
The run
Parameters: `tier=standard`, six viewports (desktop 1440, laptop 1280, tablet portrait + landscape, mobile portrait + landscape), no login, accessibility category skipped. Wall-clock time: roughly twelve minutes, most of it spent in the verifier.
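For concreteness, the run parameters above can be written down as a config object. This is a hypothetical shape for illustration only; the field names and viewport heights are my assumptions, not NSPEC's actual API.

```typescript
// Hypothetical run configuration mirroring the parameters above.
// Field names are illustrative; NSPEC's real config shape may differ.
interface RunConfig {
  tier: string;
  viewports: { label: string; width: number; height: number }[];
  login: boolean;
  skipCategories: string[];
}

const run: RunConfig = {
  tier: "standard",
  viewports: [
    { label: "desktop", width: 1440, height: 900 },        // heights are assumed
    { label: "laptop", width: 1280, height: 800 },
    { label: "tablet-portrait", width: 768, height: 1024 },
    { label: "tablet-landscape", width: 1024, height: 768 },
    { label: "mobile-portrait", width: 390, height: 844 },
    { label: "mobile-landscape", width: 844, height: 390 },
  ],
  login: false,
  skipCategories: ["accessibility"],
};
```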
The bugs
- P1 · Mobile menu close button unclickable. The hamburger opened the menu fine. The X to close it sat inside the header (`z-50`), but the backdrop overlay was also `z-40` and covered the full viewport, including the header area. The click went to the backdrop, not the button. I had tested opening the menu ten times. I had never once closed it without reloading.
- P1 · “Reserve my seat” on the Team tier was dead. The popular pricing tier had a button with an `onClick` that pointed to a stale handler. It rendered. It hovered. It did nothing. This is the highest-intent CTA on the entire page.
- P2 · “Talk to us” on Enterprise was also dead. Same pattern, different tier.
- P2 · Eight footer links were `href="#"` stubs. Privacy, Terms, Blog, Changelog, and the social icons pointed nowhere. The individual verdict on each was mild; the collective signal was “unfinished project.”
What the verifier did
The thing that matters more than the bugs is what didn’t make it into the report. The first pass through the agents generated roughly forty candidates. After the verifier and the server-side quality gates, four survived. The rest were:
- Third-party console noise (font 404s from a CDN that had a bad minute, a rate-limited telemetry envelope, a browser-extension injection).
- Layout warnings that reproduced in one particular browser context state but disappeared in the fresh verifier context.
- Duplicate screenshots from two agents covering the same surface.
- One bug candidate from `ui-explorer` that `bug-verifier` could not reproduce across three attempts and marked as “low” confidence. Kicked.
Without the verifier, the report would have been forty items and the four real bugs would have been buried in the noise. With it, the ticket count equals the bug count.
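The winnowing described above — roughly forty candidates in, four tickets out — can be sketched as a filter over verifier verdicts. This is an illustrative model of the pattern, not NSPEC's internals; every field name here is an assumption.

```typescript
// Illustrative sketch of the verifier pattern described above.
// Candidate fields are assumptions, not NSPEC's real schema.
interface Candidate {
  id: string;
  reproducedInFreshContext: boolean; // survived a clean browser context?
  thirdPartyNoiseOnly: boolean;      // e.g. CDN 404s, extension injections
  duplicateOf?: string;              // same surface already covered by another agent
  confidence: "low" | "medium" | "high";
}

// A candidate becomes a ticket only if it reproduces in a fresh context,
// is not third-party noise, is not a duplicate, and is not low confidence.
function survivors(candidates: Candidate[]): Candidate[] {
  return candidates.filter(
    (c) =>
      c.reproducedInFreshContext &&
      !c.thirdPartyNoiseOnly &&
      c.duplicateOf === undefined &&
      c.confidence !== "low"
  );
}
```

Each rejection branch corresponds to one of the bullet points above: noise, context-dependent flakes, duplicates, and unreproducible low-confidence candidates.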
What filing looked like
Each survivor went into a local tracker fixture over the MCP connector path (we don’t route real tickets to our internal Linear during self-test runs). The ticket body contained the exact shape an external tracker receives at launch: title, numbered repro, acceptance criteria, severity with rationale, and the evidence bundle (highlighted.png, fullpage.png, dom.html, console.log, network.json, verifier.json).
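The ticket shape described above can be written as a type. The field names are my guesses at a plausible schema for illustration; the connector's actual payload may look different, though the listed evidence files come straight from the post.

```typescript
// Hypothetical ticket payload mirroring the fields listed above.
// The real MCP connector schema may differ; names are illustrative.
type Severity = "P1" | "P2" | "P3";

interface EvidenceBundle {
  "highlighted.png": string; // paths or URLs to the captured artifacts
  "fullpage.png": string;
  "dom.html": string;
  "console.log": string;
  "network.json": string;
  "verifier.json": string;
}

interface Ticket {
  title: string;
  repro: string[];              // numbered reproduction steps
  acceptanceCriteria: string[];
  severity: Severity;
  severityRationale: string;
  evidence: EvidenceBundle;
}
```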
The fix turnaround
All four shipped within four hours of the run. The mobile menu fix was three lines (`z-[60]` on the button container, `top-16` on the overlay so it starts below the header). The pricing CTAs became a discriminated union on tier kind, so Enterprise routes to a prefilled mailto and waitlist tiers open the inline form. The footer stubs became real routes.
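The discriminated-union fix for the CTAs looks roughly like this. It's a sketch under assumptions: the tier variants and handler names are illustrative, not the site's actual code.

```typescript
// Sketch of the discriminated-union CTA fix described above.
// Tier variants and handler names are illustrative, not the real site code.
type Tier =
  | { kind: "enterprise"; salesEmail: string }
  | { kind: "waitlist"; name: string };

type CtaAction =
  | { type: "mailto"; href: string }
  | { type: "openInlineForm"; tier: string };

// Every `kind` must be handled, or the `never` check below fails to
// compile — a stale handler becomes a type error instead of a dead button.
function ctaAction(tier: Tier): CtaAction {
  switch (tier.kind) {
    case "enterprise":
      return { type: "mailto", href: `mailto:${tier.salesEmail}` };
    case "waitlist":
      return { type: "openInlineForm", tier: tier.name };
    default: {
      const exhaustive: never = tier; // compile-time exhaustiveness check
      throw new Error(`unhandled tier: ${JSON.stringify(exhaustive)}`);
    }
  }
}
```

The design point is the `never` branch: adding a new tier kind without wiring its CTA is a compile error, which is exactly the failure mode the two dead buttons represented.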
The point
The point of dogfooding is not that the tool didn’t find bugs. The point is that a pre-launch marketing site, written by the same person who built the tool that tests it, still had two P1s and two P2s the operator hadn’t caught. The verifier pattern is the reason those four ended up in the ticket queue instead of the trash folder with the other thirty-six.
If you want to join the first cohort, the form is on the home page. If you want to argue about the architecture, the next post is for you.