The pipeline
One NSPEC run is a directed graph that looks like this:
URL in
→ discovery (routes, controls, forms)
→ orchestrator schedules N steps
→ agent network × 6 viewports (parallel, isolated browser contexts)
→ 60+ tools: click, fill, scroll, assert, capture, probe, ...
→ per-step evidence capture (screenshots, DOM, console, network)
→ bug candidates
→ verifier (fresh context, up to 3 repros, confidence scored)
→ server-side gates (noise filter, duplicate merge, contradiction detect)
→ tracker delivery (Jira / Linear / GitHub via MCP)
→ HTML report artefact (preview / audit)
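The stages above form a strictly ordered chain. As an illustration only, here is a minimal Python sketch of a run threading its state through each stage in turn; the stage names mirror the diagram, but `run` and everything else here is a hypothetical shape, not the real NSPEC API:

```python
# Hypothetical: one NSPEC run as an ordered list of stage names.
# Stage names mirror the pipeline diagram; this is a sketch, not the product API.
PIPELINE = [
    "discovery",         # routes, controls, forms
    "orchestration",     # orchestrator schedules N steps
    "agent_network",     # 6 viewports, parallel isolated browser contexts
    "evidence_capture",  # screenshots, DOM, console, network per step
    "verification",      # fresh context, up to 3 repros, confidence scored
    "server_gates",      # noise filter, duplicate merge, contradiction detect
    "tracker_delivery",  # Jira / Linear / GitHub via MCP
    "report_artefact",   # HTML report for preview / audit
]

def run(url: str) -> dict:
    """Thread a run's state through every stage, in order."""
    state = {"url": url, "completed": []}
    for stage in PIPELINE:
        state["completed"].append(stage)
    return state
```

The point of the linear shape: no bug candidate can reach the tracker without first passing verification and the server-side gates.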
Specialist agents, today
The current roster. More roles land as the agent network grows; each specialist is composable with the 60+ QA tools in the shared toolbox.
- orchestrator · owns the run graph, schedules steps, handles retries.
- ui-explorer · walks routes and tests interactive controls.
- component-auditor · checks specific UI components (modals, tabs, forms) against their expected behaviour contracts.
- responsive-tester · replays coverage across all 6 viewports and flags layout divergence.
- performance-profiler · measures navigation timing, long tasks, and Core Web Vitals where measurable.
- bug-verifier · reproduces bug candidates independently, up to three times, with a manual-grade confidence scorer.
- test-case-designer · generates structured test cases from discovery for the next run.
- accessibility-reviewer · optional and off by default; runs axe-core plus manual checks when you explicitly opt in.
The six viewports
- desktop_1440x900
- laptop_1280x800
- tablet_portrait_768x1024
- tablet_landscape_1024x768
- mobile_portrait_390x844
- mobile_landscape_844x390
You can narrow this set per run if you only care about one form factor. The default is all six.
Verifier, in detail
Every bug candidate is handed to a bug-verifier subagent in a fresh browser context with no memory of the original run. The verifier:
- Re-executes the repro steps up to three times.
- Records "reproduced", "reproduced intermittently", or "could not reproduce".
- Scores manual-grade confidence: high, medium, or low.
- Rejects anything below medium.
Only verified bugs make it into the server-side gates. Everything else is dropped and logged for learning.
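The verifier loop above can be sketched as follows. The three status strings come from the docs; the mapping from repro successes to a confidence score is my assumption, and `verify` is an illustrative shape, not the real verifier:

```python
from typing import Callable

def verify(repro: Callable[[], bool], attempts: int = 3) -> dict:
    """Re-run a bug candidate's repro steps up to `attempts` times
    in a notional fresh browser context, then score confidence.

    Assumption: all-pass -> high, partial -> medium, none -> low.
    """
    successes = sum(1 for _ in range(attempts) if repro())
    if successes == attempts:
        status, confidence = "reproduced", "high"
    elif successes > 0:
        status, confidence = "reproduced intermittently", "medium"
    else:
        status, confidence = "could not reproduce", "low"
    # Anything below medium is rejected before the server-side gates.
    accepted = confidence in ("high", "medium")
    return {"status": status, "confidence": confidence, "accepted": accepted}
```

A bug that never reproduces scores low and is dropped (and logged for learning); an intermittent repro still clears the medium bar.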
Server-side quality gates
Before a bug becomes a ticket, it passes through these filters:
- Empty-bug filter. Bugs without a repro, without a selector, or with a vague title are rejected.
- Third-party noise filter. Console errors from GA, Sentry, fonts, and ad-tech domains are not yours to fix and are dropped.
- Duplicate merge. If multiple agents report the same surface, they merge into one bug with N sources.
- Contradiction detector. If two agents make incompatible claims about the same DOM, the bug is sent back for another verifier pass.
- Project memory. Known flaky selectors and known false positives from prior runs are suppressed.
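Three of those gates are per-bug checks and can be sketched as a single predicate. Everything below is illustrative: the field names, the third-party domain list, and the title-length heuristic for "vague title" are assumptions, and the cross-bug gates (duplicate merge, contradiction detection) are omitted because they need state across the whole candidate set:

```python
# Hypothetical per-bug gate chain; field names and thresholds are assumptions.
THIRD_PARTY_DOMAINS = ("google-analytics.com", "sentry.io", "fonts.gstatic.com")

def passes_gates(bug: dict, known_false_positives: set[str]) -> bool:
    """Apply the per-bug filters in order; any failure drops the bug."""
    # Empty-bug filter: needs repro steps, a selector, and a concrete title.
    if not bug.get("repro") or not bug.get("selector"):
        return False
    if len(bug.get("title", "")) < 10:  # crude proxy for "vague title"
        return False
    # Third-party noise filter: console errors from GA / Sentry / fonts etc.
    if any(d in bug.get("source_url", "") for d in THIRD_PARTY_DOMAINS):
        return False
    # Project memory: suppress known false positives from prior runs.
    if bug["selector"] in known_false_positives:
        return False
    return True
```

Ordering matters in a chain like this: the cheap structural checks run first, so expensive memory lookups only happen for bugs that are already well-formed.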
The ticket you receive
Every bug that survives is filed as a real ticket in your tracker, over a native MCP connector. The ticket contains:
- A crisp title (benefit: owner / user).
- Repro steps, numbered.
- Acceptance criteria, testable.
- Severity (P1 / P2 / P3) with a short rationale.
- Suggested owner (based on file ownership, if opted in to git-diff prioritization).
- An evidence bundle: highlighted screenshot, full-page capture, DOM snapshot, console log, network trace, verifier confidence.
- A back-link to the full HTML report artefact for audit.
Run modes
- Quick tier. Smoke-test-class coverage. ~2 minutes.
- Standard tier. Default. Comprehensive coverage. ~4 minutes on typical apps.
- Exhaustive tier. Long-running, deeper walk including rarely-visited surfaces. ~15 minutes.
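The three tiers can be captured as a small config table. This is a hypothetical representation of the figures above, not a real NSPEC configuration format:

```python
# Hypothetical tier table; timings are the approximations quoted above.
TIERS = {
    "quick":      {"coverage": "smoke-test-class",  "approx_minutes": 2},
    "standard":   {"coverage": "comprehensive",     "approx_minutes": 4},
    "exhaustive": {"coverage": "deep walk, incl. rarely-visited surfaces",
                   "approx_minutes": 15},
}
DEFAULT_TIER = "standard"
```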
Related: Product overview · Features · Pricing