Product

How a run actually works.

A longer version of the 3-step story on the home page · with the agent roster, what each one does, and the exact shape of the ticket that ends up in your tracker.

The pipeline

One NSPEC run is a directed graph that looks like this:

URL in
  → discovery (routes, controls, forms)
  → orchestrator schedules N steps
  → agent network × 6 viewports  (parallel, isolated browser contexts)
  → 60+ tools: click, fill, scroll, assert, capture, probe, ...
  → per-step evidence capture  (screenshots, DOM, console, network)
  → bug candidates
  → verifier  (fresh context, up to 3 repros, confidence scored)
  → server-side gates  (noise filter, duplicate merge, contradiction detect)
  → tracker delivery  (Jira / Linear / GitHub via MCP)
  → HTML report artefact  (preview / audit)
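The graph above can be sketched as an ordered pipeline. This is an illustrative sketch, not the real NSPEC internals: stage names mirror the diagram, and the fan-out (agents × viewports) that makes the real run a graph rather than a line is elided.

```python
# Illustrative sketch of one run as an ordered list of stages.
# The real orchestrator fans out across agents and viewports in
# parallel; this linear walk keeps only the stage ordering.

PIPELINE = [
    "discovery",         # routes, controls, forms
    "orchestration",     # schedule N steps
    "agent_network",     # parallel, isolated browser contexts
    "evidence_capture",  # screenshots, DOM, console, network
    "verification",      # fresh context, up to 3 repros
    "quality_gates",     # noise filter, dedupe, contradiction detect
    "delivery",          # tracker ticket + HTML report artefact
]

def run(url: str) -> dict:
    """Walk the stages in order, threading a shared run state."""
    state = {"url": url, "completed": []}
    for stage in PIPELINE:
        state["completed"].append(stage)
    return state
```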

Specialist agents, today

The current roster. More roles land as the agent network grows · each specialist is composable with the 60+ QA tools in the shared toolbox.

  • orchestrator · owns the run graph, schedules steps, handles retries.
  • ui-explorer · walks routes and tests interactive controls.
  • component-auditor · checks specific UI components (modals, tabs, forms) against their expected behaviour contracts.
  • responsive-tester · replays coverage across all 6 viewports and flags layout divergence.
  • performance-profiler · measures navigation timing, long tasks, and Core Web Vitals where measurable.
  • bug-verifier · reproduces bug candidates independently, up to three times, with a manual-grade confidence scorer.
  • test-case-designer · generates structured test cases from discovery for the next run.
  • accessibility-reviewer · optional (and off by default), runs axe-core plus manual checks when you explicitly opt in.
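One way to picture the roster is as a registry of roles with default-on flags. This is a hypothetical sketch: the role names come from the list above, and the only flag the text pins down is that accessibility-reviewer is off by default and opt-in.

```python
# Hypothetical roster registry; flags follow the text above
# (accessibility-reviewer is off by default, explicit opt-in).

AGENT_ROSTER = {
    "orchestrator": True,
    "ui-explorer": True,
    "component-auditor": True,
    "responsive-tester": True,
    "performance-profiler": True,
    "bug-verifier": True,
    "test-case-designer": True,
    "accessibility-reviewer": False,  # opt-in only
}

def active_roles(opt_in: frozenset = frozenset()) -> list:
    """Roles that run: everything on by default, plus explicit opt-ins."""
    return [role for role, default in AGENT_ROSTER.items()
            if default or role in opt_in]
```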

The six viewports

  • desktop_1440x900
  • laptop_1280x800
  • tablet_portrait_768x1024
  • tablet_landscape_1024x768
  • mobile_portrait_390x844
  • mobile_landscape_844x390

You can narrow this set per run if you only care about one form factor. The default is all six.
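The viewport names encode their dimensions, so narrowing by form factor can be sketched as a prefix filter. A minimal sketch, assuming a per-run selector; the real run configuration API may differ.

```python
# The six default viewports, with dimensions parsed from the names above.

VIEWPORTS = {
    "desktop_1440x900": (1440, 900),
    "laptop_1280x800": (1280, 800),
    "tablet_portrait_768x1024": (768, 1024),
    "tablet_landscape_1024x768": (1024, 768),
    "mobile_portrait_390x844": (390, 844),
    "mobile_landscape_844x390": (844, 390),
}

def select_viewports(form_factor=None) -> dict:
    """Narrow the set per run; None keeps all six (the default)."""
    if form_factor is None:
        return dict(VIEWPORTS)
    return {name: size for name, size in VIEWPORTS.items()
            if name.startswith(form_factor)}
```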

Verifier, in detail

Every bug candidate is handed to a bug-verifier subagent in a fresh browser context with no memory of the original run. The verifier:

  1. Re-executes the repro steps up to three times.
  2. Records "reproduced", "reproduced intermittently", or "could not reproduce".
  3. Scores manual-grade confidence: high, medium, low.
  4. Rejects anything below medium.

Only verified bugs make it into the server-side gates. Everything else is dropped and logged for learning.
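The four verifier steps can be sketched as a retry loop. Hedged sketch: `attempt_repro` stands in for replaying the steps in a fresh browser context, the status labels and the reject-below-medium rule follow the text above, and the exact mapping from repro counts to confidence is an assumption.

```python
from typing import Callable

def verify(attempt_repro: Callable[[], bool], max_attempts: int = 3) -> dict:
    """Re-execute a bug candidate's repro up to three times and score it.

    The count-to-confidence mapping here is illustrative; the real
    scorer weighs more signals than raw repro counts.
    """
    successes = sum(1 for _ in range(max_attempts) if attempt_repro())
    if successes == max_attempts:
        status, confidence = "reproduced", "high"
    elif successes > 0:
        status, confidence = "reproduced intermittently", "medium"
    else:
        status, confidence = "could not reproduce", "low"
    return {
        "status": status,
        "confidence": confidence,
        "accepted": confidence in ("high", "medium"),  # reject below medium
    }
```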

Server-side quality gates

Before a bug becomes a ticket, it passes through these filters:

  • Empty-bug filter. Bugs without a repro, without a selector, or with a vague title are rejected.
  • Third-party noise filter. Console errors from GA, Sentry, fonts, and ad-tech domains are not yours to fix and are dropped.
  • Duplicate merge. If multiple agents report the same surface, they merge into one bug with N sources.
  • Contradiction detector. If two agents make incompatible claims about the same DOM, the bug is sent back for another verifier pass.
  • Project memory. Known flaky selectors and known false positives from prior runs are suppressed.
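Two of the gates lend themselves to a short sketch: the third-party noise filter and the duplicate merge. The domain list and the bug shape here are illustrative assumptions, not the real gate configuration.

```python
# Illustrative third-party domains; the real filter list is broader.
THIRD_PARTY = ("googletagmanager.com", "sentry.io", "fonts.googleapis.com")

def is_third_party_noise(console_error_url: str) -> bool:
    """Console errors from third-party domains are not yours to fix."""
    return any(domain in console_error_url for domain in THIRD_PARTY)

def merge_duplicates(bugs: list) -> list:
    """Bugs reporting the same surface (keyed by selector here)
    collapse into one bug carrying a count of sources."""
    merged = {}
    for bug in bugs:
        key = bug["selector"]
        if key in merged:
            merged[key]["sources"] += 1
        else:
            merged[key] = {**bug, "sources": 1}
    return list(merged.values())
```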

The ticket you receive

Every bug that survives is filed as a real ticket in your tracker, via a native MCP connector. The ticket contains:

  • A crisp title (benefit: owner / user).
  • Repro steps, numbered.
  • Acceptance criteria, testable.
  • Severity (P1 / P2 / P3) with a short rationale.
  • Suggested owner (based on file ownership, if opted in to git-diff prioritization).
  • An evidence bundle: highlighted screenshot, full-page capture, DOM snapshot, console log, network trace, verifier confidence.
  • A back-link to the full HTML report artefact for audit.
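Put together, the ticket payload might look like the sketch below. The field names are assumptions based on the bullet list above, not the real MCP schema; the optional suggested-owner field is omitted since it depends on the git-diff opt-in.

```python
# Hypothetical ticket payload shape; field names are illustrative.

def build_ticket(bug: dict) -> dict:
    return {
        "title": bug["title"],
        "repro_steps": bug["repro_steps"],            # numbered
        "acceptance_criteria": bug["acceptance_criteria"],
        "severity": bug["severity"],                  # "P1" | "P2" | "P3"
        "severity_rationale": bug["rationale"],
        "evidence": {
            "screenshot": bug["screenshot_path"],
            "dom_snapshot": bug["dom_path"],
            "console_log": bug["console_path"],
            "network_trace": bug["network_path"],
            "verifier_confidence": bug["confidence"],
        },
        "report_link": bug["report_url"],             # HTML report artefact
    }
```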

Run modes

  • Quick tier. Smoke-test-class coverage. ~2 minutes.
  • Standard tier. Default. Comprehensive coverage. ~4 minutes on typical apps.
  • Exhaustive tier. Long-running, deeper walk including rarely visited surfaces. ~15 minutes.
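Tier selection can be sketched as a small lookup with standard as the default. The coverage labels and durations restate the list above; the durations are approximate, not guarantees.

```python
# Illustrative tier table; durations are approximate.

TIERS = {
    "quick": {"coverage": "smoke", "approx_minutes": 2},
    "standard": {"coverage": "comprehensive", "approx_minutes": 4},
    "exhaustive": {"coverage": "deep walk", "approx_minutes": 15},
}

DEFAULT_TIER = "standard"

def pick_tier(name=None) -> dict:
    """Resolve a run's tier; omitting the name uses the default."""
    return TIERS[name or DEFAULT_TIER]
```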

Related: Product overview · Features · Pricing