The pipeline
One NSPEC run is a directed graph that looks like this:
URL in
→ discovery (routes, controls, forms)
→ orchestrator schedules N steps
→ agent network × 6 viewports (parallel, isolated browser contexts)
→ 60+ tools: click, fill, scroll, assert, capture, probe, ...
→ per-step evidence capture (screenshots, DOM, console, network)
→ bug candidates
→ verifier (fresh context, up to 3 repros, confidence scored)
→ server-side gates (noise filter, duplicate merge, contradiction detect)
→ tracker delivery (Jira / Linear / GitHub via MCP)
→ HTML report artefact (preview / audit)
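The stages above form a strictly ordered chain. As an illustration only, here is a minimal Python sketch of a run threading its state through each stage in turn; the stage names mirror the diagram, but `run` and everything else here is a hypothetical shape, not the real NSPEC API:

```python
# Hypothetical: one NSPEC run as an ordered list of stage names.
# Stage names mirror the pipeline diagram; this is a sketch, not the product API.
PIPELINE = [
    "discovery",         # routes, controls, forms
    "orchestration",     # orchestrator schedules N steps
    "agent_network",     # 6 viewports, parallel isolated browser contexts
    "evidence_capture",  # screenshots, DOM, console, network per step
    "verification",      # fresh context, up to 3 repros, confidence scored
    "server_gates",      # noise filter, duplicate merge, contradiction detect
    "tracker_delivery",  # Jira / Linear / GitHub via MCP
    "report_artefact",   # HTML report for preview / audit
]

def run(url: str) -> dict:
    """Thread a run's state through every stage, in order."""
    state = {"url": url, "completed": []}
    for stage in PIPELINE:
        state["completed"].append(stage)
    return state
```

The point of the linear shape: no bug candidate can reach the tracker without first passing verification and the server-side gates.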
Specialist agents, today
The current roster. More roles land as the agent network grows; each specialist is composable with the 60+ QA tools in the shared toolbox.
- orchestrator · owns the run graph, schedules steps, handles retries.
- ui-explorer · walks routes and tests interactive controls.
- component-auditor · checks specific UI components (modals, tabs, forms) against their expected behaviour contracts.
- responsive-tester · replays coverage across all 6 viewports and flags layout divergence.
- performance-profiler · measures navigation timing, long tasks, and Core Web Vitals where measurable.
- bug-verifier · reproduces bug candidates independently, up to three times, with a manual-grade confidence scorer.
- test-case-designer · generates structured test cases from discovery for the next run.
- accessibility-reviewer · optional and off by default; runs axe-core plus manual checks when you explicitly opt in.
The six viewports
- desktop_1440x900
- laptop_1280x800
- tablet_portrait_768x1024
- tablet_landscape_1024x768
- mobile_portrait_390x844
- mobile_landscape_844x390
You can narrow this set per run if you only care about one form factor. The default is all six.
Verifier, in detail
Every bug candidate is handed to a bug-verifier subagent in a fresh browser context with no memory of the original run. The verifier:
- Re-executes the repro steps up to three times.
- Records "reproduced", "reproduced intermittently", or "could not reproduce".
- Scores manual-grade confidence: high, medium, or low.
- Rejects anything below medium.
Only verified bugs make it into the server-side gates. Everything else is dropped and logged for learning.
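The verifier loop above can be sketched as follows. The three status strings come from the docs; the mapping from repro successes to a confidence score is my assumption, and `verify` is an illustrative shape, not the real verifier:

```python
from typing import Callable

def verify(repro: Callable[[], bool], attempts: int = 3) -> dict:
    """Re-run a bug candidate's repro steps up to `attempts` times
    in a notional fresh browser context, then score confidence.

    Assumption: all-pass -> high, partial -> medium, none -> low.
    """
    successes = sum(1 for _ in range(attempts) if repro())
    if successes == attempts:
        status, confidence = "reproduced", "high"
    elif successes > 0:
        status, confidence = "reproduced intermittently", "medium"
    else:
        status, confidence = "could not reproduce", "low"
    # Anything below medium is rejected before the server-side gates.
    accepted = confidence in ("high", "medium")
    return {"status": status, "confidence": confidence, "accepted": accepted}
```

A bug that never reproduces scores low and is dropped (and logged for learning); an intermittent repro still clears the medium bar.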
Server-side quality gates
Before a bug becomes a ticket, it passes through these filters:
- Empty-bug filter. Bugs without a repro, without a selector, or with a vague title are rejected.
- Third-party noise filter. Console errors from GA, Sentry, fonts, and ad-tech domains are not yours to fix and are dropped.
- Duplicate merge. If multiple agents report the same surface, they merge into one bug with N sources.
- Contradiction detector. If two agents make incompatible claims about the same DOM, the bug is sent back for another verifier pass.
- Project memory. Known flaky selectors and known false positives from prior runs are suppressed.
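Three of those gates are per-bug checks and can be sketched as a single predicate. Everything below is illustrative: the field names, the third-party domain list, and the title-length heuristic for "vague title" are assumptions, and the cross-bug gates (duplicate merge, contradiction detection) are omitted because they need state across the whole candidate set:

```python
# Hypothetical per-bug gate chain; field names and thresholds are assumptions.
THIRD_PARTY_DOMAINS = ("google-analytics.com", "sentry.io", "fonts.gstatic.com")

def passes_gates(bug: dict, known_false_positives: set[str]) -> bool:
    """Apply the per-bug filters in order; any failure drops the bug."""
    # Empty-bug filter: needs repro steps, a selector, and a concrete title.
    if not bug.get("repro") or not bug.get("selector"):
        return False
    if len(bug.get("title", "")) < 10:  # crude proxy for "vague title"
        return False
    # Third-party noise filter: console errors from GA / Sentry / fonts etc.
    if any(d in bug.get("source_url", "") for d in THIRD_PARTY_DOMAINS):
        return False
    # Project memory: suppress known false positives from prior runs.
    if bug["selector"] in known_false_positives:
        return False
    return True
```

Ordering matters in a chain like this: the cheap structural checks run first, so expensive memory lookups only happen for bugs that are already well-formed.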
The ticket you receive
Every bug that survives is filed as a real ticket in your tracker, over a native MCP connector. The ticket contains:
- A crisp title (benefit: owner / user).
- Repro steps, numbered.
- Acceptance criteria, testable.
- Severity (P1 / P2 / P3) with a short rationale.
- Suggested owner (based on file ownership, if opted in to git-diff prioritization).
- An evidence bundle: highlighted screenshot, full-page capture, DOM snapshot, console log, network trace, verifier confidence.
- A back-link to the full HTML report artefact for audit.
Run modes
- Quick tier. Smoke-test-class coverage. ~2 minutes.
- Standard tier. Default. Comprehensive coverage. ~4 minutes on typical apps.
- Exhaustive tier. Long-running, deeper walk including rarely-visited surfaces. ~15 minutes.
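The three tiers can be captured as a small config table. This is a hypothetical representation of the figures above, not a real NSPEC configuration format:

```python
# Hypothetical tier table; timings are the approximations quoted above.
TIERS = {
    "quick":      {"coverage": "smoke-test-class",  "approx_minutes": 2},
    "standard":   {"coverage": "comprehensive",     "approx_minutes": 4},
    "exhaustive": {"coverage": "deep walk, incl. rarely-visited surfaces",
                   "approx_minutes": 15},
}
DEFAULT_TIER = "standard"
```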
Related: Product overview · Features · Pricing