audit

Why it's different

Single-pass scanners ship noise.
audit ships proof.

Six design decisions that separate real, reachable findings from the flood of plausible-but-wrong guesses.

🎯

Narrow agents, not one mega-prompt

One attack class per task, with the trust boundary spelled out. Focused hunters surface bugs a "find bugs here" prompt never will.

🥊

Deliberate disagreement

Validate runs on a different model than Hunt and is rewarded for rejecting findings, so it filters out the noise that single-pass tools ship.

🔓

Reachability is the gate

A "buggy" sink no attacker input can reach is dropped. Only confirmed and reachable findings make the report.

🔁

It learns as it runs

A proven-reachable bug seeds new hunts for the same pattern elsewhere in the repo — siblings get found too.

🧾

Schema-validated & resumable

Every agent output is shape-checked against JSON Schema, every run is checkpointed in SQLite, and a cost ceiling aborts cleanly.
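One way checkpointing, shape-checking, and a cost ceiling could fit together is sketched below. Everything named here is an assumption for illustration: the `checkpoints` table layout, the `run_stage` and `check_shape` helpers, and the hand-rolled shape check, which stands in for real JSON Schema validation. It is not audit's actual implementation.

```python
import json
import sqlite3


class BudgetExceeded(Exception):
    """Raised before a stage starts once spend has hit the ceiling."""


def open_store(path=":memory:"):
    # Hypothetical table layout: one row per completed (run, stage).
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS checkpoints ("
        "run_id TEXT, stage TEXT, output TEXT, cost_usd REAL, "
        "PRIMARY KEY (run_id, stage))"
    )
    return db


def check_shape(output, required):
    # Stand-in for JSON Schema validation: required keys with expected types.
    for key, expected in required.items():
        if not isinstance(output.get(key), expected):
            raise ValueError(f"field {key!r} is missing or not {expected.__name__}")


def run_stage(db, run_id, stage, agent, required, ceiling_usd):
    row = db.execute(
        "SELECT output FROM checkpoints WHERE run_id = ? AND stage = ?",
        (run_id, stage),
    ).fetchone()
    if row is not None:
        return json.loads(row[0])  # resumed: reuse the checkpoint, spend nothing

    (spent,) = db.execute(
        "SELECT COALESCE(SUM(cost_usd), 0) FROM checkpoints WHERE run_id = ?",
        (run_id,),
    ).fetchone()
    if spent >= ceiling_usd:
        raise BudgetExceeded(f"spent ${spent:.2f} of ${ceiling_usd:.2f}")

    output, cost_usd = agent()      # agent returns (parsed JSON, cost in USD)
    check_shape(output, required)   # reject malformed output before storing it
    with db:                        # commit the checkpoint atomically
        db.execute(
            "INSERT INTO checkpoints VALUES (?, ?, ?, ?)",
            (run_id, stage, json.dumps(output), cost_usd),
        )
    return output
```

Because the ceiling is checked before the agent is called, an over-budget run stops between stages and every finished stage stays in the store for a later resume.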

🔒

Subscription billing by default

Runs on the official Claude Agent SDK, so no API key is needed. Any metered API key is scrubbed from the run so usage can't be routed to metered billing.

The pipeline

8 stages, two loops,
one question that matters.

Recon → Hunt → Validate → Gapfill ↺ → Dedupe → Trace → Feedback ↺ → Report

Recon

Opus

Maps the repo + git history and emits narrowly-scoped Hunt tasks — one attack class, concrete files, explicit trust boundary.
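To make "narrowly scoped" concrete, here is a rough sketch of what a single Hunt task record might carry. The `HuntTask` class and its field names are hypothetical, as are the example file path and boundary text; the real task format is not shown in this page.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HuntTask:
    attack_class: str        # exactly one class per task, e.g. "path-traversal"
    files: tuple[str, ...]   # concrete files to read, not "the whole repo"
    trust_boundary: str      # where untrusted input crosses into the code

    def __post_init__(self):
        # Refuse tasks that are too broad to hunt effectively.
        if not self.attack_class or "," in self.attack_class:
            raise ValueError("a task names exactly one attack class")
        if not self.files:
            raise ValueError("a task lists at least one concrete file")
        if not self.trust_boundary:
            raise ValueError("a task states its trust boundary")


task = HuntTask(
    attack_class="path-traversal",
    files=("app/upload.py",),
    trust_boundary="multipart filename in POST /upload",
)
```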

Hunt

Sonnet

One attack class per agent, run in parallel. Compiles and runs real PoCs to prove the bug rather than guessing.

Validate

Opus

An adversarial re-read on a different model that tries to disprove each finding. The skeptic that kills false positives.

Gapfill

Sonnet · ↻ loop

Re-queues under-covered subsystem × attack class cells so coverage expands where hunters drifted away.
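The coverage idea can be pictured as a grid of subsystem × attack class cells. The sketch below is an assumption about how such a grid could be checked, with a hypothetical `under_covered` helper and made-up subsystem and class names; the real Gapfill stage is driven by a model and may weigh coverage differently.

```python
from collections import Counter
from itertools import product


def under_covered(subsystems, attack_classes, completed_hunts, min_hunts=1):
    """Return every (subsystem, attack_class) cell with fewer than min_hunts hunts."""
    seen = Counter(completed_hunts)
    return [
        cell
        for cell in product(subsystems, attack_classes)
        if seen[cell] < min_hunts
    ]


gaps = under_covered(
    subsystems=["auth", "upload"],
    attack_classes=["sqli", "path-traversal"],
    completed_hunts=[("auth", "sqli"), ("upload", "path-traversal")],
)
# gaps holds the two cells no hunter has visited yet; those get re-queued.
```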

Dedupe

Sonnet

Clusters findings strictly by root cause — many call sites of one buggy helper collapse into a single fixable issue.
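Clustering by root cause rather than by call site might look like the sketch below. The `Finding` fields and the choice of `(attack_class, root_cause)` as the cluster key are illustrative assumptions, not the tool's documented schema.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    attack_class: str
    call_site: str    # where the bug was observed
    root_cause: str   # the function that actually holds the defect


def cluster_by_root_cause(findings):
    """Group call sites under the one defect they all trace back to."""
    clusters = defaultdict(list)
    for finding in findings:
        clusters[(finding.attack_class, finding.root_cause)].append(finding.call_site)
    return dict(clusters)


clusters = cluster_by_root_cause([
    Finding("path-traversal", "views/upload.py:41", "utils.safe_join"),
    Finding("path-traversal", "views/export.py:88", "utils.safe_join"),
    Finding("sqli", "views/search.py:17", "db.raw_query"),
])
# Two call sites of the same helper collapse into one fixable issue.
```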

Trace

Opus · the gate

Proves attacker-controlled input actually reaches the sink from an external entry point. Unreachable = out of scope.
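As a simplified picture of the gate, reachability can be treated as a path search from external entry points to the sink. The sketch below assumes a precomputed call graph and uses breadth-first search; the graph, node names, and `trace` helper are hypothetical. A real trace also has to show that attacker-controlled data survives each hop, which a plain graph search does not establish.

```python
from collections import deque


def trace(call_graph, entry_points, sink):
    """Return the shortest entry-point-to-sink path, or None if unreachable."""
    queue = deque([entry] for entry in entry_points)
    seen = set(entry_points)
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == sink:
            return path
        for callee in call_graph.get(node, ()):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


graph = {
    "POST /upload": ["save_file"],
    "save_file": ["os_open"],
    "cron_job": ["legacy_exec"],   # internal only, never fed external input
}
reachable = trace(graph, ["POST /upload"], "os_open")
unreachable = trace(graph, ["POST /upload"], "legacy_exec")
```

Under this model a finding at `legacy_exec` would be dropped, since no external entry point leads to it.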

Feedback

Sonnet · ↻ loop

Turns each reachable trace into new hunts for structurally similar siblings elsewhere — the learning loop.

Report

Sonnet

A schema-validated, structured report — reachable-only, severity-consistent, with the entry-point→sink trace attached.

Built for real workflows

From a one-off scan
to every pull request.

✓ Diff / PR mode. --base/--since scope the scan to changed files + blast radius. A PR scan costs cents.

✓ Baseline & delta. Fingerprint findings, suppress known ones, surface only what's NEW or FIXED.

✓ SARIF + exit-code gating. --fail-on high for CI; trace ships as codeFlows to the GitHub Security tab.

✓ Auto-fix (opt-in). audit fix writes a minimal patch + regression test in an isolated worktree; --open-pr opens a draft.

✓ Code-grounded advice. audit advise reads the real sink and explains the fix for your code, inline in the report.

✓ Triage viewer. --serve a local web UI to confirm / dismiss findings and export suppressions.

✓ Bug-bounty / VDP triage. Reproduce an inbound submission, run it through the reviewer + gate, emit accept/reject/duplicate.

✓ Live-target mode. Reproduce findings against a running deployment with real HTTP.

✓ Cost observability. audit stats breaks spend down by stage/model and reports cost-per-finding.

✓ Background runs. audit run -d detaches the pipeline; audit sessions lists what's alive.
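The baseline-and-delta item in the list above can be sketched as fingerprinting each finding and comparing two sets. The fingerprint inputs chosen here (attack class, file, sink) and the `fingerprint` and `delta` helpers are assumptions for illustration; the fields audit actually hashes are not stated on this page.

```python
import hashlib


def fingerprint(finding):
    # Line numbers are left out so unrelated edits do not change the fingerprint.
    key = "|".join((finding["attack_class"], finding["file"], finding["sink"]))
    return hashlib.sha256(key.encode("utf-8")).hexdigest()[:16]


def delta(baseline, current):
    """Split findings into NEW (not in baseline) and FIXED (gone from current)."""
    old = {fingerprint(f) for f in baseline}
    now = {fingerprint(f) for f in current}
    return {
        "new": [f for f in current if fingerprint(f) not in old],
        "fixed": [f for f in baseline if fingerprint(f) not in now],
    }


baseline = [
    {"attack_class": "sqli", "file": "db.py", "sink": "raw_query", "line": 10},
    {"attack_class": "ssrf", "file": "fetch.py", "sink": "http_get", "line": 52},
]
current = [
    {"attack_class": "sqli", "file": "db.py", "sink": "raw_query", "line": 14},
    {"attack_class": "xss", "file": "views.py", "sink": "render", "line": 7},
]
result = delta(baseline, current)
```

In this example the SQL injection finding moved from line 10 to line 14 but keeps its fingerprint, so it is suppressed as known; only the XSS finding is reported as new and the SSRF finding as fixed.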

Quickstart

Auditing in under a minute.

# 1 · install globally — requires Bun ≥ 1.3
bun add -g @usex/audit

# 2 · already logged in via `claude login`? done.
audit auth-check

# 3 · cd into the repo you want to audit
cd /path/to/target
audit run --run-id my-run

# 4 · read the report
audit report --run-id my-run --format md > report.md

Install

One global binary on your PATH, running on the Bun runtime.

Authenticate

Uses your Claude Pro / Max subscription — no API key. audit auth-check confirms it.

Run

Point it at any repo. State and artifacts land in the working directory; runs are resumable and budgeted.

Report

Export Markdown, JSON, or SARIF — every finding carries its reachability proof.
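For readers unfamiliar with how a trace rides along in SARIF, the sketch below builds a SARIF 2.1.0 result whose `codeFlows` carry the entry-point-to-sink steps. The property names follow the SARIF 2.1.0 format; the helper functions, rule ID, file paths, and messages are hypothetical, and the exact output audit emits may differ.

```python
import json


def physical_location(uri, line):
    return {
        "physicalLocation": {
            "artifactLocation": {"uri": uri},
            "region": {"startLine": line},
        }
    }


def to_sarif_result(rule_id, message, trace):
    """trace: ordered (uri, line, note) steps from entry point to sink."""
    steps = [
        {"location": {**physical_location(uri, line), "message": {"text": note}}}
        for uri, line, note in trace
    ]
    sink_uri, sink_line, _ = trace[-1]
    return {
        "ruleId": rule_id,
        "level": "error",
        "message": {"text": message},
        "locations": [physical_location(sink_uri, sink_line)],  # report at the sink
        "codeFlows": [{"threadFlows": [{"locations": steps}]}],
    }


def to_sarif(results):
    return {
        "version": "2.1.0",
        "runs": [{"tool": {"driver": {"name": "audit"}}, "results": results}],
    }


report = to_sarif([
    to_sarif_result(
        "path-traversal",
        "User-controlled filename reaches open()",
        [
            ("app/views.py", 41, "filename read from multipart upload"),
            ("app/storage.py", 12, "joined to base directory without normalisation"),
            ("app/storage.py", 19, "passed to open()"),
        ],
    )
])
sarif_text = json.dumps(report, indent=2)
```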

Pointed at a Flask app,
it found a chain nobody planted.

Find the bugs
that can actually be reached.
