The Problem

Different models have different strengths and blind spots. Claude is strong on architecture and reasoning but sometimes lets function-level bugs and missed optimisations slip through, particularly in file-by-file review. I built Spotter to bring open code models into the loop as a dedicated audit layer that reviews each file and catches what Claude missed.

Spotter watches file changes and routes diffs to multiple LLMs in parallel: Ollama, OpenWebUI, Claude Code, or any OpenAI-compatible endpoint. It cross-validates findings before surfacing them and runs as a zero-interruption loop while you work, not after you push. Human review then becomes a true final pass: instead of starting from scratch, you resolve issues that have already been flagged and cross-validated. The 'you should have caught this' category of bugs mostly disappears before review even starts.

What's interesting

Multi-model consensus engine

Findings from each model are clustered using Jaccard token similarity and union-find grouping. Only findings that survive a configurable quorum make it through, reducing noise without losing signal. A judge strategy lets one model act as a second pass over the surviving candidates.
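The clustering step can be sketched roughly as follows. This is a minimal illustration, not Spotter's actual code: the names tokens, jaccard, UnionFind, and consensus, the 0.5 similarity threshold, and the rule that a quorum counts distinct models are all assumptions made for the example.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase word tokens of a finding message, as a set."""
    return set(re.findall(r"[a-z0-9_]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets: |a & b| / |a | b|."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


class UnionFind:
    """Disjoint-set structure with path halving."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, i):
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]
            i = self.parent[i]
        return i

    def union(self, i, j):
        ri, rj = self.find(i), self.find(j)
        if ri != rj:
            self.parent[rj] = ri


def consensus(findings, threshold=0.5, quorum=2):
    """Group similar findings and keep groups reported by >= quorum models.

    findings is a list of (model_name, message) pairs (hypothetical shape).
    """
    toks = [tokens(message) for _, message in findings]
    uf = UnionFind(len(findings))
    # Link every pair of findings whose token overlap meets the threshold.
    for i in range(len(findings)):
        for j in range(i + 1, len(findings)):
            if jaccard(toks[i], toks[j]) >= threshold:
                uf.union(i, j)
    clusters = defaultdict(list)
    for i, finding in enumerate(findings):
        clusters[uf.find(i)].append(finding)
    # A cluster survives only if enough distinct models contributed to it.
    return [c for c in clusters.values()
            if len({model for model, _ in c}) >= quorum]
```

Union-find makes the grouping transitive: if finding A resembles B and B resembles C, all three land in one cluster even when A and C share few tokens directly. Whether that is desirable depends on how loose the threshold is.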

Four review modes

Watch mode debounces file changes and reviews diffs incrementally. One-shot works as a pre-commit hook. Deep audit batches the full codebase with resumable checkpoints so long runs can be interrupted and continued. Git range mode reviews a commit range retroactively.
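Debouncing in watch mode might look something like the sketch below. The Debouncer class, its method names, and the quiet-period default are hypothetical; the clock is passed in explicitly so the behaviour is easy to follow and test, whereas a real watcher would more likely read time.monotonic().

```python
class Debouncer:
    """Hold back changed paths until they have been quiet for a while."""

    def __init__(self, quiet_seconds=1.0):
        self.quiet_seconds = quiet_seconds
        self._last_change = {}  # path -> timestamp of most recent change

    def touch(self, path, now):
        """Record a change event; repeated events reset the quiet period."""
        self._last_change[path] = now

    def drain(self, now):
        """Return (and forget) paths with no changes for quiet_seconds."""
        ready = sorted(
            path for path, changed_at in self._last_change.items()
            if now - changed_at >= self.quiet_seconds
        )
        for path in ready:
            del self._last_change[path]
        return ready
```

A burst of saves to the same file therefore produces one review once the file settles, rather than one review per save.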

Layered config with per-project overrides

Config loads in four layers, each overriding the one before it: hardcoded defaults → global XDG config → project .spotter.yml → CLI flags. Environment variables are interpolated inline. Per-model system prompts let you give each backend different instructions, so one can watch for runtime bugs and another for structural patterns.
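A layered load of this kind can be approximated with a recursive merge followed by an interpolation pass. The sketch below works on plain dictionaries and skips YAML parsing; the ${VAR} placeholder syntax, the function names, and every config key shown in the usage note are assumptions for illustration, not Spotter's documented schema.

```python
import os
import re

# Assumed placeholder syntax: ${NAME}
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def deep_merge(base, override):
    """Return a new dict where override wins, merging nested dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def interpolate(value, env=None):
    """Expand ${NAME} in strings, recursing into dicts and lists.

    Unknown variables are left untouched rather than replaced with "".
    """
    env = os.environ if env is None else env
    if isinstance(value, str):
        return _ENV_PATTERN.sub(
            lambda m: env.get(m.group(1), m.group(0)), value)
    if isinstance(value, dict):
        return {k: interpolate(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [interpolate(v, env) for v in value]
    return value


def load_config(*layers, env=None):
    """Merge layers from lowest to highest priority, then interpolate."""
    merged = {}
    for layer in layers:
        merged = deep_merge(merged, layer)
    return interpolate(merged, env)
```

Calling load_config(defaults, global_cfg, project_cfg, cli_flags) with a project layer such as {"models": {"local": {"system_prompt": "Look for runtime bugs."}}} would add that prompt to the local model while leaving its other settings from earlier layers intact.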

Context isolation for Claude Code

The Claude Code backend runs from a temp directory to prevent the project's CLAUDE.md from bleeding into the review prompt. Without this, the reviewer inherits project-specific instructions that bias findings.
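The isolation idea can be sketched with a throwaway working directory. The run_isolated helper below is hypothetical, and the reviewer command is deliberately left as a parameter because the exact CLI invocation Spotter uses is not stated here; the sketch also assumes the prompt is passed on standard input.

```python
import subprocess
import tempfile


def run_isolated(command, prompt, timeout=300):
    """Run a reviewer command from an empty temp directory.

    Because the working directory is not inside the project, the child
    process cannot pick up project-local files such as CLAUDE.md by
    walking up from its current directory.
    """
    with tempfile.TemporaryDirectory(prefix="spotter-") as workdir:
        result = subprocess.run(
            command,
            input=prompt,          # prompt (including the diff) via stdin
            cwd=workdir,           # the isolation step
            capture_output=True,
            text=True,
            timeout=timeout,
            check=True,            # raise if the reviewer exits non-zero
        )
    return result.stdout
```

The diff has to travel inside the prompt itself, since the reviewer no longer has the project tree available to read from.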

Why this matters

AI-assisted development is already standard; the tooling around it isn't. Most teams are still treating LLM code generation as a black box and discovering its failure modes in production. Spotter is one piece of what better tooling looks like: layered, multi-model, automated, and tuned to the specific gaps that high-level reasoning passes leave behind.