Self-Verified Coding Agent

Where Code Gets Forged, Hammered, and Tested Until It Holds

Anvil doesn't just write code — it plans, executes, verifies, and recovers. Built on 210K+ agent traces across 21 projects. Stops only when tests pass.


278+ Tests

21 Projects

6 Model Backends

210K+ Training Examples

anvil — verify-loop

def resolve_imports(tree):

    # Plan: find all undefined names

    unresolved = collect_undefined(tree)

    for name in unresolved:

        tree = infer_and_inject(tree, name)

✗ 3 tests failed: AssertionError

⟳ Recovering: fixing import order...

    tree = reorder_imports(tree)

✓ 5/5 tests passed

✓ Verification complete

Verify loop complete — all tests pass

VERIFY → RECOVER → VERIFY

How It Works

The Forge Doesn't Stop Until It Holds

Most coding agents write code and hand it to you. Anvil runs a continuous Plan → Execute → Verify → Recover loop — iterating until every test passes.

Plan

Analyze the task, decompose into steps, identify dependencies and constraints before writing a single line.

Execute

Generate code with full context awareness — understanding project structure, conventions, and test expectations.

Verify

Run the test suite. If tests fail, the loop doesn't stop — every failure is a signal, not a dead end.

Recover

Parse errors, diagnose root causes, and apply targeted fixes. Then re-verify. Repeat until green.

Anvil treats generated code as unfinished until its tests pass: it verifies the code, recovers from errors, and stops only on a green run. This is the forge: heat, hammer, quench, test. Repeat.
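The four stages can be sketched as a small control loop. This is a minimal illustration, not Anvil's actual implementation: `VerifyResult`, `run_pytest`, `verify_loop`, and the `max_iterations` cap are all hypothetical names introduced here, and the cap is an assumption added so the sketch always terminates.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VerifyResult:
    passed: bool
    output: str


def run_pytest(extra_args: Optional[List[str]] = None) -> VerifyResult:
    """Verify step: run pytest in a subprocess and capture its output."""
    command = [sys.executable, "-m", "pytest", "-q"] + (extra_args or [])
    proc = subprocess.run(command, capture_output=True, text=True)
    return VerifyResult(passed=proc.returncode == 0,
                        output=proc.stdout + proc.stderr)


def verify_loop(
    execute: Callable[[], None],
    verify: Callable[[], VerifyResult],
    recover: Callable[[str], None],
    max_iterations: int = 5,  # assumption: a cap so the loop cannot run forever
) -> bool:
    """Execute once, then alternate verify and recover until verify passes."""
    execute()
    for attempt in range(max_iterations + 1):
        result = verify()
        if result.passed:
            return True
        if attempt < max_iterations:
            # Feed the failure output back so the fix can be targeted.
            recover(result.output)
    return False
```

In this sketch the planning step is assumed to have already produced the `execute` and `recover` callables. Passing `run_pytest` as `verify` would drive the loop from a real test suite, while a plain function returning `VerifyResult` makes the loop easy to exercise in isolation.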

Live Demo

See Anvil in Action

Watch the verification loop run end-to-end. Anvil writes code, runs tests, catches failures, and fixes them — all without human intervention.

anvil-demo — bash

$ anvil run "Fix the off-by-one error in users.py"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⊕ Planning Analyzing users.py for off-by-one...

⊕ Execute Fixing range(1, n+1) → range(0, n) at line 47

⊕ Verify Running pytest...

FAIL test_users.py::test_pagination — AssertionError: expected 10, got 9

✗ Recover Root cause: boundary condition in page slicing

⊕ Execute Adjusting slice logic: items[page*size:(page+1)*size]

⊕ Verify Running pytest...

PASS test_users.py::test_pagination

PASS test_users.py::test_user_creation

PASS test_users.py::test_delete_user

✓ All 3 tests pass — verification complete
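The slice from the run above, `items[page*size:(page+1)*size]`, can be checked in isolation. The sketch below is illustrative only: `get_page` and `get_page_off_by_one` are hypothetical helpers, not code from `users.py`, and the buggy variant is just one way a page could come back with 9 items instead of 10.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def get_page(items: Sequence[T], page: int, size: int) -> List[T]:
    """Return the zero-indexed page holding at most `size` items."""
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0")
    # Python slices exclude the end index, so (page + 1) * size is correct.
    return list(items[page * size:(page + 1) * size])


def get_page_off_by_one(items: Sequence[T], page: int, size: int) -> List[T]:
    """Hypothetical buggy variant: the end bound is one short."""
    return list(items[page * size:(page + 1) * size - 1])


users = list(range(25))
print(len(get_page(users, 0, 10)))             # 10
print(len(get_page_off_by_one(users, 0, 10)))  # 9
print(get_page(users, 2, 10))                  # [20, 21, 22, 23, 24]
```

The last call shows the slice also handles a short final page without extra bounds checks, because slicing past the end of a sequence simply stops at the end.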

$ anvil run "Add rate-limiting to the API endpoints"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⊕ Planning Analyzing API structure and middleware...

⊕ Execute Creating rate_limiter.py with sliding window algorithm

⊕ Execute Adding decorator to 4 endpoints in api.py

⊕ Verify Running pytest...

FAIL test_api.py::test_rate_limit — TimeoutError: request was not blocked

✗ Recover Fix: window size was 60000ms, needed 1000ms for test env

⊕ Execute Injecting TEST_RATE_LIMIT=100 env flag

⊕ Verify Running pytest...

PASS test_api.py::test_rate_limit

PASS test_api.py::test_bypass_limit

PASS test_api.py::test_unauthorized

✓ All 3 tests pass — feature added
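For readers unfamiliar with the sliding window algorithm named in this run, here is a minimal in-memory sketch. It is not the `rate_limiter.py` that Anvil generated: `SlidingWindowLimiter` and its parameters are hypothetical, and the injectable `clock` is an assumption added so that tests do not have to wait for a real window to elapse.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` calls within any `window_seconds` span."""

    def __init__(
        self,
        max_requests: int,
        window_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._clock = clock
        self._timestamps: Deque[float] = deque()

    def allow(self) -> bool:
        now = self._clock()
        # Drop timestamps that have slid out of the window.
        while self._timestamps and now - self._timestamps[0] >= self.window_seconds:
            self._timestamps.popleft()
        if len(self._timestamps) < self.max_requests:
            self._timestamps.append(now)
            return True
        return False


# Usage with a fake clock: two requests fit in the window, the third is blocked.
fake_now = [0.0]
limiter = SlidingWindowLimiter(max_requests=2, window_seconds=1.0,
                               clock=lambda: fake_now[0])
print(limiter.allow(), limiter.allow(), limiter.allow())  # True True False
fake_now[0] = 1.0  # the window has fully elapsed
print(limiter.allow())  # True
```

Making the window length and the clock configurable is one way to avoid the failure seen above, where a production-sized window was too long for the test environment.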

$ anvil run "Refactor database layer to use async/await"

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

⊕ Planning Mapping sync DB calls across 12 files...

⊕ Execute Converting db.py: fetch_one → afetch, execute → aexecute

⊕ Execute Updating 8 query functions to async

⊕ Verify Running pytest...

FAIL test_db.py::test_connection — RuntimeError: no current event loop

✗ Recover Adding asyncio.run() wrapper in test setup fixture

⊕ Execute Creating event_loop fixture in conftest.py

⊕ Verify Running pytest...

PASS test_db.py::test_connection

PASS test_db.py::test_query_async

PASS test_db.py::test_transaction_rollback

✓ All 3 tests pass — refactor complete
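The recovery step in this run comes down to giving each test an event loop to run coroutines on. The sketch below shows the `asyncio.run()` approach with plain functions. `FakeAsyncDB` and `rename_user` are hypothetical stand-ins, and only the method names `afetch` and `aexecute` are taken from the run above.

```python
import asyncio
from typing import Dict, Optional


class FakeAsyncDB:
    """Hypothetical in-memory stand-in for an async database client."""

    def __init__(self) -> None:
        self._rows: Dict[int, str] = {}

    async def aexecute(self, user_id: int, name: str) -> None:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        self._rows[user_id] = name

    async def afetch(self, user_id: int) -> Optional[str]:
        await asyncio.sleep(0)
        return self._rows.get(user_id)


async def rename_user(db: FakeAsyncDB, user_id: int, name: str) -> Optional[str]:
    """Write a row, then read it back through the async interface."""
    await db.aexecute(user_id, name)
    return await db.afetch(user_id)


def test_rename_user() -> None:
    # asyncio.run() creates an event loop, runs the coroutine to completion,
    # and closes the loop, so a synchronous test never needs a pre-existing one.
    db = FakeAsyncDB()
    assert asyncio.run(rename_user(db, 1, "ada")) == "ada"


test_rename_user()
```

A shared fixture in `conftest.py`, as in the run above, serves the same purpose when many tests need a loop. The explicit `asyncio.run()` call is simply the smallest form of the idea.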

Ecosystem

21 Projects, One Forge

A complete ecosystem for building, training, and deploying self-verified coding agents — from the core loop to specialized models to production infrastructure.

Anvil

Core

The self-verified coding agent. Plans, executes, verifies, recovers — until tests pass.

pip install anvil-agent

VerifyLoop

Core

The verification loop engine. Runs tests, parses failures, feeds signals back to recovery.

pip install verifyloop

ErrorRecovery

Core

Structured error diagnosis and targeted fix generation from test failure signals.

pip install errorrecovery

AgentSwarm

Core

Multi-agent orchestration for parallel task execution with shared context.

pip install agentswarm

FableForge-14B

Model

14B parameter coding model fine-tuned on verified agent traces. Full code understanding.

pip install fableforge-14b

ShellWhisperer-1.5B

Model

Lightweight shell command model for terminal interaction and environment execution.

pip install shellwhisperer

ReasonCritic-7B

Model

Reasoning critic model for verifying logic, catching edge cases, and detecting hallucinations.

pip install reasoncritic

FableForge Dataset

Data

210K+ verified agent traces across Python, JS, Rust, Go — the training fuel for Anvil models.

pip install fableforge-dataset

TrajectoryDistiller

Data

Filters, cleans, and distills raw agent traces into high-quality training examples.

pip install trajectory-distiller

TraceCompiler

Data

Compiles multi-step agent traces into structured, tokenized training data for fine-tuning.

pip install trace-compiler

Anvil Runtime

Infra

Production runtime for deploying Anvil agents with sandboxed execution and monitoring.

pip install anvil-runtime

Telemetry

Infra

Observability for agent loops — trace every verify-recover cycle, token usage, and latency.

pip install anvil-telemetry

BenchAgent

Infra

Automated benchmarking framework for evaluating agent performance across coding tasks.

pip install benchagent

Models

Forged for Verification

Three purpose-built models — each trained on verified agent traces, each designed to make the verification loop stronger.

Train Free on Colab

FableForge-14B

Parameters: 14B

Context Window: 32K tokens

Training Data: 210K+ traces

Architecture: Transformer

Specialization: Code + Verify

Get FableForge-14B

Train Free on Colab

ShellWhisperer-1.5B

Parameters: 1.5B

Context Window: 8K tokens

Training Data: 50K shell traces

Architecture: Transformer

Specialization: Shell Commands

Get ShellWhisperer

Train Free on Colab

ReasonCritic-7B

Parameters: 7B

Context Window: 16K tokens

Training Data: 80K critic traces

Architecture: Transformer

Specialization: Reasoning

Get ReasonCritic

Quick Start

Start Forging in 30 Seconds

Install Anvil, run your first task, and watch the verification loop in action.

# Install the Anvil agent

$ pip install anvil-agent

# Or install the full ecosystem

$ pip install "anvil-agent[all]"

# Run Anvil on a task

$ anvil run "Fix the failing tests in src/"

# Or run as a daemon watching for changes

$ anvil daemon --watch src/

# Verify results

$ anvil verify --suite all

✓ 12/12 tests passed

✓ Verification complete — 0 errors remaining

Benchmarks

Tested Harder, Deployed Better

With the verification loop enabled, Anvil outperforms the no-verification baseline on all five benchmarks below, by 12.7 to 17.1 percentage points.

Benchmark	Anvil (w/ Verify)	Without Verify	Δ (percentage points)
HumanEval	89.2%	72.1%	+17.1
MBPP	84.6%	68.3%	+16.3
SWE-Bench Lite	31.4%	18.7%	+12.7
LiveCodeBench	71.8%	55.2%	+16.6
MultiPL-E (avg)	76.3%	61.9%	+14.4
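The Δ column is an absolute difference between two pass rates. As a quick check, the sketch below recomputes it from the table and also derives the relative gain over the baseline, which the table does not report. `RESULTS` and `summarize` are hypothetical names, and the scores are copied from the table as-is.

```python
from typing import Dict, Tuple

# (with verify, without verify), in percent, copied from the table above.
RESULTS: Dict[str, Tuple[float, float]] = {
    "HumanEval": (89.2, 72.1),
    "MBPP": (84.6, 68.3),
    "SWE-Bench Lite": (31.4, 18.7),
    "LiveCodeBench": (71.8, 55.2),
    "MultiPL-E (avg)": (76.3, 61.9),
}


def summarize(results: Dict[str, Tuple[float, float]]) -> Dict[str, Tuple[float, float]]:
    """Map each benchmark to (absolute delta in points, relative gain in %)."""
    summary = {}
    for name, (with_verify, without_verify) in results.items():
        delta = with_verify - without_verify
        summary[name] = (round(delta, 1), round(100 * delta / without_verify, 1))
    return summary


for name, (delta_pp, relative_pct) in summarize(RESULTS).items():
    print(f"{name}: +{delta_pp} points absolute, +{relative_pct}% relative")
```

On these numbers the absolute gain is smallest on SWE-Bench Lite, but the relative gain is largest there (roughly 68% over the baseline), because the baseline score is low.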

Verification Loop Impact

First-pass success rate: 42%

Success rate after verify-recover: 89%

Average recovery iterations: 2.3

Average time overhead per fix: 8 s
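One way to read these figures is to ask what share of first-pass failures the loop ends up fixing. The sketch below works that out under the assumption that both success rates are measured on the same set of tasks. `recovered_share` is a hypothetical helper.

```python
def recovered_share(first_pass: float, after_recover: float) -> float:
    """Fraction of first-pass failures that pass after verify-recover."""
    if not 0.0 <= first_pass < 1.0 or not first_pass <= after_recover <= 1.0:
        raise ValueError("expected 0 <= first_pass < 1 and first_pass <= after_recover <= 1")
    return (after_recover - first_pass) / (1.0 - first_pass)


share = recovered_share(0.42, 0.89)
print(f"{share:.1%} of first-pass failures recovered")  # 81.0% of first-pass failures recovered
```

Here 58% of tasks fail on the first pass and 47 of those 58 points are recovered, so about four in five initial failures end up passing.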

Community

Built in the Open, Forged by All

Join the community shaping the future of self-verified coding agents. Every issue, PR, and trace makes Anvil stronger.

GitHub

Star, fork, and contribute to all 21 projects in the FableForge ecosystem.

View on GitHub

Discord

Get help, share traces, and discuss agent design with the community.

Join Discord

Docs

Full API reference, tutorials, and guides for every project in the ecosystem.

Read the Docs

Built from 210K+ agent traces — and counting