Self-Verified Coding Agent

Where Code Gets Forged, Hammered, and Tested Until It Holds

Anvil doesn't just write code — it plans, executes, verifies, and recovers. Built on 210K+ agent traces across 21 projects. Stops only when tests pass.

278+ Tests
21 Projects
6 Model Backends
210K+ Training Examples
anvil — verify-loop
def resolve_imports(tree):
    # Plan: find all undefined names
    unresolved = collect_undefined(tree)
    for name in unresolved:
        tree = infer_and_inject(tree, name)
    ✗ 3 tests failed: AssertionError
    ⟳ Recovering: fixing import order...
    tree = reorder_imports(tree)
    ✓ 5/5 tests passed
    ✓ Verification complete
Verify loop complete — all tests pass
VERIFY → RECOVER → VERIFY

The Forge Doesn't Stop Until It Holds

Most coding agents write code and hand it to you. Anvil runs a continuous Plan → Execute → Verify → Recover loop — iterating until every test passes.

1. Plan

Analyze the task, decompose it into steps, and identify dependencies and constraints before writing a single line.

2. Execute

Generate code with full context awareness: project structure, conventions, and test expectations.

3. Verify

Run the test suite. If tests fail, the loop doesn't stop. Every failure is a signal, not a dead end.

4. Recover

Parse errors, diagnose root causes, and apply targeted fixes. Then re-verify. Repeat until green.

Where other agents stop at generated code, Anvil keeps going until the tests pass. This is the forge: heat, hammer, quench, test. Repeat.
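The four stages can be sketched as a small driver function. This is a minimal illustration, not Anvil's actual implementation: `verify_loop`, `VerifyResult`, the four callables, and the `max_iterations` safety cap are all hypothetical names introduced here. The page itself only says the loop repeats until tests pass.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class VerifyResult:
    """Outcome of one test run (hypothetical structure)."""
    passed: bool
    failures: List[str] = field(default_factory=list)


def verify_loop(
    plan: Callable[[str], List[str]],
    execute: Callable[[List[str]], str],
    verify: Callable[[str], VerifyResult],
    recover: Callable[[str, List[str]], str],
    task: str,
    max_iterations: int = 5,  # assumed cap so the sketch always terminates
) -> str:
    """Plan once, execute once, then alternate verify and recover."""
    steps = plan(task)
    code = execute(steps)
    for attempt in range(max_iterations + 1):
        result = verify(code)
        if result.passed:
            return code
        if attempt == max_iterations:
            break
        # Feed the failure signals back into the next fix.
        code = recover(code, result.failures)
    raise RuntimeError(f"still failing after {max_iterations} recovery attempts")


if __name__ == "__main__":
    # Toy stand-ins: the "code" is a string and the "test" is an equality check.
    fixed = verify_loop(
        plan=lambda task: [task],
        execute=lambda steps: "range(1, n+1)",
        verify=lambda code: VerifyResult(
            code == "range(0, n)",
            [] if code == "range(0, n)" else ["off-by-one"],
        ),
        recover=lambda code, failures: "range(0, n)",
        task="fix off-by-one",
    )
    print(fixed)
```

Passing the stages in as callables keeps the loop itself independent of any particular model or test runner, which is one plausible way to support several backends.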

See Anvil in Action

Watch the verification loop run end-to-end. Anvil writes code, runs tests, catches failures, and fixes them — all without human intervention.

anvil-demo — bash
$ anvil run "Fix the off-by-one error in users.py"
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
⊕ Planning Analyzing users.py for off-by-one...
⊕ Execute Fixing range(1, n+1) → range(0, n) at line 47
⊕ Verify Running pytest...
FAIL test_users.py::test_pagination — AssertionError: expected 10, got 9
✗ Recover Root cause: boundary condition in page slicing
⊕ Execute Adjusting slice logic: items[page*size:(page+1)*size]
⊕ Verify Running pytest...
PASS test_users.py::test_pagination
PASS test_users.py::test_user_creation
PASS test_users.py::test_delete_user
✓ All 3 tests pass — verification complete
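The slice in the demo's second Execute step, `items[page*size:(page+1)*size]`, is the usual zero-indexed pagination idiom. A minimal sketch follows. `get_page` and `get_page_buggy` are hypothetical helpers, and the buggy variant is only one guess at how a page could come back with 9 items instead of 10; the contents of `users.py` are not shown on this page.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def get_page(items: Sequence[T], page: int, size: int) -> List[T]:
    """Return zero-indexed page `page`, holding up to `size` items."""
    if page < 0 or size <= 0:
        raise ValueError("page must be >= 0 and size must be > 0")
    # Python slices exclude the end index, so no "- 1" is needed.
    return list(items[page * size:(page + 1) * size])


def get_page_buggy(items: Sequence[T], page: int, size: int) -> List[T]:
    """Illustrative off-by-one: the end index is one too small."""
    return list(items[page * size:(page + 1) * size - 1])


if __name__ == "__main__":
    users = list(range(25))
    print(len(get_page_buggy(users, 0, 10)))  # one item short
    print(len(get_page(users, 0, 10)))
    print(get_page(users, 2, 10))  # last, partial page
```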

21 Projects, One Forge

A complete ecosystem for building, training, and deploying self-verified coding agents — from the core loop to specialized models to production infrastructure.

Anvil

Core

The self-verified coding agent. Plans, executes, verifies, recovers — until tests pass.

pip install anvil-agent

VerifyLoop

Core

The verification loop engine. Runs tests, parses failures, feeds signals back to recovery.

pip install verifyloop

ErrorRecovery

Core

Structured error diagnosis and targeted fix generation from test failure signals.

pip install errorrecovery

AgentSwarm

Core

Multi-agent orchestration for parallel task execution with shared context.

pip install agentswarm

FableForge-14B

Model

14B parameter coding model fine-tuned on verified agent traces. Full code understanding.

pip install fableforge-14b

ShellWhisperer-1.5B

Model

Lightweight shell command model for terminal interaction and environment execution.

pip install shellwhisperer

ReasonCritic-7B

Model

Reasoning critic model for verifying logic, catching edge cases, and detecting hallucinations.

pip install reasoncritic

FableForge Dataset

Data

210K+ verified agent traces across Python, JS, Rust, Go — the training fuel for Anvil models.

pip install fableforge-dataset

TrajectoryDistiller

Data

Filters, cleans, and distills raw agent traces into high-quality training examples.

pip install trajectory-distiller

TraceCompiler

Data

Compiles multi-step agent traces into structured, tokenized training data for fine-tuning.

pip install trace-compiler

Anvil Runtime

Infra

Production runtime for deploying Anvil agents with sandboxed execution and monitoring.

pip install anvil-runtime

Telemetry

Infra

Observability for agent loops — trace every verify-recover cycle, token usage, and latency.

pip install anvil-telemetry

BenchAgent

Infra

Automated benchmarking framework for evaluating agent performance across coding tasks.

pip install benchagent
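The VerifyLoop and ErrorRecovery descriptions above both depend on turning raw test output into structured failure signals. The sketch below parses the PASS/FAIL line format shown in the demo terminal on this page. It does not use the VerifyLoop or ErrorRecovery APIs, which are not documented here; `FailureSignal`, `LINE_RE`, and `parse_failures` are hypothetical names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches lines such as:
#   FAIL test_users.py::test_pagination <dash> AssertionError: expected 10, got 9
#   PASS test_users.py::test_user_creation
# The separator may be an em dash (\u2014) or a plain hyphen.
LINE_RE = re.compile(
    r"^(?P<status>PASS|FAIL)\s+(?P<file>[^:\s]+)::(?P<test>\S+)"
    r"(?:\s+[\u2014-]\s+(?P<error>\w+)(?::\s*(?P<message>.*))?)?$"
)


@dataclass
class FailureSignal:
    """One failed test, ready to hand to a recovery step."""
    file: str
    test: str
    error_type: Optional[str]
    message: Optional[str]


def parse_failures(output: str) -> List[FailureSignal]:
    """Collect a FailureSignal for every FAIL line; ignore everything else."""
    signals = []
    for line in output.splitlines():
        match = LINE_RE.match(line.strip())
        if match and match.group("status") == "FAIL":
            signals.append(
                FailureSignal(
                    file=match.group("file"),
                    test=match.group("test"),
                    error_type=match.group("error"),
                    message=match.group("message"),
                )
            )
    return signals


if __name__ == "__main__":
    demo = (
        "FAIL test_users.py::test_pagination \u2014 AssertionError: expected 10, got 9\n"
        "PASS test_users.py::test_user_creation\n"
    )
    for signal in parse_failures(demo):
        print(signal)
```

Real pytest output uses a different layout (for example `FAILED path::test - message` in its short summary), so a production parser would need its own pattern or pytest's machine-readable reports.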

Forged for Verification

Three purpose-built models — each trained on verified agent traces, each designed to make the verification loop stronger.


FableForge-14B

Parameters: 14B
Context Window: 32K tokens
Training Data: 210K+ traces
Architecture: Transformer
Specialization: Code + Verify
Input → 24 Layers + Verify Head → Output
Get FableForge-14B
Train Free on Colab

ShellWhisperer-1.5B

Parameters: 1.5B
Context Window: 8K tokens
Training Data: 50K shell traces
Architecture: Transformer
Specialization: Shell Commands
Cmd → 12 Layers + Shell Head → Exec
Get ShellWhisperer
Train Free on Colab

ReasonCritic-7B

Parameters: 7B
Context Window: 16K tokens
Training Data: 80K critic traces
Architecture: Transformer
Specialization: Reasoning
Code Output → 16 Layers + Critic Head → Pass/Fail
Get ReasonCritic

Start Forging in 30 Seconds

Install Anvil, run your first task, and watch the verification loop in action.

# Install the Anvil agent
$ pip install anvil-agent
# Or install the full ecosystem
$ pip install "anvil-agent[all]"

Tested Harder, Deployed Better

With the verification loop enabled, Anvil scores 12.7 to 17.1 percentage points higher than the no-verification baseline on all five benchmarks below.

Benchmark         Anvil (w/ Verify)   Without Verify   Δ (pp)
HumanEval         89.2%               72.1%            +17.1
MBPP              84.6%               68.3%            +16.3
SWE-Bench Lite    31.4%               18.7%            +12.7
LiveCodeBench     71.8%               55.2%            +16.6
MultiPL-E (avg)   76.3%               61.9%            +14.4

Verification Loop Impact

First-pass success rate: 42%
Success after verify-recover: 89%
Avg recovery iterations: 2.3
Time overhead per fix: 8s avg
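The Δ column and the loop-impact figures are plain differences in percentage points. The short sketch below recomputes them from the numbers reported above; the helper names are illustrative and no new measurements are introduced.

```python
# Scores copied from the benchmark table: (with verify, without verify), in percent.
BENCHMARKS = {
    "HumanEval": (89.2, 72.1),
    "MBPP": (84.6, 68.3),
    "SWE-Bench Lite": (31.4, 18.7),
    "LiveCodeBench": (71.8, 55.2),
    "MultiPL-E (avg)": (76.3, 61.9),
}


def delta_pp(with_verify: float, without_verify: float) -> float:
    """Absolute gain in percentage points, rounded to one decimal."""
    return round(with_verify - without_verify, 1)


def relative_gain(with_verify: float, without_verify: float) -> float:
    """Gain relative to the baseline score, in percent."""
    return round(100 * (with_verify - without_verify) / without_verify, 1)


if __name__ == "__main__":
    for name, (w, wo) in BENCHMARKS.items():
        print(f"{name:16} +{delta_pp(w, wo)} pp (+{relative_gain(w, wo)}% relative)")
    # First-pass success (42%) versus success after verify-recover (89%).
    print(f"verify-recover loop: +{delta_pp(89.0, 42.0)} pp over first pass")
```

Reporting the gap in percentage points avoids confusion with relative improvement, which is larger on low-baseline benchmarks such as SWE-Bench Lite.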

Built in the Open, Forged by All

Join the community shaping the future of self-verified coding agents. Every issue, PR, and trace makes Anvil stronger.

GitHub

Star, fork, and contribute to all 21 projects in the FableForge ecosystem.

View on GitHub

Discord

Get help, share traces, and discuss agent design with the community.

Join Discord

Docs

Full API reference, tutorials, and guides for every project in the ecosystem.

Read the Docs
Built from 210K+ agent traces — and counting