EveryDev.ai
Subscribe
Home
Tools

3,020+ AI tools

  • New
  • Trending
  • Featured
  • Compare
  • Arena
Categories
  • Agents2145
  • Coding1511
  • Infrastructure681
  • Marketing532
  • Projects485
  • Research447
  • Design413
  • Analytics378
  • MCP278
  • Security271
  • Testing264
  • Data256
  • Integration188
  • Prompts185
  • Communication176
  • Learning170
  • Extensions169
  • Voice150
  • Commerce134
  • DevOps115
  • Web86
  • Finance26
AI Tools by Topic
  • AI Coding Assistants
  • Agent Frameworks
  • MCP Servers
  • AI Prompt Tools
  • Vibe Coding Tools
  • AI Design Tools
  • AI Database Tools
  • AI Website Builders
  • AI Testing Tools
  • LLM Evaluations
Follow Us
  • X / Twitter
  • LinkedIn
  • Reddit
  • Discord
  • Threads
  • Bluesky
  • Mastodon
  • YouTube
  • GitHub
  • Instagram
Get Started
  • About
  • Editorial Standards
  • Corrections & Disclosures
  • Community Guidelines
  • Advertise
  • Contact Us
  • Newsletter
  • Submit a Tool
  • Start a Discussion
  • Write A Blog
  • Share A Build
  • Terms of Service
  • Privacy Policy
Explore with AI
  • ChatGPT
  • Gemini
  • Claude
  • Grok
  • Perplexity
Agent Experience
  • llms.txt
Theme
With AI, Everyone is a Dev. EveryDev.ai © 2026
    1. Home
    2. Tools
    3. forge
    forge icon

    forge

    Agent Frameworks

    A reliability layer for self-hosted LLM tool-calling that lifts small local models to top-tier performance on multi-step agentic workflows via guardrails and context management.

    Visit Website

    At a Glance

    Pricing
    Open Source

    Fully free and open-source under the MIT License. Install via pip or clone from GitHub.

    Engagement

    Available On

    Web
    API
    SDK
    CLI

    Resources

    WebsiteDocsGitHubllms.txt

    Topics

    Agent FrameworksLocal InferenceLLM Orchestration

    Alternatives

    FastAgencyQwen-AgentParlant
    Developer
    Antoine ZambelliAntoine Zambelli builds open-source Python tooling for self-…

    Listed May 2026

    About forge

    Forge is an open-source Python framework by Antoine Zambelli that adds a reliability layer on top of self-hosted LLM backends for tool-calling and multi-step agentic workflows. It is published under the MIT license and available on PyPI as forge-guardrails. The framework is backed by a peer-reviewed paper published at ACM (DOI: 10.1145/3786335.3813193).

    What It Is

    Forge is a middleware and orchestration library designed to make small, locally-run language models (around 8B parameters) reliably execute structured tool-calling workflows. It addresses a core weakness of small models — their tendency to produce malformed tool calls, skip required steps, or lose context over long conversations — through composable guardrails, context compaction strategies, and a proxy server that makes any OpenAI-compatible client benefit from these improvements transparently.

    Three Usage Modes

    Forge offers three distinct integration patterns:

    • WorkflowRunner — A full agentic loop manager. Developers define tools, select a backend, and let Forge handle system prompts, tool execution, context compaction, and guardrails. SlotWorker extends this with priority-queued access to a shared GPU inference slot, enabling multi-agent architectures where specialist workflows share hardware.
    • Guardrails middleware — Composable middleware that plugs into an existing orchestration loop. The developer controls the loop; Forge validates responses, rescues malformed tool calls, and enforces required workflow steps.
    • Proxy server — A drop-in OpenAI-compatible proxy (python -m forge.proxy) that sits between any client (opencode, Continue, aider, etc.) and a local model server, applying guardrails transparently without client-side changes.

    Guardrails and Context Management

    The guardrail stack includes rescue parsing for malformed tool calls, retry nudges that guide the model back on track, and step enforcement that ensures required workflow steps are completed. Context management is VRAM-aware, with tiered compaction strategies (NoCompact, TieredCompact, SlidingWindowCompact) that keep token budgets within hardware limits. A synthetic respond tool is injected by the proxy to keep small models in tool-calling mode rather than switching to bare text output — the client never sees this internal mechanism.

    Backend Support and Eval Results

    Forge supports four backends:

    • llama-server (llama.cpp) — Recommended; the top 10 eval configurations all run on llama-server.
    • Ollama — Easier setup with built-in model management; slightly weaker on harder workloads.
    • Llamafile — Single binary, zero dependencies; uses prompt-injected function calling.
    • Anthropic — Frontier API baseline for hybrid workflows; no local GPU required.

    The project ships a 26-scenario eval harness split into an OG-18 baseline tier and an 8-scenario advanced reasoning tier. According to the repository, the current top self-hosted configuration (Ministral-3 8B Instruct Q8 on llama-server) scores 86.5% across all 26 scenarios and 76% on the hardest tier.

    Architecture and Project Structure

    The codebase is organized into clearly separated modules: core/ (workflow definition, inference loop, runner, slot worker), guardrails/ (nudge templates, response validator, step enforcer, error tracker), clients/ (Ollama, Llamafile, Anthropic), context/ (manager, compaction strategies, hardware detection), prompts/, tools/, and proxy/. The test suite includes 865 deterministic unit tests that require no LLM backend, plus the eval harness for live model qualification.

    Update: Active Development as of May 2026

    The repository was created in February 2026 and last pushed in May 2026, indicating active early development. It has accumulated over 1,100 stars and 56 forks according to the GitHub repository metadata. The published ACM paper provides a formal ablation study of the guardrail framework, and the preprint is preserved in the repository as a historical artifact.

    forge - 1

    Community Discussions

    Be the first to start a conversation about forge

    Share your experience with forge, ask questions, or help others learn from your insights.

    Pricing

    OPEN SOURCE

    Open Source

    Fully free and open-source under the MIT License. Install via pip or clone from GitHub.

    • Full WorkflowRunner and SlotWorker
    • Guardrails middleware
    • OpenAI-compatible proxy server
    • All backend integrations (Ollama, llama-server, Llamafile, Anthropic)
    • 26-scenario eval harness

    Capabilities

    Key Features

    • WorkflowRunner for full agentic loop management
    • SlotWorker for priority-queued multi-agent GPU slot sharing
    • Composable guardrails middleware for existing orchestration loops
    • OpenAI-compatible proxy server with transparent guardrail injection
    • Rescue parsing for malformed tool calls
    • Retry nudges for model correction
    • Required step enforcement
    • VRAM-aware context budget management
    • Tiered context compaction strategies (NoCompact, TieredCompact, SlidingWindowCompact)
    • Synthetic respond tool injection for small model reliability
    • 26-scenario eval harness with OG-18 and advanced reasoning tiers
    • Batch eval with JSONL output and automatic resume
    • 865 deterministic unit tests requiring no LLM backend
    • Support for Ollama, llama-server, Llamafile, and Anthropic backends
    • Hardware detection for VRAM-aware budgeting
    • SSE streaming support in proxy server

    Integrations

    Ollama
    llama-server (llama.cpp)
    Llamafile
    Anthropic Claude
    opencode
    Continue
    aider
    PyPI (forge-guardrails)
    Pydantic
    API Available
    View Docs

    Ratings & Reviews

    No ratings yet

    Be the first to rate forge and help others make informed decisions.

    Developer

    Antoine Zambelli

    Antoine Zambelli builds open-source Python tooling for self-hosted LLM reliability and agentic workflows. He created Forge, a guardrail framework for local model tool-calling, backed by a peer-reviewed ACM publication. The project focuses on making small 8B-class models reliably execute structured multi-step workflows without requiring frontier API access.

    Read more about Antoine Zambelli
    WebsiteGitHub
    1 tool in directory

    Similar Tools

    FastAgency icon

    FastAgency

    An open-source Python framework for deploying AG2 multi-agent workflows to production with unified UI, REST API, and distributed messaging support.

    Qwen-Agent icon

    Qwen-Agent

    An open-source framework by Alibaba's Qwen team for building LLM applications with function calling, tool usage, planning, MCP, RAG, and memory capabilities.

    Parlant icon

    Parlant

    Open-source conversational AI engine that keeps LLM agents business-aligned and compliant through alignment modeling and granular guidelines.

    Browse all tools

    Related Topics

    Agent Frameworks

    Tools and platforms for building and deploying custom AI agents.

    461 tools

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    132 tools

    LLM Orchestration

    Platforms and frameworks for designing, managing, and deploying complex LLM workflows with visual interfaces, allowing for the coordination of multiple AI models and services.

    168 tools
    Browse all topics
    Back to all toolsSuggest an edit
    ratings
    discussions
    61views