Main Menu
  • Tools
  • Developers
  • Topics
  • Discussions
  • Communities
  • News
  • Podcasts
  • Blogs
  • Builds
  • Contests
  • Compare
  • Arena
Create
    EveryDev.ai
    Sign inSubscribe
    Home
    Tools

    2,376+ AI tools

    • New
    • Trending
    • Featured
    • Compare
    • Arena
    Categories
    • Agents1543
    • Coding1167
    • Infrastructure522
    • Marketing438
    • Design413
    • Projects377
    • Research348
    • Analytics325
    • Testing213
    • MCP206
    • Data200
    • Security186
    • Integration167
    • Learning154
    • Communication144
    • Prompts139
    • Extensions133
    • Voice122
    • Commerce121
    • DevOps97
    • Web75
    • Finance21
    1. Home
    2. Tools
    3. forge
    forge icon

    forge

    Agent Frameworks

    A reliability layer for self-hosted LLM tool-calling that lifts small local models to top-tier performance on multi-step agentic workflows via guardrails and context management.

    Visit Website

    At a Glance

    Pricing
    Open Source

    Fully free and open-source under the MIT License. Install via pip or clone from GitHub.

    Engagement

    Available On

    Web
    API
    SDK
    CLI

    Resources

    WebsiteDocsGitHubllms.txt

    Topics

    Agent FrameworksLocal InferenceLLM Orchestration

    Alternatives

    OrKaMem0Inngest
    Developer
    Antoine ZambelliAntoine Zambelli builds open-source Python tooling for self-…

    Listed May 2026

    About forge

    Forge is an open-source Python framework by Antoine Zambelli that adds a reliability layer on top of self-hosted LLM backends for tool-calling and multi-step agentic workflows. It is published under the MIT license and available on PyPI as forge-guardrails. The framework is backed by a peer-reviewed paper published at ACM (DOI: 10.1145/3786335.3813193).

    What It Is

    Forge is a middleware and orchestration library designed to make small, locally-run language models (around 8B parameters) reliably execute structured tool-calling workflows. It addresses a core weakness of small models — their tendency to produce malformed tool calls, skip required steps, or lose context over long conversations — through composable guardrails, context compaction strategies, and a proxy server that makes any OpenAI-compatible client benefit from these improvements transparently.

    Three Usage Modes

    Forge offers three distinct integration patterns:

    • WorkflowRunner — A full agentic loop manager. Developers define tools, select a backend, and let Forge handle system prompts, tool execution, context compaction, and guardrails. SlotWorker extends this with priority-queued access to a shared GPU inference slot, enabling multi-agent architectures where specialist workflows share hardware.
    • Guardrails middleware — Composable middleware that plugs into an existing orchestration loop. The developer controls the loop; Forge validates responses, rescues malformed tool calls, and enforces required workflow steps.
    • Proxy server — A drop-in OpenAI-compatible proxy (python -m forge.proxy) that sits between any client (opencode, Continue, aider, etc.) and a local model server, applying guardrails transparently without client-side changes.

    Guardrails and Context Management

    The guardrail stack includes rescue parsing for malformed tool calls, retry nudges that guide the model back on track, and step enforcement that ensures required workflow steps are completed. Context management is VRAM-aware, with tiered compaction strategies (NoCompact, TieredCompact, SlidingWindowCompact) that keep token budgets within hardware limits. A synthetic respond tool is injected by the proxy to keep small models in tool-calling mode rather than switching to bare text output — the client never sees this internal mechanism.

    Backend Support and Eval Results

    Forge supports four backends:

    • llama-server (llama.cpp) — Recommended; the top 10 eval configurations all run on llama-server.
    • Ollama — Easier setup with built-in model management; slightly weaker on harder workloads.
    • Llamafile — Single binary, zero dependencies; uses prompt-injected function calling.
    • Anthropic — Frontier API baseline for hybrid workflows; no local GPU required.

    The project ships a 26-scenario eval harness split into an OG-18 baseline tier and an 8-scenario advanced reasoning tier. According to the repository, the current top self-hosted configuration (Ministral-3 8B Instruct Q8 on llama-server) scores 86.5% across all 26 scenarios and 76% on the hardest tier.

    Architecture and Project Structure

    The codebase is organized into clearly separated modules: core/ (workflow definition, inference loop, runner, slot worker), guardrails/ (nudge templates, response validator, step enforcer, error tracker), clients/ (Ollama, Llamafile, Anthropic), context/ (manager, compaction strategies, hardware detection), prompts/, tools/, and proxy/. The test suite includes 865 deterministic unit tests that require no LLM backend, plus the eval harness for live model qualification.

    Update: Active Development as of May 2026

    The repository was created in February 2026 and last pushed in May 2026, indicating active early development. It has accumulated over 1,100 stars and 56 forks according to the GitHub repository metadata. The published ACM paper provides a formal ablation study of the guardrail framework, and the preprint is preserved in the repository as a historical artifact.

    forge - 1

    Community Discussions

    Be the first to start a conversation about forge

    Share your experience with forge, ask questions, or help others learn from your insights.

    Pricing

    OPEN SOURCE

    Open Source

    Fully free and open-source under the MIT License. Install via pip or clone from GitHub.

    • Full WorkflowRunner and SlotWorker
    • Guardrails middleware
    • OpenAI-compatible proxy server
    • All backend integrations (Ollama, llama-server, Llamafile, Anthropic)
    • 26-scenario eval harness

    Capabilities

    Key Features

    • WorkflowRunner for full agentic loop management
    • SlotWorker for priority-queued multi-agent GPU slot sharing
    • Composable guardrails middleware for existing orchestration loops
    • OpenAI-compatible proxy server with transparent guardrail injection
    • Rescue parsing for malformed tool calls
    • Retry nudges for model correction
    • Required step enforcement
    • VRAM-aware context budget management
    • Tiered context compaction strategies (NoCompact, TieredCompact, SlidingWindowCompact)
    • Synthetic respond tool injection for small model reliability
    • 26-scenario eval harness with OG-18 and advanced reasoning tiers
    • Batch eval with JSONL output and automatic resume
    • 865 deterministic unit tests requiring no LLM backend
    • Support for Ollama, llama-server, Llamafile, and Anthropic backends
    • Hardware detection for VRAM-aware budgeting
    • SSE streaming support in proxy server

    Integrations

    Ollama
    llama-server (llama.cpp)
    Llamafile
    Anthropic Claude
    opencode
    Continue
    aider
    PyPI (forge-guardrails)
    Pydantic
    API Available
    View Docs

    Reviews & Ratings

    No ratings yet

    Be the first to rate forge and help others make informed decisions.

    Developer

    Antoine Zambelli

    Antoine Zambelli builds open-source Python tooling for self-hosted LLM reliability and agentic workflows. He created Forge, a guardrail framework for local model tool-calling, backed by a peer-reviewed ACM publication. The project focuses on making small 8B-class models reliably execute structured multi-step workflows without requiring frontier API access.

    Read more about Antoine Zambelli
    WebsiteGitHub
    1 tool in directory

    Similar Tools

    OrKa icon

    OrKa

    Open-source tool for building AI workflows using YAML configuration instead of Python code, with built-in memory and local LLM support.

    Mem0 icon

    Mem0

    Mem0 is a universal, self-improving AI memory layer for LLM applications that enables personalized AI experiences while reducing token costs by up to 80%.

    Inngest icon

    Inngest

    Durable execution and workflow orchestration platform for building scalable, fault-tolerant serverless and AI agent workflows.

    Browse all tools

    Related Topics

    Agent Frameworks

    Tools and platforms for building and deploying custom AI agents.

    304 tools

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    104 tools

    LLM Orchestration

    Platforms and frameworks for designing, managing, and deploying complex LLM workflows with visual interfaces, allowing for the coordination of multiple AI models and services.

    123 tools
    Browse all topics
    Back to all tools
    Explore AI Tools
    • AI Coding Assistants
    • Agent Frameworks
    • MCP Servers
    • AI Prompt Tools
    • Vibe Coding Tools
    • AI Design Tools
    • AI Database Tools
    • AI Website Builders
    • AI Testing Tools
    • LLM Evaluations
    Follow Us
    • X / Twitter
    • LinkedIn
    • Reddit
    • Discord
    • Threads
    • Bluesky
    • Mastodon
    • YouTube
    • GitHub
    • Instagram
    Get Started
    • About
    • Editorial Standards
    • Corrections & Disclosures
    • Community Guidelines
    • Advertise
    • Contact Us
    • Newsletter
    • Submit a Tool
    • Start a Discussion
    • Write A Blog
    • Share A Build
    • Terms of Service
    • Privacy Policy
    Explore with AI
    • ChatGPT
    • Gemini
    • Claude
    • Grok
    • Perplexity
    Agent Experience
    • llms.txt
    Theme
    With AI, Everyone is a Dev. EveryDev.ai © 2026
    Discussions