KoboldCpp
KoboldCpp is a single-file, easy-to-use AI text generation tool for GGML and GGUF models, supporting CPU/GPU inference, image generation, speech, and more.
At a Glance
Fully free and open-source under the AGPL-3.0 license; free to download and use.
Listed Apr 2026
About KoboldCpp
KoboldCpp is a self-contained, easy-to-use AI text-generation software built on top of llama.cpp and inspired by the original KoboldAI. It ships as a single executable with no installation required and no external dependencies, supporting a wide range of GGML and GGUF models. Beyond text generation, it integrates image generation, video generation, speech-to-text, text-to-speech, music generation, and multimodal vision into one package. It runs on Windows, macOS, Linux, Android (via Termux), and cloud environments like Google Colab and RunPod.
- Single-file executable — Download and run with no installation or external dependencies on Windows, macOS, or Linux.
- CPU and GPU support — Runs on CPU or GPU with full or partial layer offloading; supports CUDA (Nvidia), Vulkan (any GPU), and Metal (Apple Silicon).
- LLM text generation — Supports all GGML and GGUF models with full backwards compatibility, including Llama, Mistral, Qwen, Gemma, Falcon, and hundreds more.
- Image generation and editing — Built-in Stable Diffusion support (SD1.5, SDXL, SD3, Flux, and more) with an A1111-compatible API.
- Video generation — Supports WAN 2.2 for AI video generation.
- Speech-to-text — Voice recognition via Whisper integration.
- Text-to-speech — Voice generation via Qwen3TTS, Kokoro, OuteTTS, Parler, and Dia.
- Music generation — Supports Ace Step 1.5 for AI music creation.
- Multimodal vision — Image recognition and vision capabilities for supported models.
- MCP Server support — Includes MCP server support and tool calling for agentic workflows.
- Multiple compatible APIs — Provides KoboldCpp, OpenAI, Ollama, A1111/Forge, ComfyUI, Whisper, XTTS, and OpenAI Speech API endpoints.
- Bundled KoboldAI Lite UI — Includes a full web UI with chat, adventure, instruct, and storywriter modes, character cards, world info, memory, and more.
- RAG and web search — Supports retrieval-augmented generation via TextDB and integrated web search.
- Cross-platform — Ready-to-use binaries for Windows, macOS, and Linux; also supports Docker, Colab, RunPod, and Android via Termux.
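The native text-generation API listed above can be exercised with a few lines of Python. The sketch below assumes a KoboldCpp server already running locally on its default port (5001) and uses the `/api/v1/generate` endpoint; the exact parameter set accepted by your build may differ, so treat the field names as illustrative rather than exhaustive.

```python
import json
import urllib.request

# Assumed default local endpoint: KoboldCpp listens on port 5001
# unless started with a different --port value.
KOBOLDCPP_URL = "http://localhost:5001/api/v1/generate"

def build_generate_payload(prompt, max_length=80, temperature=0.7):
    """Construct the JSON body for the KoboldCpp generate endpoint."""
    return {
        "prompt": prompt,
        "max_length": max_length,    # number of tokens to generate
        "temperature": temperature,  # sampling temperature
    }

def generate(prompt, **kwargs):
    """Send a generation request to a running KoboldCpp instance.

    Requires a live server; the response carries the completion under
    results[0].text in the KoboldAI-style response format.
    """
    body = json.dumps(build_generate_payload(prompt, **kwargs)).encode()
    req = urllib.request.Request(
        KOBOLDCPP_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]
```

Because KoboldCpp also exposes OpenAI- and Ollama-compatible endpoints, existing clients for those APIs can usually be pointed at the same server without code changes.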
Pricing
Open Source (Free)
Fully free and open-source under the AGPL-3.0 license; free to download and use.
- All features included
- No usage limits
- Self-hosted
- CPU and GPU inference
- Image, video, speech, and music generation
Capabilities
Key Features
- Single-file executable with no installation required
- CPU and GPU inference with full or partial layer offloading
- Supports all GGML and GGUF models with backwards compatibility
- Image generation (SD1.5, SDXL, SD3, Flux, Qwen Image, Z-Image, Klein)
- Video generation (WAN 2.2)
- Speech-to-text via Whisper
- Text-to-speech via Qwen3TTS, Kokoro, OuteTTS, Parler, Dia
- Music generation via Ace Step 1.5
- Multimodal image recognition/vision
- MCP Server support and tool calling
- Multiple API endpoints (KoboldCpp, OpenAI, Ollama, A1111, ComfyUI, Whisper, XTTS)
- Bundled KoboldAI Lite UI with chat, adventure, instruct, storywriter modes
- Tavern Character Card support
- RAG via TextDB
- Web search integration
- Regex support
- Additional modern samplers beyond the standard set
- Context size extension beyond model defaults
- CUDA, Vulkan, and Metal GPU acceleration
- Docker support
- Google Colab and RunPod support
- Android support via Termux
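The A1111-compatible image endpoint mentioned in the feature list follows the standard Stable Diffusion WebUI request shape. The sketch below builds a txt2img payload and decodes the base64 PNG that A1111-style responses return; the URL assumes a local KoboldCpp server on the default port 5001, and the parameters shown are a common subset, not the full A1111 schema.

```python
import base64

# Assumed default endpoint: KoboldCpp serves the A1111/Forge-compatible
# image API on the same port as its text API (5001 by default).
TXT2IMG_URL = "http://localhost:5001/sdapi/v1/txt2img"

def build_txt2img_payload(prompt, steps=20, width=512, height=512):
    """JSON body in the A1111 txt2img format (common subset of fields)."""
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "steps": steps,
        "width": width,
        "height": height,
    }

def save_first_image(response_json, path="out.png"):
    """A1111-style responses carry base64-encoded images under 'images'."""
    png_bytes = base64.b64decode(response_json["images"][0])
    with open(path, "wb") as f:
        f.write(png_bytes)
    return path
```

Existing A1111 clients and frontends can typically be pointed at this endpoint unchanged, which is the practical payoff of the API compatibility listed above.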
Integrations