KoboldCpp
KoboldCpp is a single-file, easy-to-use AI text generation tool for GGML and GGUF models, supporting CPU/GPU inference, image generation, speech, and more.
At a Glance
Fully free and open-source under the AGPL-3.0 license; free to download and use.
Listed Apr 2026
About KoboldCpp
KoboldCpp is a self-contained, easy-to-use AI text-generation software built on top of llama.cpp and inspired by the original KoboldAI. It ships as a single executable with no installation required and no external dependencies, supporting a wide range of GGML and GGUF models. Beyond text generation, it integrates image generation, video generation, speech-to-text, text-to-speech, music generation, and multimodal vision into one package. It runs on Windows, macOS, Linux, Android (via Termux), and cloud environments like Google Colab and RunPod.
- Single-file executable — Download and run with no installation or external dependencies on Windows, macOS, or Linux.
- CPU and GPU support — Runs on CPU or GPU with full or partial layer offloading; supports CUDA (Nvidia), Vulkan (any GPU), and Metal (Apple Silicon).
- LLM text generation — Supports all GGML and GGUF models with full backwards compatibility, including Llama, Mistral, Qwen, Gemma, Falcon, and hundreds more.
- Image generation and editing — Built-in Stable Diffusion support (SD1.5, SDXL, SD3, Flux, and more) with an A1111-compatible API.
- Video generation — Supports WAN 2.2 for AI video generation.
- Speech-to-text — Voice recognition via Whisper integration.
- Text-to-speech — Voice generation via Qwen3TTS, Kokoro, OuteTTS, Parler, and Dia.
- Music generation — Supports Ace Step 1.5 for AI music creation.
- Multimodal vision — Image recognition and vision capabilities for supported models.
- MCP Server support — Includes MCP server support and tool calling for agentic workflows.
- Multiple compatible APIs — Provides KoboldCpp, OpenAI, Ollama, A1111/Forge, ComfyUI, Whisper, XTTS, and OpenAI Speech API endpoints.
- Bundled KoboldAI Lite UI — Includes a full web UI with chat, adventure, instruct, and storywriter modes, character cards, world info, memory, and more.
- RAG and web search — Supports retrieval-augmented generation via TextDB and integrated web search.
- Cross-platform — Ready-to-use binaries for Windows, macOS, and Linux; also supports Docker, Colab, RunPod, and Android via Termux.
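The native text-generation API listed above can be exercised with a few lines of Python. The sketch below assumes a KoboldCpp server already running locally on its default port (5001) and uses the `/api/v1/generate` endpoint; the exact parameter set accepted by your build may differ, so treat the field names as illustrative rather than exhaustive.

```python
import json
import urllib.request

# Assumed default local endpoint: KoboldCpp listens on port 5001
# unless started with a different --port value.
KOBOLDCPP_URL = "http://localhost:5001/api/v1/generate"

def build_generate_payload(prompt, max_length=80, temperature=0.7):
    """Construct the JSON body for the KoboldCpp generate endpoint."""
    return {
        "prompt": prompt,
        "max_length": max_length,    # number of tokens to generate
        "temperature": temperature,  # sampling temperature
    }

def generate(prompt, **kwargs):
    """Send a generation request to a running KoboldCpp instance.

    Requires a live server; the response carries the completion under
    results[0].text in the KoboldAI-style response format.
    """
    body = json.dumps(build_generate_payload(prompt, **kwargs)).encode()
    req = urllib.request.Request(
        KOBOLDCPP_URL,
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["results"][0]["text"]
```

Because KoboldCpp also exposes OpenAI- and Ollama-compatible endpoints, existing clients for those APIs can usually be pointed at the same server without code changes.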
Pricing
Open Source (Free)
Fully free and open-source under the AGPL-3.0 license; free to download and use.
- All features included
- No usage limits
- Self-hosted
- CPU and GPU inference
- Image, video, speech, and music generation
Capabilities
Key Features
- Single-file executable with no installation required
- CPU and GPU inference with full or partial layer offloading
- Supports all GGML and GGUF models with backwards compatibility
- Image generation (SD1.5, SDXL, SD3, Flux, Qwen Image, Z-Image, Klein)
- Video generation (WAN 2.2)
- Speech-to-text via Whisper
- Text-to-speech via Qwen3TTS, Kokoro, OuteTTS, Parler, Dia
- Music generation via Ace Step 1.5
- Multimodal image recognition/vision
- MCP Server support and tool calling
- Multiple API endpoints (KoboldCpp, OpenAI, Ollama, A1111, ComfyUI, Whisper, XTTS)
- Bundled KoboldAI Lite UI with chat, adventure, instruct, storywriter modes
- Tavern Character Card support
- RAG via TextDB
- Web search integration
- Regex support
- Additional modern samplers beyond the standard set
- Context size extension beyond model defaults
- CUDA, Vulkan, and Metal GPU acceleration
- Docker support
- Google Colab and RunPod support
- Android support via Termux
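The A1111-compatible image endpoint mentioned in the feature list follows the standard Stable Diffusion WebUI request shape. The sketch below builds a txt2img payload and decodes the base64 PNG that A1111-style responses return; the URL assumes a local KoboldCpp server on the default port 5001, and the parameters shown are a common subset, not the full A1111 schema.

```python
import base64

# Assumed default endpoint: KoboldCpp serves the A1111/Forge-compatible
# image API on the same port as its text API (5001 by default).
TXT2IMG_URL = "http://localhost:5001/sdapi/v1/txt2img"

def build_txt2img_payload(prompt, steps=20, width=512, height=512):
    """JSON body in the A1111 txt2img format (common subset of fields)."""
    return {
        "prompt": prompt,
        "negative_prompt": "",
        "steps": steps,
        "width": width,
        "height": height,
    }

def save_first_image(response_json, path="out.png"):
    """A1111-style responses carry base64-encoded images under 'images'."""
    png_bytes = base64.b64decode(response_json["images"][0])
    with open(path, "wb") as f:
        f.write(png_bytes)
    return path
```

Existing A1111 clients and frontends can typically be pointed at this endpoint unchanged, which is the practical payoff of the API compatibility listed above.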
Integrations