# KoboldCpp

> KoboldCpp is a single-file, easy-to-use AI text generation tool for GGML and GGUF models, supporting CPU/GPU inference, image generation, speech, and more.

KoboldCpp is self-contained, easy-to-use AI text-generation software built on llama.cpp and inspired by the original KoboldAI. It ships as a single executable that needs no installation and has no external dependencies, and it supports a wide range of GGML and GGUF models. Beyond text generation, it integrates image generation, video generation, speech-to-text, text-to-speech, music generation, and multimodal vision into one package. It runs on Windows, macOS, Linux, and Android (via Termux), as well as in cloud environments such as Google Colab and RunPod.

- **Single-file executable** — *Download and run with no installation or external dependencies on Windows, macOS, or Linux.*
- **CPU and GPU support** — *Runs on CPU or GPU with full or partial layer offloading; supports CUDA (Nvidia), Vulkan (any GPU), and Metal (Apple Silicon).*
- **LLM text generation** — *Supports all GGML and GGUF models with full backwards compatibility, including Llama, Mistral, Qwen, Gemma, Falcon, and hundreds more.*
- **Image generation and editing** — *Built-in Stable Diffusion support (SD1.5, SDXL, SD3, Flux, and more) with an A1111-compatible API.*
- **Video generation** — *Supports WAN 2.2 for AI video generation.*
- **Speech-to-text** — *Voice recognition via Whisper integration.*
- **Text-to-speech** — *Voice generation via Qwen3TTS, Kokoro, OuteTTS, Parler, and Dia.*
- **Music generation** — *Supports Ace Step 1.5 for AI music creation.*
- **Multimodal vision** — *Image recognition and vision capabilities for supported models.*
- **MCP Server support** — *Includes MCP server support and tool calling for agentic workflows.*
- **Multiple compatible APIs** — *Provides KoboldCpp, OpenAI, Ollama, A1111/Forge, ComfyUI, Whisper, XTTS, and OpenAI Speech API endpoints.*
- **Bundled KoboldAI Lite UI** — *Includes a full web UI with chat, adventure, instruct, and storywriter modes, character cards, world info, memory, and more.*
- **RAG and web search** — *Supports retrieval-augmented generation via TextDB and integrated web search.*
- **Cross-platform** — *Ready-to-use binaries for Windows, macOS, and Linux; also supports Docker, Colab, RunPod, and Android via Termux.*
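
Because KoboldCpp is a single executable, it can be started from a script with a handful of command-line flags. The sketch below assembles such a command in Python. The binary name `./koboldcpp` and the model file `model.gguf` are hypothetical placeholders, and the defaults chosen here (context size 4096, port 5001) are assumptions for illustration. Check `--help` on your build for the flags it actually accepts.

```python
import shlex


def build_launch_command(binary, model_path, gpu_layers=0,
                         context_size=4096, port=5001):
    """Assemble an argv list for starting KoboldCpp.

    binary and model_path are caller-supplied placeholders.
    gpu_layers=0 leaves the model entirely on the CPU; a positive value
    asks for that many layers to be offloaded to the GPU.
    """
    cmd = [
        binary,
        "--model", model_path,
        "--contextsize", str(context_size),
        "--port", str(port),
    ]
    if gpu_layers > 0:
        # Partial offload: only the requested number of layers go to the GPU.
        cmd += ["--gpulayers", str(gpu_layers)]
    return cmd


if __name__ == "__main__":
    # Hypothetical paths; substitute your own binary and model file.
    cmd = build_launch_command("./koboldcpp", "model.gguf", gpu_layers=20)
    print(shlex.join(cmd))
```

The resulting list can be passed directly to `subprocess.Popen` to run the server in the background.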

## Features
- Single-file executable with no installation required
- CPU and GPU inference with full or partial layer offloading
- Supports all GGML and GGUF models with backwards compatibility
- Image generation (SD1.5, SDXL, SD3, Flux, Qwen Image, Z-Image, Klein)
- Video generation (WAN 2.2)
- Speech-to-text via Whisper
- Text-to-speech via Qwen3TTS, Kokoro, OuteTTS, Parler, Dia
- Music generation via Ace Step 1.5
- Multimodal image recognition/vision
- MCP Server support and tool calling
- Multiple API endpoints (KoboldCpp, OpenAI, Ollama, A1111, ComfyUI, Whisper, XTTS)
- Bundled KoboldAI Lite UI with chat, adventure, instruct, and storywriter modes
- Tavern Character Card support
- RAG via TextDB
- Web search integration
- Regex support
- New samplers
- Context size extension beyond model defaults
- CUDA, Vulkan, and Metal GPU acceleration
- Docker support
- Google Colab and RunPod support
- Android support via Termux
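
As a rough illustration of the API endpoints listed above, the following sketch sends a prompt to the native KoboldCpp generate endpoint using only the Python standard library. It assumes a server is already running locally on port 5001 and that the response follows the `{"results": [{"text": ...}]}` shape. The helper names are hypothetical, and the parameter set is deliberately minimal.

```python
import json
import urllib.request


def build_generate_request(prompt, max_length=80,
                           base_url="http://localhost:5001"):
    """Build (but do not send) a POST request for /api/v1/generate.

    base_url is an assumption: a local server on port 5001.
    """
    payload = {"prompt": prompt, "max_length": max_length}
    return urllib.request.Request(
        base_url.rstrip("/") + "/api/v1/generate",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_text(response_body):
    """Pull the generated text out of a JSON response body (assumed shape)."""
    return json.loads(response_body)["results"][0]["text"]


def generate(prompt, **kwargs):
    """Send the request and return the generated continuation."""
    request = build_generate_request(prompt, **kwargs)
    with urllib.request.urlopen(request) as response:
        return extract_text(response.read())
```

With a server running, `generate("Once upon a time")` should return the model's continuation as a string. Clients written against the OpenAI API can instead point their base URL at the same server.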

## Integrations
llama.cpp, stable-diffusion.cpp, Whisper, Hugging Face, Google Colab, RunPod, Docker, ComfyUI, Automatic1111/Forge, Ollama, OpenAI API, KoboldAI Lite, Termux

## Platforms
Windows, macOS, Linux, Android, API, CLI

## Pricing
Open Source

## Links
- Website: https://github.com/LostRuins/koboldcpp
- Documentation: https://github.com/LostRuins/koboldcpp/wiki
- Repository: https://github.com/LostRuins/koboldcpp
- EveryDev.ai: https://www.everydev.ai/tools/koboldcpp