The llama.cpp project explicitly lists Ollama, GPT4All, LocalAI, and Text Generation WebUI among its integrations — all apps built on top of or using llama.cpp. Below are the most notable apps that use llama.cpp as their inference backend.
1. llama.cpp - The foundational C/C++ library by ggml-org that enables efficient local LLM inference with GGUF model support, multi-platform GPU acceleration (CUDA, Metal, Vulkan), extensive quantization (4-bit through 8-bit), and a built-in OpenAI-compatible HTTP server.
2. GPT4All - Open-source desktop chatbot by Nomic AI that uses llama.cpp as its inference backend, offering a polished cross-platform GUI for downloading and running quantized models locally with zero cloud dependency.
3. LocalAI - Free, open-source drop-in replacement for the OpenAI API that leverages llama.cpp (among other backends) to run LLMs, image generation, audio, and autonomous agents on consumer hardware without a GPU.
4. Atomic Chat - Open-source ChatGPT alternative that supports local LLM execution via llama.cpp-compatible models alongside cloud providers, with full privacy controls and MCP integration.
5. Lemonade - AMD's open-source local inference server for Windows, Linux, and macOS that runs LLMs, image generation, and speech models on GPUs and NPUs, exposing an OpenAI-compatible API built on llama.cpp-supported model formats.
6. Bodega Inference Engine - Enterprise-grade local LLM inference engine purpose-built for Apple Silicon, featuring a multi-model registry, an OpenAI-compatible API, and high-throughput continuous batching using GGUF models from the llama.cpp ecosystem.
7. Ensu - Privacy-focused local LLM chat app by Ente that lets you run and converse with AI models entirely on-device, leveraging llama.cpp-compatible model formats for offline operation.
8. Locally AI - Native Apple app for running Llama, Gemma, Qwen, and DeepSeek models completely offline on iPhone, iPad, and Mac, optimized for Apple Silicon with local voice mode and Siri integration.
9. Moltis - Personal AI assistant built in Rust with local LLM support, multi-channel access (terminal, Telegram, web), MCP server integration, and sandboxed code execution capabilities.
10. AI Backends - Self-hosted open-source AI API server that exposes unified REST endpoints and supports llama.cpp alongside multiple LLM providers for easy integration into applications.
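A practical consequence of several entries above (llama.cpp's built-in `llama-server`, LocalAI, Lemonade) exposing an OpenAI-compatible API is that a single client works against any of them. Below is a minimal sketch using only the Python standard library; the base URL, port `8080`, and model name are assumptions that depend on how the local server was started, and `chat_completion` / `build_chat_payload` are illustrative helper names, not part of any of these projects.

```python
import json
import urllib.request


def build_chat_payload(prompt: str, model: str = "local-model") -> dict:
    """Build an OpenAI-style chat completion request body."""
    return {
        "model": model,  # many local servers ignore or remap this field
        "messages": [{"role": "user", "content": prompt}],
        "temperature": 0.7,
    }


def chat_completion(prompt: str, base_url: str = "http://localhost:8080") -> str:
    """POST to the OpenAI-compatible /v1/chat/completions endpoint
    and return the assistant's reply text."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # OpenAI-style response shape: choices[0].message.content
    return body["choices"][0]["message"]["content"]
```

Because the request and response shapes follow the OpenAI spec, switching from, say, `llama-server` to LocalAI should only require changing `base_url`.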