1. llama.cpp - The foundational open-source C/C++ inference engine with Metal acceleration for Apple Silicon, supporting dozens of model architectures and quantization formats so you can run even large models on limited RAM.
2. MLX LM - Apple's own open-source Python library built on the MLX framework, purpose-built for the M-series chips' unified memory architecture, with support for inference, LoRA fine-tuning, and Hugging Face model integration.
3. GPT4All - A beginner-friendly, cross-platform desktop app by Nomic AI that lets you download and chat with curated open-source models in a polished GUI, no terminal skills required.
4. Locally AI - A native Apple app (iPhone, iPad, Mac) optimized for Apple Silicon via MLX, offering fully offline chat, local voice mode, and Siri/Shortcuts integration for a seamless macOS experience.
5. apfel - A trending open-source CLI tool that exposes Apple's built-in on-device LLM (macOS 26+) as a terminal command and OpenAI-compatible local server, requiring zero setup beyond a Homebrew install.
6. Open WebUI - A self-hosted web interface that connects to local backends like Ollama or llama.cpp, giving you a ChatGPT-style UI for any local model with RAG, voice, and Python extensibility.
7. Bodega Inference Engine - An enterprise-grade inference server built specifically for Apple Silicon, supporting multi-model concurrency, continuous batching (~900 tok/s on M4 Max), and an OpenAI-compatible API.
8. Hypura - A storage-tier-aware scheduler for Apple Silicon that intelligently places model tensors across GPU, RAM, and NVMe, letting you run models that are too large to fit in memory without swap-thrashing.
9. Atomic Chat - An open-source ChatGPT alternative with a clean UI that supports both local LLMs and cloud models, plus MCP integration for extending agent capabilities.
10. CanIRun.ai - A free web tool that analyzes your Mac's GPU, VRAM, and memory bandwidth to tell you exactly which AI models your hardware can actually run before you download anything.
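The hardware check in the last entry comes down to a simple question: do the quantized weights, plus runtime overhead, fit in unified memory? A rough back-of-the-envelope sketch of that kind of check (the 20% overhead factor is an assumption for KV cache and runtime buffers, not CanIRun.ai's actual formula):

```python
def fits_in_memory(params_billion: float, quant_bits: int,
                   ram_gb: float, overhead: float = 1.2) -> bool:
    """Rule-of-thumb memory check for a quantized LLM.

    Weights take roughly params * bits / 8 bytes; the overhead
    factor (an assumption, ~20%) covers KV cache and runtime buffers.
    """
    weights_gb = params_billion * quant_bits / 8
    return weights_gb * overhead <= ram_gb

# A 7B model at 4-bit quantization needs ~3.5 GB of weights,
# so it fits comfortably on an 8 GB Mac:
print(fits_in_memory(7, 4, 8))    # True
# A 70B model at 4-bit (~35 GB of weights) does not fit in 16 GB:
print(fits_in_memory(70, 4, 16))  # False
```

This is also why llama.cpp's aggressive quantization formats matter on Macs with limited RAM: halving the bits per weight roughly halves the memory footprint.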
Show me the top tools to get a local LLM running on my Mac
1. apfel - Free, open-source CLI tool that uses the built-in on-device LLM in macOS 26+ via the Apple Neural Engine, with an OpenAI-compatible HTTP server and no API keys required.
2. Hypura - Storage-tier-aware LLM inference scheduler for Apple Silicon that intelligently distributes models across GPU, RAM, and NVMe to run models larger than your Mac's memory.
3. MLX LM - Apple's official Python library for running and fine-tuning LLMs on Apple Silicon, with quantization, LoRA support, and Hugging Face integration.
4. GPT4All - Free, open-source desktop application with a user-friendly interface for downloading and running local LLMs privately on macOS without an internet connection.
5. Ensu - Lightweight free app by Ente for running and chatting with local LLMs entirely on-device with full privacy.
6. AnythingLLM - All-in-one AI app supporting local LLM inference with RAG, document chat, multi-user access, and agent workflows.
7. Lemonade - Open-source local LLM server by AMD supporting macOS, with GPU/NPU acceleration and an OpenAI-compatible API for LLMs, image generation, and speech.
8. Atomic Chat - Open-source ChatGPT alternative letting you run local LLMs with full privacy control and optional MCP integration.
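A practical upside of several entries above (apfel, Lemonade, and the llama.cpp/Ollama backends behind Open WebUI) is that they all speak the same OpenAI-compatible chat API, so one small client works against any of them. A minimal stdlib-only sketch, assuming a server is already running locally (the port, path, and model name are placeholders; check your tool's docs for the real values):

```python
import json
import urllib.request

# Assumed endpoint; adjust host/port to whatever your local server prints on startup.
BASE_URL = "http://localhost:8080/v1/chat/completions"


def build_request(prompt: str, model: str = "local-model") -> dict:
    """Assemble a standard OpenAI-style chat-completion payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }


def ask(prompt: str) -> str:
    """POST the prompt to the local server and return the reply text."""
    data = json.dumps(build_request(prompt)).encode()
    req = urllib.request.Request(
        BASE_URL, data=data, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the payload and response shapes are standardized, swapping backends usually means changing only `BASE_URL` and the model name, no client code rewrite.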