Ollama
Run large language models locally on your machine with a simple CLI and REST API, with optional cloud scaling for larger models.
At a Glance
Get started with Ollama — run models on your own hardware with cloud model access included.
Engagement
Available On
Updated May 2026
About Ollama
Ollama is an open-source tool that lets developers and individuals run large language models (LLMs) directly on their own hardware — macOS, Windows, or Linux — with a single install command. Built primarily in Go and released under the MIT License, it backs its local inference engine with llama.cpp and exposes a REST API and CLI for integrating models into applications. An optional cloud tier extends local usage to datacenter-grade hardware when larger or faster models are needed.
What It Is
Ollama is a local inference runtime that downloads, manages, and serves open models through a unified interface. It sits between the raw model weights and the applications or agents that consume them, handling model lifecycle (pull, run, delete), serving a local HTTP API compatible with many LLM frameworks, and optionally routing requests to Ollama's own cloud infrastructure. The core platform is open source (MIT), while the cloud layer is a hosted commercial service layered on top.
Open-Source Architecture
The GitHub repository (ollama/ollama) is written in Go and uses llama.cpp as its primary inference backend. As of the latest release (v0.24.0, published May 14 2026), the project has accumulated over 171,000 GitHub stars and more than 16,000 forks, making it one of the most-starred local LLM projects on GitHub. The MIT license allows unrestricted use, modification, and redistribution. Models are defined via a Modelfile format, and the REST API follows a straightforward chat/generate pattern compatible with many existing LLM SDKs.
Supported Models and Integrations
Ollama's model library includes Gemma 3, DeepSeek, Qwen, Mistral, Llama 3, GLM, MiniMax, Kimi-K2.5, gpt-oss, and many others. The GitHub README lists over 40,000 community integrations spanning:
- Chat interfaces: Open WebUI, LibreChat, Lobe Chat, AnythingLLM, and others
- Code editors: VS Code (via Continue, Cline, AI Toolkit), Emacs, Sublime Text, Qt Creator
- Frameworks and agents: LangChain, LlamaIndex, Spring AI, crewAI, AutoGPT, Semantic Kernel, Firebase Genkit, Haystack
- Languages: Python, JavaScript, Ruby, Rust, Go, Java, .NET, Swift, PHP, Dart, Elixir, R, Julia, C++
- Coding agents: Claude Code, Codex, Copilot CLI, OpenCode, OpenClaw, Droid
- RAG engines: RAGFlow, R2R, LlamaIndex, and others
- Observability: Langfuse, OpenLIT, MLflow Tracing, Opik, Lunary
The ollama launch command can directly start integrations like OpenClaw, Claude Code, or Codex against a locally running model.
Local-First, Cloud-Optional Deployment
Ollama's design philosophy is "start local, scale with cloud." Running models on local hardware is always unlimited and fully offline-capable. The cloud layer — powered by NVIDIA Cloud Providers (NCPs) — adds access to larger models on datacenter hardware, parallel request handling, and real-time web access. Ollama states that prompt and response data is never logged or trained on, and that NCP partners are contractually required to enforce no-logging and zero data retention policies. Cloud models are hosted primarily in the United States, with additional capacity in Europe and Singapore.
Update: v0.24.0
The latest release is v0.24.0, published May 14, 2026. The repository remains actively maintained with pushes as recent as May 19, 2026. The project direction shows continued expansion of cloud model support alongside the core open-source local runtime, with the homepage prominently featuring the OpenClaw integration as a flagship use case for turning Ollama into a personal AI assistant across WhatsApp, Telegram, Slack, and Discord.
Community Discussions
Be the first to start a conversation about Ollama
Share your experience with Ollama, ask questions, or help others learn from your insights.
Pricing
Free
Get started with Ollama — run models on your own hardware with cloud model access included.
- Automate coding, document analysis, and other tasks with open models
- Keep your data private
- Run models on your hardware
- Access cloud models
- CLI, API, and desktop apps
Pro
Solve harder tasks, faster with more cloud usage and concurrent model access.
- Everything in Free
- Access larger, more powerful cloud models
- Run 3 cloud models at a time
- 50x more cloud usage than Free
- Upload and share private models
Max
For your most demanding work — maximum concurrency and usage.
- Everything in Pro
- Run 10 cloud models at a time
- 5x more usage than Pro
Capabilities
Key Features
- Run LLMs locally on macOS, Windows, and Linux
- REST API for model inference
- CLI for model management and chat
- Modelfile format for custom model configuration
- Cloud scaling for larger models on datacenter hardware
- Parallel cloud model execution
- Real-time web access via cloud models
- 40,000+ community integrations
- Support for Gemma, DeepSeek, Qwen, Llama, Mistral, and more
- Python and JavaScript SDKs
- Docker image support
- Offline-capable local inference
- Private model upload and sharing (Pro+)
- Tool calling support for cloud models
- OpenClaw, Claude Code, Codex, and Copilot CLI launch integration
