llamafile
llamafile lets you distribute and run LLMs with a single self-contained executable file, with no installation required, across most operating systems and CPU architectures.
At a Glance
About llamafile
llamafile is a Mozilla Builders project that collapses the complexity of running large language models into a single-file executable. It combines llama.cpp with Cosmopolitan Libc so that one file runs locally on most operating systems and CPU architectures without any installation. The project also includes whisperfile, a single-file speech-to-text tool built on whisper.cpp using the same packaging approach. llamafile is fully open source under the Apache 2.0 license and is actively maintained by Mozilla.ai.
- Single-file distribution — Download one `.llamafile` executable and run it directly; no Python environment, Docker, or package manager needed.
- Cross-platform support — The same file runs on macOS, Linux, Windows, BSD, and multiple CPU architectures thanks to Cosmopolitan Libc.
- Built on llama.cpp — Inherits broad model compatibility and GPU acceleration support from the widely-used llama.cpp inference engine.
- whisperfile included — A companion single-file speech-to-text tool built on whisper.cpp for audio transcription and translation, requiring no installation.
- Local inference — All computation runs on your own hardware; no data is sent to external servers.
- Pre-built model files — Ready-to-run llamafiles for popular models (e.g., Qwen, LLaVA) are hosted on Hugging Face for immediate download.
- Quick start — Download a `.llamafile`, mark it executable (`chmod +x`), and run it; Windows users rename the file with an `.exe` extension.
- Versioned releases — Stable and legacy releases are available on GitHub; pre-built llamafiles indicate which server version they bundle.
- Open source — Apache 2.0 licensed core; llama.cpp and whisper.cpp modifications are MIT licensed for upstream compatibility.
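The quick-start steps above can be sketched as a short shell session. The filename and download URL here are illustrative examples, not canonical; substitute whichever pre-built llamafile you fetch from Hugging Face.

```shell
# Example filename only — replace with the llamafile you actually download.
FILE=llava-v1.5-7b-q4.llamafile

# 1. Download a pre-built llamafile (uncomment; URL is illustrative):
# curl -LO "https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile/resolve/main/$FILE"

# 2. Mark it executable on macOS/Linux/BSD.
#    (On Windows, rename the file with an .exe extension instead.)
if [ -f "$FILE" ]; then
  chmod +x "$FILE"
fi

# 3. Run it — no installation step, no interpreter, no container needed:
# ./"$FILE"
```

The same three steps apply to whisperfile; only the downloaded file differs.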
Pricing
Open Source
Fully free and open-source under Apache 2.0. Download and run LLMs locally with no cost.
- Single-file LLM execution
- Cross-platform support
- whisperfile speech-to-text
- Local inference
- No installation required
Capabilities
Key Features
- Single-file LLM executable
- No installation required
- Cross-platform (Windows, macOS, Linux, BSD)
- Multi-architecture CPU support
- Built on llama.cpp
- whisperfile speech-to-text tool
- Local inference
- Pre-built model files on Hugging Face
- GPU acceleration support
- Open source (Apache 2.0)
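To illustrate local inference: because llamafile bundles the llama.cpp server, a running llamafile can typically be queried over an OpenAI-style chat-completions endpoint. The port and endpoint path below are assumptions based on llama.cpp server defaults, so treat this as a sketch rather than a guaranteed interface.

```shell
# Hypothetical request body; the "model" value is a placeholder, since a
# llamafile serves whichever model it bundles.
REQUEST='{"model": "local", "messages": [{"role": "user", "content": "Hello"}]}'

# With a llamafile running locally, send the request with curl (uncomment).
# Port 8080 and the /v1/chat/completions path are assumed defaults:
# curl http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$REQUEST"

echo "$REQUEST"
```

Because everything runs on localhost, the prompt and the response never leave your machine.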
