llamafile
llamafile lets you distribute and run LLMs with a single self-contained executable file, with no installation required, across most operating systems and CPU architectures.
At a Glance
About llamafile
llamafile is a Mozilla Builders project that collapses the complexity of running large language models into a single-file executable. It combines llama.cpp with Cosmopolitan Libc so that one file runs locally on most operating systems and CPU architectures without any installation. The project also includes whisperfile, a single-file speech-to-text tool built on whisper.cpp using the same packaging approach. llamafile is fully open source under the Apache 2.0 license and is actively maintained by Mozilla.ai.
- Single-file distribution — Download one `.llamafile` executable and run it directly; no Python environment, Docker, or package manager needed.
- Cross-platform support — The same file runs on macOS, Linux, Windows, BSD, and multiple CPU architectures thanks to Cosmopolitan Libc.
- Built on llama.cpp — Inherits broad model compatibility and GPU acceleration support from the widely-used llama.cpp inference engine.
- whisperfile included — A companion single-file speech-to-text tool built on whisper.cpp for audio transcription and translation, requiring no installation.
- Local inference — All computation runs on your own hardware; no data is sent to external servers.
- Pre-built model files — Ready-to-run llamafiles for popular models (e.g., Qwen, LLaVA) are hosted on Hugging Face for immediate download.
- Quick start — Download a `.llamafile`, mark it executable (`chmod +x`), and run it; Windows users rename the file with an `.exe` extension.
- Versioned releases — Stable and legacy releases are available on GitHub; pre-built llamafiles indicate which server version they bundle.
- Open source — Apache 2.0 licensed core; llama.cpp and whisper.cpp modifications are MIT licensed for upstream compatibility.
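The quick-start steps above can be sketched as a short shell session. The filename and download URL here are illustrative examples, not canonical; substitute whichever pre-built llamafile you fetch from Hugging Face.

```shell
# Example filename only — replace with the llamafile you actually download.
FILE=llava-v1.5-7b-q4.llamafile

# 1. Download a pre-built llamafile (uncomment; URL is illustrative):
# curl -LO "https://huggingface.co/Mozilla/llava-v1.5-7b-llamafile/resolve/main/$FILE"

# 2. Mark it executable on macOS/Linux/BSD.
#    (On Windows, rename the file with an .exe extension instead.)
if [ -f "$FILE" ]; then
  chmod +x "$FILE"
fi

# 3. Run it — no installation step, no interpreter, no container needed:
# ./"$FILE"
```

The same three steps apply to whisperfile; only the downloaded file differs.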
Pricing
Open Source
Fully free and open-source under Apache 2.0. Download and run LLMs locally with no cost.
- Single-file LLM execution
- Cross-platform support
- whisperfile speech-to-text
- Local inference
- No installation required
Capabilities
Key Features
- Single-file LLM executable
- No installation required
- Cross-platform (Windows, macOS, Linux, BSD)
- Multi-architecture CPU support
- Built on llama.cpp
- whisperfile speech-to-text tool
- Local inference
- Pre-built model files on Hugging Face
- GPU acceleration support
- Open source (Apache 2.0)
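To illustrate local inference: because llamafile bundles the llama.cpp server, a running llamafile can typically be queried over an OpenAI-style chat-completions endpoint. The port and endpoint path below are assumptions based on llama.cpp server defaults, so treat this as a sketch rather than a guaranteed interface.

```shell
# Hypothetical request body; the "model" value is a placeholder, since a
# llamafile serves whichever model it bundles.
REQUEST='{"model": "local", "messages": [{"role": "user", "content": "Hello"}]}'

# With a llamafile running locally, send the request with curl (uncomment).
# Port 8080 and the /v1/chat/completions path are assumed defaults:
# curl http://localhost:8080/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$REQUEST"

echo "$REQUEST"
```

Because everything runs on localhost, the prompt and the response never leave your machine.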
