A low-level C++ tensor library for machine learning with integer quantization, broad hardware support, and zero runtime memory allocations.
At a Glance
Free to use, modify, and distribute under the MIT License.
Listed May 2026
About ggml
ggml is a C++ tensor library for machine learning developed under the ggml-org organization on GitHub. It serves as the foundational engine behind popular projects like llama.cpp and whisper.cpp, providing low-level primitives for running inference on large language models and other ML workloads. Released under the MIT License, it has accumulated over 14,600 stars and 1,600 forks since its creation in September 2022.
What It Is
ggml is a cross-platform tensor computation library written in C++ that enables machine learning inference without third-party dependencies. It is not a high-level framework — it operates at the tensor algebra level, providing the building blocks that higher-level projects like llama.cpp and whisper.cpp use to run models efficiently on consumer hardware. The library is designed to be embedded directly into applications, making it suitable for edge and local inference scenarios.
Core Design Principles
ggml is built around a set of constraints that prioritize portability and efficiency:
- No third-party dependencies — the library compiles standalone without external packages
- Zero memory allocations during runtime — all memory is pre-allocated, avoiding heap fragmentation during inference
- Integer quantization support — enables running large models in reduced precision (e.g., 4-bit, 8-bit) to fit within limited memory budgets
- Automatic differentiation — supports gradient computation for training workflows
- ADAM and L-BFGS optimizers — built-in optimization algorithms for fine-tuning
- Broad hardware support — targets CPUs across architectures, with backend extensions for GPU acceleration
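To make the integer quantization point concrete, here is a minimal sketch of block-wise 8-bit quantization. It is an illustration only and not ggml's actual code: the names QuantBlock, quantize_q8, and dequantize_q8 are hypothetical, and the layout (32 values sharing one float scale) is a simplifying assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical layout for illustration: 32 values share one float scale.
constexpr std::size_t kBlockSize = 32;

struct QuantBlock {
    float scale;                     // multiply a quant by this to recover the value
    std::int8_t quants[kBlockSize];  // signed 8-bit codes in [-127, 127]
};

// Quantize n floats into 8-bit blocks. n must be a multiple of kBlockSize.
std::vector<QuantBlock> quantize_q8(const float* src, std::size_t n) {
    assert(n % kBlockSize == 0);
    std::vector<QuantBlock> out(n / kBlockSize);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* x = src + b * kBlockSize;

        // The largest magnitude in the block determines the scale.
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            amax = std::max(amax, std::fabs(x[i]));
        }
        const float scale = amax / 127.0f;
        const float inv = (scale != 0.0f) ? 1.0f / scale : 0.0f;

        out[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            out[b].quants[i] = static_cast<std::int8_t>(std::lround(x[i] * inv));
        }
    }
    return out;
}

// Recover approximate floats from the quantized blocks.
std::vector<float> dequantize_q8(const std::vector<QuantBlock>& blocks) {
    std::vector<float> out;
    out.reserve(blocks.size() * kBlockSize);
    for (const QuantBlock& blk : blocks) {
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            out.push_back(blk.scale * static_cast<float>(blk.quants[i]));
        }
    }
    return out;
}
```

With this layout, each block of 32 values takes 36 bytes on typical platforms instead of 128 bytes as 32-bit floats, and each recovered value differs from the original by at most about half the block's scale. This sketch allocates with std::vector for brevity, so it does not demonstrate the pre-allocation principle listed above.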
Relationship to llama.cpp and whisper.cpp
The ggml README notes that active development is currently split across the ggml, llama.cpp, and whisper.cpp repositories. ggml acts as the shared tensor backend, while llama.cpp and whisper.cpp build model-specific inference logic on top of it. The GGUF file format — used to package quantized model weights — is documented within the ggml project and has become a widely adopted standard for distributing local inference models.
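As a rough illustration of what a GGUF file looks like at the byte level, the sketch below reads the fixed-size fields at the start of a buffer. It assumes the little-endian layout used by GGUF version 2 and later (a 4-byte "GGUF" magic, a 32-bit version, then 64-bit tensor and metadata key-value counts). The names GgufHeader, read_le, and parse_gguf_header are hypothetical and are not part of ggml's API; the GGUF documentation in the ggml project is the authoritative reference.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Fixed-size fields at the start of a GGUF file (assumed v2+ layout).
struct GgufHeader {
    std::uint32_t version;
    std::uint64_t tensor_count;
    std::uint64_t metadata_kv_count;
};

// Read a little-endian unsigned integer of type T from p,
// independent of the host's byte order.
template <typename T>
T read_le(const unsigned char* p) {
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        v |= static_cast<T>(p[i]) << (8 * i);
    }
    return v;
}

// Returns true and fills `out` if the buffer starts with a plausible
// GGUF header. Returns false if the buffer is too short or the magic differs.
bool parse_gguf_header(const unsigned char* data, std::size_t size, GgufHeader& out) {
    constexpr std::size_t kHeaderSize = 4 + 4 + 8 + 8;  // magic + version + two counts
    if (size < kHeaderSize) return false;
    if (std::memcmp(data, "GGUF", 4) != 0) return false;

    out.version = read_le<std::uint32_t>(data + 4);
    out.tensor_count = read_le<std::uint64_t>(data + 8);
    out.metadata_kv_count = read_le<std::uint64_t>(data + 16);
    return true;
}
```

A caller would read the first 24 bytes of a model file into a buffer and pass it to parse_gguf_header. The metadata key-value pairs and tensor descriptions that follow the header are not handled here.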
Build and Setup Path
ggml uses CMake as its build system and requires Python 3.10 for its example scripts. The build process is straightforward:
- Clone the repository
- Set up a Python virtual environment and install requirements
- Run cmake and cmake --build to compile the examples
Example binaries such as gpt-2-backend are included to demonstrate inference on models like GPT-2 117M directly from the command line.
Update: v0.12.0
The latest release is v0.12.0, published on May 16, 2026; the most recent push to the repository was on May 21, 2026. The project remains under active development, with 322 open issues. The repository's GitHub topics (automatic-differentiation, large-language-models, machine-learning, and tensor-algebra) reflect its continued focus on foundational ML infrastructure rather than end-user tooling.
Pricing
Open Source
- Full source code access
- MIT License
- No runtime memory allocations
- Integer quantization support
- Automatic differentiation
Capabilities
Key Features
- Low-level cross-platform tensor operations
- Integer quantization support (4-bit, 8-bit)
- Broad hardware support
- Automatic differentiation
- ADAM and L-BFGS optimizers
- No third-party dependencies
- Zero memory allocations during runtime
- GGUF file format support
- C++ implementation
