ggml

Name: ggml
Availability: OnlineOnly
Author: ggml-org

A low-level C++ tensor library for machine learning with integer quantization, broad hardware support, and zero runtime memory allocations.

At a Glance

Pricing

Open Source

Free to use, modify, and distribute under the MIT License.

Available On

CLI

API

SDK

ggml-org develops high-performance machine learning inferenc…

Listed May 2026

About ggml

ggml is a C++ tensor library for machine learning developed under the ggml-org organization on GitHub. It serves as the foundational engine behind popular projects like llama.cpp and whisper.cpp, providing low-level primitives for running inference on large language models and other ML workloads. Released under the MIT License, it has accumulated over 14,600 stars and 1,600 forks since its creation in September 2022.

What It Is

ggml is a cross-platform tensor computation library written in C++ that enables machine learning inference without third-party dependencies. It is not a high-level framework — it operates at the tensor algebra level, providing the building blocks that higher-level projects like llama.cpp and whisper.cpp use to run models efficiently on consumer hardware. The library is designed to be embedded directly into applications, making it suitable for edge and local inference scenarios.

Core Design Principles

ggml is built around a set of design choices and features that prioritize portability and efficiency:

No third-party dependencies — the library compiles standalone without external packages
Zero memory allocations during runtime — all memory is pre-allocated, avoiding heap fragmentation during inference
Integer quantization support — enables running large models in reduced precision (e.g., 4-bit, 8-bit) to fit within limited memory budgets
Automatic differentiation — supports gradient computation for training workflows
ADAM and L-BFGS optimizers — built-in optimization algorithms for fine-tuning
Broad hardware support — targets CPUs across architectures, with backend extensions for GPU acceleration
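
To make the integer quantization idea concrete, here is a minimal sketch of block-wise 8-bit quantization: each block of 32 floats is stored as one scale factor plus 32 signed 8-bit integers. The names QuantBlock, quantize_q8, and dequantize_q8 and the block size are hypothetical choices for illustration. This is not ggml's actual API or on-disk layout, which differs in detail.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical illustration only: not ggml's real types or storage format.
constexpr std::size_t kBlockSize = 32;

struct QuantBlock {
    float scale;                // per-block scale factor
    std::int8_t q[kBlockSize];  // quantized values in [-127, 127]
};

// Quantize a float vector whose length is a multiple of kBlockSize.
std::vector<QuantBlock> quantize_q8(const std::vector<float>& src) {
    assert(src.size() % kBlockSize == 0);
    std::vector<QuantBlock> out(src.size() / kBlockSize);
    for (std::size_t b = 0; b < out.size(); ++b) {
        const float* x = src.data() + b * kBlockSize;

        // Find the largest magnitude in the block; it maps to +/-127.
        float amax = 0.0f;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            amax = std::fmax(amax, std::fabs(x[i]));
        }

        const float scale = amax / 127.0f;
        const float inv = scale > 0.0f ? 1.0f / scale : 0.0f;
        out[b].scale = scale;
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            out[b].q[i] = static_cast<std::int8_t>(std::lround(x[i] * inv));
        }
    }
    return out;
}

// Reconstruct approximate floats from the quantized blocks.
std::vector<float> dequantize_q8(const std::vector<QuantBlock>& blocks) {
    std::vector<float> out(blocks.size() * kBlockSize);
    for (std::size_t b = 0; b < blocks.size(); ++b) {
        for (std::size_t i = 0; i < kBlockSize; ++i) {
            out[b * kBlockSize + i] =
                blocks[b].scale * static_cast<float>(blocks[b].q[i]);
        }
    }
    return out;
}
```

In this sketch a block of 32 floats (128 bytes) shrinks to 36 bytes, and the round-trip error per value is bounded by about half the block's scale. Lower-bit schemes such as 4-bit follow the same pattern with a coarser grid.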

Relationship to llama.cpp and whisper.cpp

The ggml README notes that active development is currently split across the ggml, llama.cpp, and whisper.cpp repositories. ggml acts as the shared tensor backend, while llama.cpp and whisper.cpp build model-specific inference logic on top of it. The GGUF file format — used to package quantized model weights — is documented within the ggml project and has become a widely adopted standard for distributing local inference models.
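
As a rough illustration of what a GGUF file looks like at the byte level, the sketch below checks the magic bytes and reads the fixed-size header fields. It assumes the little-endian header layout of GGUF version 2 and later (4-byte magic "GGUF", 32-bit version, 64-bit tensor count, 64-bit metadata key/value count). The struct and function names are hypothetical, and real code should rely on the loader that ships with ggml rather than on this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical illustration only: assumes the GGUF v2+ little-endian header.
struct GgufHeader {
    std::uint32_t version;
    std::uint64_t tensor_count;
    std::uint64_t metadata_kv_count;
};

// Read an unsigned little-endian integer of type T from a byte pointer.
template <typename T>
T read_le(const unsigned char* p) {
    T v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) {
        v |= static_cast<T>(p[i]) << (8 * i);
    }
    return v;
}

// Returns true and fills `out` if `bytes` starts with a plausible GGUF header.
bool parse_gguf_header(const std::vector<unsigned char>& bytes, GgufHeader& out) {
    constexpr std::size_t kHeaderSize = 4 + 4 + 8 + 8;
    if (bytes.size() < kHeaderSize) {
        return false;  // too short to hold the fixed header
    }
    if (std::memcmp(bytes.data(), "GGUF", 4) != 0) {
        return false;  // wrong magic bytes
    }
    out.version = read_le<std::uint32_t>(bytes.data() + 4);
    out.tensor_count = read_le<std::uint64_t>(bytes.data() + 8);
    out.metadata_kv_count = read_le<std::uint64_t>(bytes.data() + 16);
    return true;
}
```

The metadata key/value pairs and tensor descriptors follow this fixed header. Parsing them requires the type tables from the GGUF documentation and is left out here.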

Build and Setup Path

ggml uses CMake as its build system and Python 3.10 for its example scripts. Building the examples takes three steps:

1. Clone the repository.
2. Set up a Python virtual environment and install the requirements.
3. Run cmake to configure the project, then cmake --build to compile the examples.

Example binaries such as gpt-2-backend are included to demonstrate inference on models like GPT-2 117M directly from the command line.

Update: v0.12.0

The latest release is v0.12.0, published on May 16, 2026; the most recent push to the repository was on May 21, 2026. The project remains under active development, with 322 open issues and ongoing contributions. The GitHub topics associated with the repository (automatic-differentiation, large-language-models, machine-learning, and tensor-algebra) reflect its continued focus on foundational ML infrastructure rather than end-user tooling.

Capabilities

Key Features

Low-level cross-platform tensor operations
Integer quantization support (4-bit, 8-bit)
Broad hardware support
Automatic differentiation
ADAM and L-BFGS optimizers
No third-party dependencies
Zero memory allocations during runtime
GGUF file format support
C++ implementation

Integrations

llama.cpp

whisper.cpp
