
    llama.cpp

    Local Inference

    LLM inference in C/C++ enabling efficient local execution of large language models across various hardware platforms.


At a Glance

Pricing: Open Source. Free and open source under the MIT license.

Available On: Windows, macOS, Linux, Web, API

Resources: Website, Docs, GitHub, llms.txt

Topics: Local Inference, AI Development Libraries, AI Infrastructure

Alternatives: thrml, Modular, SGLang

Developer: ggml-org

Listed Feb 2026

    About llama.cpp

    llama.cpp is a high-performance C/C++ library for running large language model (LLM) inference locally on a wide variety of hardware. Originally developed to enable running Meta's LLaMA models on consumer hardware, it has evolved into a comprehensive framework supporting numerous model architectures and quantization formats. The project prioritizes efficiency, portability, and minimal dependencies, making it ideal for developers who want to deploy LLMs without relying on cloud services.

    • Pure C/C++ Implementation provides a lightweight, dependency-free codebase that compiles easily across platforms without requiring heavy frameworks like PyTorch or TensorFlow.

    • Extensive Quantization Support enables running large models on limited hardware through various quantization methods (4-bit, 5-bit, 8-bit), dramatically reducing memory requirements while maintaining reasonable quality.

    • Multi-Platform Hardware Acceleration supports CUDA, Metal, OpenCL, Vulkan, and CPU-optimized SIMD instructions, allowing optimal performance on NVIDIA GPUs, Apple Silicon, AMD GPUs, and modern CPUs.

    • Model Format Compatibility works with GGUF format and supports conversion from various model formats, enabling use of models from Hugging Face and other sources.

    • Server Mode includes a built-in HTTP server with OpenAI-compatible API endpoints, making it easy to integrate into existing applications and workflows.

    • Active Community Development benefits from rapid iteration and contributions from a large open-source community, with frequent updates adding support for new models and optimizations.

    To get started, clone the repository, build using CMake with your preferred backend (CPU, CUDA, Metal, etc.), download a GGUF-format model, and run inference using the provided command-line tools or server. The project includes comprehensive documentation covering build options, model conversion, and API usage.
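The steps above can be sketched as a shell session. This is a minimal sketch, not official instructions: backend flags and binary names (`llama-cli`, `llama-server`) have changed across llama.cpp versions, and the model path below is a placeholder you must supply yourself.

```shell
# Clone the repository and build with CMake (default CPU backend;
# enable a GPU backend with e.g. -DGGML_CUDA=ON on supported setups)
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build
cmake --build build --config Release

# Run one-off inference on a GGUF model (placeholder path)
./build/bin/llama-cli -m models/model-q4_k_m.gguf -p "Hello, world" -n 64

# Or start the built-in HTTP server with its OpenAI-compatible API
./build/bin/llama-server -m models/model-q4_k_m.gguf --port 8080
```

On older releases the binaries were named `main` and `server`, so check the output directory of your build if these paths do not exist.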



    Pricing

Open Source

    Free and open source under MIT license

    • Full source code access
    • All features included
    • Community support
    • MIT License

    Capabilities

    Key Features

    • Pure C/C++ implementation with no dependencies
    • 4-bit, 5-bit, and 8-bit quantization support
    • CUDA GPU acceleration for NVIDIA GPUs
    • Metal acceleration for Apple Silicon
    • Vulkan and OpenCL support
    • CPU SIMD optimizations (AVX, AVX2, AVX512)
    • GGUF model format support
    • Built-in HTTP server with OpenAI-compatible API
    • Model conversion tools
    • Batch processing support
    • KV cache quantization
    • Speculative decoding
    • Grammar-based sampling
    • Multi-modal model support
    • Cross-platform compatibility
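As a sketch of the OpenAI-compatible API listed above: with `llama-server` already running locally, a chat completion request can be made with plain `curl`. The port (8080) is the server's usual default but may differ in your configuration, and this request shape is an assumption based on the OpenAI chat completions format, not a guaranteed contract.

```shell
# Query a locally running llama-server through its
# OpenAI-compatible /v1/chat/completions endpoint
# (assumes the server is listening on localhost:8080)
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "messages": [
          {"role": "user", "content": "Explain GGUF in one sentence."}
        ],
        "max_tokens": 64
      }'
```

Because the endpoint follows the OpenAI wire format, existing OpenAI SDK clients can typically be pointed at this server by overriding their base URL.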

    Integrations

    Hugging Face Models
    OpenAI API compatible clients
    LangChain
    LlamaIndex
    Ollama
    Text Generation WebUI
    LocalAI


    Developer

    ggml-org

    ggml-org develops high-performance machine learning inference libraries in C/C++. The organization maintains llama.cpp, one of the most popular open-source projects for running large language models locally. The team focuses on creating efficient, portable implementations that enable AI inference across diverse hardware platforms without heavy dependencies.

Website · GitHub
1 tool in directory

    Similar Tools


    thrml

    thrml is an open-source library by Extropic AI for thermodynamic computing and probabilistic machine learning.


    Modular

    AI infrastructure platform with MAX framework, Mojo language, and Mammoth for GPU-portable GenAI serving across NVIDIA and AMD hardware.


    SGLang

    Fast serving framework for large language models and vision language models with efficient inference and structured generation.


    Related Topics

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    63 tools

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    127 tools

    AI Infrastructure

    Infrastructure designed for deploying and running AI models.

    174 tools
    With AI, Everyone is a Dev. EveryDev.ai © 2026