EveryDev.ai
Subscribe
Home
Tools

3,020+ AI tools

  • New
  • Trending
  • Featured
  • Compare
  • Arena
Categories
  • Agents2145
  • Coding1511
  • Infrastructure681
  • Marketing532
  • Projects485
  • Research447
  • Design413
  • Analytics378
  • MCP278
  • Security271
  • Testing264
  • Data256
  • Integration188
  • Prompts185
  • Communication176
  • Learning170
  • Extensions169
  • Voice150
  • Commerce134
  • DevOps115
  • Web86
  • Finance26
AI Tools by Topic
  • AI Coding Assistants
  • Agent Frameworks
  • MCP Servers
  • AI Prompt Tools
  • Vibe Coding Tools
  • AI Design Tools
  • AI Database Tools
  • AI Website Builders
  • AI Testing Tools
  • LLM Evaluations
Follow Us
  • X / Twitter
  • LinkedIn
  • Reddit
  • Discord
  • Threads
  • Bluesky
  • Mastodon
  • YouTube
  • GitHub
  • Instagram
Get Started
  • About
  • Editorial Standards
  • Corrections & Disclosures
  • Community Guidelines
  • Advertise
  • Contact Us
  • Newsletter
  • Submit a Tool
  • Start a Discussion
  • Write A Blog
  • Share A Build
  • Terms of Service
  • Privacy Policy
Explore with AI
  • ChatGPT
  • Gemini
  • Claude
  • Grok
  • Perplexity
Agent Experience
  • llms.txt
Theme
With AI, Everyone is a Dev. EveryDev.ai © 2026
    1. Home
    2. Tools
    3. llama.cpp
    llama.cpp icon

    llama.cpp

    Local Inference
    Featured

    LLM inference in C/C++ enabling efficient local execution of large language models across various hardware platforms.

    Visit Website

    At a Glance

    Pricing
    Open Source

    Free and open source under MIT license

    Engagement

    Available On

    Windows
    macOS
    Linux
    Web
    API

    Resources

    WebsiteDocsGitHubllms.txt

    Topics

    Local InferenceAI Development LibrariesAI Infrastructure

    Alternatives

    vLLMtiny-vllmparakeet.cpp
    Developer
    ggml-orgggml-org develops high-performance machine learning inferenc…

    Listed Feb 2026

    About llama.cpp

    llama.cpp is a high-performance C/C++ library for running large language model (LLM) inference locally on a wide variety of hardware. Originally developed to enable running Meta's LLaMA models on consumer hardware, it has evolved into a comprehensive framework supporting numerous model architectures and quantization formats. The project prioritizes efficiency, portability, and minimal dependencies, making it ideal for developers who want to deploy LLMs without relying on cloud services.

    • Pure C/C++ Implementation provides a lightweight, dependency-free codebase that compiles easily across platforms without requiring heavy frameworks like PyTorch or TensorFlow.

    • Extensive Quantization Support enables running large models on limited hardware through various quantization methods (4-bit, 5-bit, 8-bit), dramatically reducing memory requirements while maintaining reasonable quality.

    • Multi-Platform Hardware Acceleration supports CUDA, Metal, OpenCL, Vulkan, and CPU-optimized SIMD instructions, allowing optimal performance on NVIDIA GPUs, Apple Silicon, AMD GPUs, and modern CPUs.

    • Model Format Compatibility works with GGUF format and supports conversion from various model formats, enabling use of models from Hugging Face and other sources.

    • Server Mode includes a built-in HTTP server with OpenAI-compatible API endpoints, making it easy to integrate into existing applications and workflows.

    • Active Community Development benefits from rapid iteration and contributions from a large open-source community, with frequent updates adding support for new models and optimizations.

    To get started, clone the repository, build using CMake with your preferred backend (CPU, CUDA, Metal, etc.), download a GGUF-format model, and run inference using the provided command-line tools or server. The project includes comprehensive documentation covering build options, model conversion, and API usage.

    llama.cpp - 1

    Community Discussions

    Be the first to start a conversation about llama.cpp

    Share your experience with llama.cpp, ask questions, or help others learn from your insights.

    Pricing

    OPEN SOURCE

    Open Source

    Free and open source under MIT license

    • Full source code access
    • All features included
    • Community support
    • MIT License

    Capabilities

    Key Features

    • Pure C/C++ implementation with no dependencies
    • 4-bit, 5-bit, and 8-bit quantization support
    • CUDA GPU acceleration for NVIDIA GPUs
    • Metal acceleration for Apple Silicon
    • Vulkan and OpenCL support
    • CPU SIMD optimizations (AVX, AVX2, AVX512)
    • GGUF model format support
    • Built-in HTTP server with OpenAI-compatible API
    • Model conversion tools
    • Batch processing support
    • KV cache quantization
    • Speculative decoding
    • Grammar-based sampling
    • Multi-modal model support
    • Cross-platform compatibility

    Integrations

    Hugging Face Models
    OpenAI API compatible clients
    LangChain
    LlamaIndex
    Ollama
    Text Generation WebUI
    LocalAI
    API Available
    View Docs

    Ratings & Reviews

    No ratings yet

    Be the first to rate llama.cpp and help others make informed decisions.

    Developer

    ggml-org

    ggml-org develops high-performance machine learning inference libraries in C/C++. The organization maintains llama.cpp, one of the most popular open-source projects for running large language models locally. The team focuses on creating efficient, portable implementations that enable AI inference across diverse hardware platforms without heavy dependencies.

    Read more about ggml-org
    WebsiteGitHub
    2 tools in directory

    Similar Tools

    vLLM icon

    vLLM

    An open-source, high-performance library for serving and running large language models with GPU-optimized inference and efficient memory and batch management.

    tiny-vllm icon

    tiny-vllm

    A hands-on course and full source code for building a high-performance LLM inference engine in C++ and CUDA, implementing features like KV cache, PagedAttention, and continuous batching.

    parakeet.cpp icon

    parakeet.cpp

    parakeet.cpp is a lightweight C++ implementation for running Parakeet speech recognition models locally with fast, offline transcription capabilities.

    Browse all tools

    Related Topics

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    132 tools

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    232 tools

    AI Infrastructure

    Infrastructure designed for deploying and running AI models.

    293 tools
    Browse all topics
    Back to all toolsSuggest an edit
    ratings
    discussions
    42views