    MiniCPM

    Local Inference

MiniCPM is a series of ultra-efficient open-source large language models designed for end-side devices, featuring sparse attention, hybrid reasoning, and a 3x+ generation speedup.


    At a Glance

    Pricing
    Open Source

    Fully free and open-source under Apache License 2.0. All models and code are free to use, modify, and distribute.


    Available On

    API
    CLI
    SDK

    Resources

• Website
• Docs
• GitHub
• llms.txt

    Topics

• Local Inference
• Model Management
• AI Development Libraries

    Alternatives

• Axolotl
• flash-moe
• GuppyLM
    Developer
OpenBMB · Beijing, China · Est. 2022 · $200M raised

    Listed Apr 2026

    About MiniCPM

MiniCPM is an open-source family of highly efficient large language models (LLMs) developed by OpenBMB (THUNLP and Modelbest Inc.), designed specifically for deployment on end-side and edge devices. The series achieves state-of-the-art performance at its scale through systematic innovations in model architecture, training data, training algorithms, and inference systems. The latest models (MiniCPM4, MiniCPM4.1, and MiniCPM-SALA) deliver a 3–7x generation speedup over similar-sized models on edge hardware while supporting context lengths of up to 1 million tokens.

    Key Features:

    • Efficient Model Architecture — MiniCPM4 and MiniCPM4.1 use InfLLM-V2 trainable sparse attention, in which each token computes relevance against fewer than 5% of the other tokens when processing 128K-token texts, drastically reducing computational overhead.
    • MiniCPM-SALA Hybrid Attention — The first large-scale hybrid model integrating 25% sparse attention (InfLLM-V2) and 75% linear attention (Lightning Attention), enabling 1M-token inference on consumer GPUs like the NVIDIA RTX 5090.
    • Hybrid Reasoning Mode — MiniCPM4.1 supports both deep reasoning and non-reasoning modes, toggled via enable_thinking in the chat template or the inline /think and /no_think tokens (a runnable sketch follows this list).
    • BitCPM4 Ternary Quantization — Compresses model parameters to 1.58-bit width via quantization-aware training (QAT), achieving performance comparable to full-precision models at a fraction of the size (a toy illustration of the 1.58-bit figure also follows this list).
    • Multiple Inference Backends — Supports HuggingFace Transformers, vLLM, SGLang, CPM.cu (recommended for maximum speed), llama.cpp, and Ollama for flexible deployment.
    • Speculative Decoding (EAGLE3) — Achieves up to 3x decoding speedup in reasoning mode using the EAGLE3 draft model with vLLM and SGLang.
    • MiniCPM4-MCP Tool Use — Fine-tuned variant supporting tool calling across 16 MCP servers spanning office, lifestyle, communication, and work management categories.
    • MiniCPM4-Survey Agent — Specialized model for trustworthy long-form survey generation using a Plan-Retrieve-Write multi-agent framework with RL training.
    • Long Context Support — MiniCPM4.1 natively supports 64K tokens with YaRN-based extension to 128K+; MiniCPM-SALA scales to 1M+ tokens via HyPE positional encoding.
    • Apache 2.0 License — All models and code are released under the Apache License 2.0, free to use, modify, and distribute.
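
    As a concrete illustration of the hybrid reasoning toggle, here is a minimal sketch using the HuggingFace Transformers chat-template API. The checkpoint name and the exact enable_thinking spelling are assumptions based on the convention described above; verify both against the current OpenBMB model card.

        # Toggling MiniCPM4.1's reasoning mode via the chat template.
        # "openbmb/MiniCPM4.1-8B" and the enable_thinking kwarg are assumptions;
        # check the model card for the authoritative names.
        from transformers import AutoModelForCausalLM, AutoTokenizer

        MODEL = "openbmb/MiniCPM4.1-8B"  # illustrative checkpoint name
        tokenizer = AutoTokenizer.from_pretrained(MODEL, trust_remote_code=True)
        model = AutoModelForCausalLM.from_pretrained(
            MODEL, trust_remote_code=True, device_map="auto"
        )

        messages = [{"role": "user", "content": "Why is the sky blue?"}]
        prompt = tokenizer.apply_chat_template(
            messages,
            tokenize=False,
            add_generation_prompt=True,
            enable_thinking=True,  # False (or an inline /no_think) selects the fast path
        )
        inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
        outputs = model.generate(**inputs, max_new_tokens=512)
        print(tokenizer.decode(outputs[0][inputs.input_ids.shape[1]:], skip_special_tokens=True))

    Passing enable_thinking=False should make the template skip the thinking preamble entirely, which is the cheaper path for simple queries.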
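    On the 1.58-bit figure: a ternary weight drawn from {-1, 0, +1} carries log2(3) ≈ 1.585 bits of information, which is where the number comes from. The toy quantizer below uses BitNet-style absmean scaling purely for illustration; BitCPM4's actual QAT pipeline is more involved.

        import math
        import numpy as np

        # Why "1.58-bit": a ternary weight in {-1, 0, +1} carries log2(3) bits.
        print(math.log2(3))  # 1.5849625...

        def ternary_quantize(w: np.ndarray):
            """Toy absmean ternarization (BitNet-style; not BitCPM4's exact scheme)."""
            scale = np.abs(w).mean() + 1e-8          # per-tensor scale
            q = np.clip(np.round(w / scale), -1, 1)  # entries in {-1, 0, +1}
            return q, scale

        w = np.random.randn(4, 4).astype(np.float32)
        q, scale = ternary_quantize(w)
        print(q)          # ternary codes
        print(q * scale)  # dequantized approximation of w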


    Pricing


    Open Source


    • All MiniCPM model weights (MiniCPM4, MiniCPM4.1, MiniCPM-SALA, BitCPM4, etc.)
    • Apache License 2.0
    • HuggingFace and ModelScope model downloads
    • Full source code access
    • Community support via Discord and WeChat

    Capabilities

    Key Features

    • Trainable sparse attention (InfLLM-V2)
    • Hybrid sparse + linear attention (MiniCPM-SALA)
    • 1M-token context on consumer GPUs
    • Hybrid reasoning mode (deep reasoning / non-reasoning)
    • BitCPM4 ternary quantization (1.58-bit)
    • EAGLE3 speculative decoding (3x speedup)
    • MCP tool use across 16 servers
    • Survey generation agent (MiniCPM4-Survey)
    • HuggingFace, vLLM, SGLang, CPM.cu, llama.cpp, Ollama support
    • YaRN long context extension (sketched below)
    • GPTQ, AWQ, GGUF, MLX quantized variants
    • Apache 2.0 open-source license
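
    The YaRN extension above is conventionally enabled through a rope_scaling override on the model config. The sketch below follows the generic Transformers convention; MiniCPM ships custom modeling code, so the exact field names and scaling factor are assumptions to be checked against the model card.

        # Hedged sketch: extending the native 64K context toward 128K via YaRN.
        # Field names follow the generic Transformers rope_scaling convention;
        # the checkpoint name is illustrative.
        from transformers import AutoConfig, AutoModelForCausalLM

        MODEL = "openbmb/MiniCPM4.1-8B"  # illustrative
        config = AutoConfig.from_pretrained(MODEL, trust_remote_code=True)
        config.rope_scaling = {
            "rope_type": "yarn",
            "factor": 2.0,  # 64K native -> ~128K effective
            "original_max_position_embeddings": 65536,
        }
        model = AutoModelForCausalLM.from_pretrained(
            MODEL, config=config, trust_remote_code=True, device_map="auto"
        )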

    Integrations

    HuggingFace Transformers
    vLLM
    SGLang
    CPM.cu
    llama.cpp
    Ollama
    ModelScope
    OpenVINO
    Intel Core Ultra (AIPC)
    MCP (Model Context Protocol)
    NVIDIA CUDA
    API Available
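
    For the backends above that expose a Python API, getting tokens out takes a few lines. A minimal offline-inference sketch using vLLM's standard API (the checkpoint name is again an assumption):

        # Minimal offline inference with vLLM. trust_remote_code is needed for
        # models that ship custom modeling code, as MiniCPM does.
        from vllm import LLM, SamplingParams

        llm = LLM(model="openbmb/MiniCPM4.1-8B", trust_remote_code=True)
        params = SamplingParams(temperature=0.7, max_tokens=256)
        outputs = llm.generate(["Explain sparse attention in two sentences."], params)
        print(outputs[0].outputs[0].text)

        # EAGLE3 speculative decoding is configured through vLLM's speculative
        # decoding options; the knob names vary across vLLM versions, so consult
        # the MiniCPM docs for the exact recipe.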

    Demo Video

    MiniCPM demo video (available on YouTube).


    Developer

    OpenBMB

    Founded 2022
    Beijing, China
    $200M raised
    150 employees

    Used by

    China Telecom
    Xiaomi
    Zhihu
    Qualcomm

    Similar Tools


    Axolotl

    Open-source tool for fine-tuning LLMs faster and at scale, supporting multi-GPU training, LoRA, FSDP, and a wide range of model architectures.


    flash-moe

    A Mixture of Experts (MoE) implementation in Python, enabling efficient sparse model inference by routing inputs to specialized expert sub-networks.


    GuppyLM

    A ~9M parameter tiny language model trained from scratch that roleplays as a fish named Guppy, designed as an educational project to demystify LLM training.


    Related Topics

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    68 tools

    Model Management

    Tools for managing, versioning, and deploying AI models.

    24 tools

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    133 tools