    ds4.c

    Local Inference

    A small, Metal-native local inference engine specifically built for DeepSeek V4 Flash, featuring disk KV cache persistence, OpenAI/Anthropic-compatible server API, and 2-bit quantization support.

    At a Glance

    Pricing
    Open Source

    Fully free and open-source under the MIT License. Download, use, modify, and distribute at no cost.

    Available On

    macOS
    CLI
    API

    Resources

    Website
    Docs
    GitHub
    llms.txt

    Topics

    Local Inference
    AI Infrastructure
    LLM Orchestration

    Alternatives

    Lemonade
    Bodega Inference Engine
    Rapid-MLX

    Developer

    antirez

    Listed May 2026

    About ds4.c

    ds4.c is a deliberately narrow, Metal-native local inference engine built exclusively for DeepSeek V4 Flash. It is not a generic GGUF runner or framework — it provides a DeepSeek V4 Flash-specific Metal graph executor with DS4-specific loading, prompt rendering, KV state management, and an OpenAI/Anthropic-compatible HTTP server API. The project bets on one model at a time, with official-vector validation, long-context tests, and agent integration to ensure the model truly works end-to-end on high-end personal machines and Mac Studios starting from 128 GB of RAM.

    Key features include:

    • Metal-only inference — the optimized execution path runs entirely on Apple Metal; a CPU path exists only for correctness checks
    • Disk KV cache persistence — compressed KV caches are written to SSD, allowing long-context sessions to survive server restarts and session switches without re-prefilling
    • 2-bit and 4-bit quantization — asymmetric quantization targeting only routed MoE experts (IQ2_XXS up/gate, Q2_K down) lets the 284B-parameter model run on 128 GB MacBooks
    • 1 million token context window — the model supports up to 1M tokens; practical context is limited by available RAM
    • OpenAI-compatible server — ds4-server exposes /v1/chat/completions, /v1/completions, and /v1/models endpoints with SSE streaming, tool calling, and thinking-mode controls
    • Anthropic-compatible endpoint — /v1/messages supports Claude Code-style clients with tool_use blocks and thinking controls
    • Thinking mode support — non-thinking, thinking, and Think Max modes are supported; reasoning is streamed natively
    • Agent client integration — documented configuration for opencode, Pi, and Claude Code coding agents
    • Speculative decoding (MTP) — optional multi-token prediction path for greedy decoding; currently experimental
    • Test vector validation — short and long-context continuation vectors captured from the official DeepSeek V4 Flash API are used to catch tokenizer, template, or attention regressions
    • Interactive CLI — multi-turn chat with /think, /nothink, /ctx, /read, and other commands; Ctrl+C interrupts generation

    To get started, clone the repository, run ./download_model.sh q2 (for 128 GB machines) or ./download_model.sh q4 (for 256 GB+ machines), then build with make. Launch the interactive CLI with ./ds4, or start the server with ./ds4-server --ctx 100000 --kv-disk-dir /tmp/ds4-kv --kv-disk-space-mb 8192.
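The getting-started steps above can be collected into one short shell sketch. The clone URL is an assumption (follow the GitHub link above for the real one); the build and server flags are taken verbatim from the text.

```shell
# Sketch of the setup described above; the clone URL is an assumption.
git clone https://github.com/antirez/ds4.c
cd ds4.c

# Fetch weights matching your machine's RAM:
./download_model.sh q2      # 128 GB machines
# ./download_model.sh q4    # 256 GB+ machines

make                        # builds ./ds4 and ./ds4-server

./ds4                       # interactive CLI

# Or serve the OpenAI/Anthropic-compatible API with disk KV cache:
./ds4-server --ctx 100000 \
  --kv-disk-dir /tmp/ds4-kv \
  --kv-disk-space-mb 8192
```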

    Pricing

    Open Source (MIT)

    • Metal-native DeepSeek V4 Flash inference
    • 2-bit and 4-bit quantization support
    • OpenAI and Anthropic-compatible server API
    • Disk KV cache persistence
    • Interactive CLI

    Capabilities

    Key Features

    • Metal-native inference engine for DeepSeek V4 Flash
    • Disk KV cache persistence for long-context sessions
    • 2-bit and 4-bit asymmetric quantization
    • 1 million token context window
    • OpenAI-compatible HTTP server API
    • Anthropic-compatible /v1/messages endpoint
    • SSE streaming with thinking-mode support
    • Tool calling with DSML format mapping
    • Speculative decoding via MTP (experimental)
    • Interactive multi-turn CLI
    • Official logit vector validation tests
    • Prefix-aware KV cache reuse across sessions
    • Single-session serialized Metal inference worker

    Integrations

    OpenAI API
    Anthropic API
    Claude Code
    opencode
    Pi agent
    Hugging Face
    llama.cpp / GGML (reference)
    GGUF format

    Developer

    antirez

    antirez is the creator of Redis and an independent open-source developer known for building high-performance, minimalist systems software. The ds4.c project reflects a focused approach: one model, one platform, end-to-end quality over generality. antirez develops with strong AI assistance while keeping humans in charge of ideas, testing, and debugging.

    Website
    GitHub

    Similar Tools

    Lemonade

    Open-source local LLM server for Windows, Linux, and macOS that runs LLMs, image generation, speech, and more on GPUs and NPUs with an OpenAI-compatible API.

    Bodega Inference Engine

    Enterprise-grade local LLM inference engine built specifically for Apple Silicon, featuring a multi-model registry, OpenAI-compatible API, and high-throughput continuous batching.

    Rapid-MLX

    The fastest local AI inference engine for Apple Silicon Macs, offering OpenAI-compatible API, 17 tool parsers, prompt cache, and 2-4x faster speeds than Ollama.

    Related Topics

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    94 tools

    AI Infrastructure

    Infrastructure designed for deploying and running AI models.

    219 tools

    LLM Orchestration

    Platforms and frameworks for designing, managing, and deploying complex LLM workflows, coordinating multiple AI models and services through visual interfaces.

    113 tools
    With AI, Everyone is a Dev. EveryDev.ai © 2026