
    Needle

    Local Inference

    A 26M parameter open-source function-call model distilled from Gemini, designed to run on tiny consumer devices like phones, watches, and glasses.


    At a Glance

    Pricing
    Open Source

Fully open-source under the MIT License. Free to use, modify, and distribute.


    Available On

    macOS
    Android
    iOS
    Web
    API

    Resources

Website
GitHub
llms.txt

    Topics

Local Inference
Agent Frameworks
AI Development Libraries

    Alternatives

Atomic Agents
flash-moe
IBM Granite Playground
Developer
Cactus Compute
San Francisco, CA
Est. 2025
$1,000,000 raised

    Listed May 2026

    About Needle

Needle is a 26-million-parameter "Simple Attention Network" (SAN) developed by Cactus Compute, distilled from Gemini 3.1 and optimized for single-shot function calling on extremely resource-constrained devices. The model weights are fully open on HuggingFace under the Cactus-Compute/needle repository, and the project is licensed under MIT. According to the repository, Needle runs in production on the Cactus runtime at 6,000 tokens/sec prefill and 1,200 tokens/sec decode.
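Those throughput figures make it easy to estimate end-to-end latency for an on-device request. The sketch below is illustrative only: the 512-token prompt and 32-token output sizes are assumptions, not figures from the repository.

```python
# Quoted Cactus-runtime throughput for Needle (from the repository)
PREFILL_TPS = 6000   # tokens/sec while ingesting the prompt
DECODE_TPS = 1200    # tokens/sec while generating output

# Illustrative request: a 512-token prompt producing a 32-token tool call
prompt_tokens, output_tokens = 512, 32

latency_s = prompt_tokens / PREFILL_TPS + output_tokens / DECODE_TPS
print(f"~{latency_s * 1000:.0f} ms end-to-end")  # ~112 ms
```

At these rates a realistic single-shot call round-trips in roughly a tenth of a second, which is the kind of budget a watch or glasses assistant needs.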

    What It Is

Needle is a tiny language model purpose-built for tool/function calling on consumer hardware such as phones, smartwatches, AR glasses, and similar edge devices. Rather than a general-purpose conversational model, it is post-trained on a 2-billion-token single-shot function-call dataset to excel at structured output generation for agentic pipelines. The architecture pairs a 12-layer encoder using grouped-query attention (GQA) and RoPE with an 8-layer decoder via cross-attention, applies zero-centered RMSNorm (ZCRMSNorm) and gated residuals throughout, and uses an 8,192-token BPE vocabulary at a model width of d=512.
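To make "structured output generation for agentic pipelines" concrete, here is a sketch of what a single-shot function call looks like from the runtime's side. The tool schema and output format below are illustrative assumptions, not Needle's documented wire format:

```python
import json

# Hypothetical tool definition handed to the model (names are illustrative)
tool = {
    "name": "set_alarm",
    "description": "Set an alarm on the device",
    "parameters": {"time": {"type": "string", "description": "HH:MM, 24-hour"}},
}

user_query = "Wake me up at 6:30 tomorrow"

# A single-shot function-call model emits one structured call, not a chat
# reply. This is the kind of JSON an agent runtime would parse and execute:
model_output = '{"name": "set_alarm", "arguments": {"time": "06:30"}}'

call = json.loads(model_output)
assert call["name"] == tool["name"]
print(call["arguments"]["time"])  # 06:30
```

The "single-shot" framing means one query maps to one call with no multi-turn tool negotiation, which is what keeps a 26M-parameter model viable for the task.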

    Architecture and Training

    The Simple Attention Network design deliberately omits feed-forward network (FFN) layers in the encoder, keeping the parameter count at 26M while retaining cross-attention between encoder and decoder stacks. Key training details from the repository:

    • Pretraining: 200B tokens on 16 TPU v6e chips over approximately 27 hours
    • Post-training: 2B tokens of single-shot function call data in approximately 45 minutes
    • Dataset generation: Synthesized via Gemini; generation scripts are open-sourced alongside the weights
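A back-of-envelope parameter count shows how a budget of this size can break down under the architecture described above. The GQA key/value reduction, decoder FFN width, and tied embeddings below are assumptions the repository does not state, so the total is order-of-magnitude only:

```python
d, vocab = 512, 8192
enc_layers, dec_layers = 12, 8
KV_RATIO = 4   # assumed GQA key/value head reduction (not stated)
FFN_MULT = 2   # assumed decoder FFN expansion (not stated)

def attn_params(d: int, kv_ratio: int) -> int:
    # Q and output projections are d x d; K and V are shrunk by GQA
    return 2 * d * d + 2 * d * d // kv_ratio

embed = vocab * d                             # token embeddings (tied head assumed)
encoder = enc_layers * attn_params(d, KV_RATIO)   # SAN encoder: attention only, no FFN
ffn = 2 * d * (FFN_MULT * d)                  # up- and down-projection
decoder = dec_layers * (2 * attn_params(d, KV_RATIO) + ffn)  # self + cross + FFN

total = embed + encoder + decoder
print(f"~{total / 1e6:.0f}M parameters")  # ~31M under these assumptions
```

The rough total lands in the same ballpark as the quoted 26M; the gap comes down to head counts, FFN width, and the small norm/gate parameters this sketch ignores.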

    The repository notes that Needle beats FunctionGemma-270m, Qwen-0.6B, Granite-350m, and LFM2.5-350m on single-shot function call benchmarks for personal AI, while acknowledging those models have broader conversational scope and capacity.

    Setup and Workflow

    Getting started requires cloning the repository and running the provided setup script:

    git clone https://github.com/cactus-compute/needle.git
    cd needle && source ./setup
    needle playground
    

    This opens a Gradio web UI at localhost:7860 for interactive testing and one-click finetuning. The CLI exposes commands for inference (needle run), finetuning on custom JSONL data (needle finetune), full training runs, pretraining, evaluation, tokenization, synthetic data generation via Gemini, and TPU management. Weights are auto-downloaded on first use.

    Finetuning for Custom Tools

    A key design goal is local finetuning accessibility. The playground UI generates synthetic training data via the Gemini API, trains the model, evaluates it, and bundles the result — all from a single command. For CLI-based finetuning, users supply a JSONL file of tool definitions and examples. The repository explicitly targets Mac and PC users for local finetuning, reflecting the model's consumer-device orientation.
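As an illustration of what such a JSONL file might contain, the snippet below writes two records pairing a tool schema, a user query, and the target call. The field names are a hypothetical schema, not the repository's documented format:

```python
import json

# Hypothetical finetuning records; field names here are illustrative only.
examples = [
    {
        "tools": [{"name": "set_timer",
                   "description": "Start a countdown timer",
                   "parameters": {"minutes": {"type": "integer"}}}],
        "query": "Set a timer for ten minutes",
        "target": {"name": "set_timer", "arguments": {"minutes": 10}},
    },
    {
        "tools": [{"name": "send_message",
                   "description": "Send a text message",
                   "parameters": {"to": {"type": "string"},
                                  "body": {"type": "string"}}}],
        "query": "Text Sam that I'm running late",
        "target": {"name": "send_message",
                   "arguments": {"to": "Sam", "body": "Running late"}},
    },
]

# JSONL: one JSON object per line
with open("tools.jsonl", "w") as f:
    for ex in examples:
        f.write(json.dumps(ex) + "\n")

print(sum(1 for _ in open("tools.jsonl")))  # 2
```

A few dozen such examples per tool is typically enough for a small model to lock onto a fixed tool set, which is why the project pushes local finetuning rather than zero-shot generalization.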

    Current Status

    The repository was created in February 2026 and last updated in May 2026, with 1,270 stars and 54 forks as of that date. The project is described as "an experimental run for Simple Attention Networks" and is positioned as a research and production prototype rather than a finished product. The authors caution that small models can be finicky and recommend testing and finetuning on specific tool sets before deployment.



    Pricing


    Open Source

Fully open-source under the MIT License. Free to use, modify, and distribute.

    • 26M parameter Needle model weights (open on HuggingFace)
    • Full source code on GitHub
    • CLI for inference, finetuning, training, and evaluation
    • Gradio web UI playground
    • Synthetic data generation via Gemini API

    Capabilities

    Key Features

    • 26M parameter Simple Attention Network (SAN) architecture
    • Single-shot function/tool calling
    • Encoder-decoder with cross-attention, GQA, RoPE, ZCRMSNorm
    • 6,000 tok/sec prefill and 1,200 tok/sec decode on the Cactus runtime
    • Pretrained on 200B tokens, post-trained on 2B function call tokens
    • Fully open weights on HuggingFace
    • Local finetuning on Mac/PC
    • Gradio web UI playground for testing and finetuning
    • CLI for inference, finetuning, training, evaluation, and TPU management
    • Synthetic training data generation via Gemini API
    • Python API for inference
    • MIT licensed open-source codebase

    Integrations

    Cactus runtime
    HuggingFace (model weights)
    Gemini API (data synthesis)
    Gradio (web UI)
    TPU v6e (training)


    Developer

    Cactus Compute

    Cactus Compute builds on-device AI infrastructure and models targeting consumer edge devices such as phones, watches, and glasses. The team developed the Cactus runtime for high-speed local inference and the Needle model family for efficient function calling at the edge. Their open-source work includes fully public model weights, training datasets, and finetuning tooling. The project is led by Henry Ndubuaku and collaborators including Jakub Mroz, Karen Mosoyan, Roman Shemet, and others.

    Founded 2025
    San Francisco, CA
    $1,000,000 raised
    8 employees

    Used by

    Hobbyist developers (open-source…
    Mobile AI engineers

    Similar Tools


    Atomic Agents

    A lightweight JavaScript library for building AI agents that run directly in the browser using WebLLM.


    flash-moe

    A Mixture of Experts (MoE) implementation in Python, enabling efficient sparse model inference by routing inputs to specialized expert sub-networks.


    IBM Granite Playground

    Interactive playground for testing and experimenting with IBM's Granite family of open-source AI foundation models.


    Related Topics

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    100 tools

    Agent Frameworks

    Tools and platforms for building and deploying custom AI agents.

    291 tools

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    162 tools
    With AI, Everyone is a Dev. EveryDev.ai © 2026