EveryDev.ai

    cuTile Rust

    AI Development Libraries
    Featured

    A tile-based system for writing memory-safe, data-race-free GPU kernels in idiomatic Rust, extending Rust's ownership discipline across the GPU launch boundary.

    At a Glance

    Pricing
    Open Source

    Free to use, modify, and distribute under the Apache License 2.0.

    Available On

    Linux
    API
    SDK
    CLI

    Resources

    Website, Docs, GitHub, llms.txt

    Topics

    AI Development Libraries, AI Infrastructure, Local Inference

    Alternatives

    turbovec, xmloxide, KiteSQL
    Developer
    NVlabs (NVIDIA Research): NVlabs is NVIDIA's research division, publishing open-source…

    Listed Jun 2026

    About cuTile Rust

    cuTile Rust (cutile-rs) is an open-source research project from NVIDIA's NVlabs that brings Rust's ownership and safety guarantees to GPU kernel programming. It targets tile-based kernels that lower through CUDA Tile IR, with APIs built around tensor partitions and tensor-core-oriented operations. The project was created in March 2026 and reached its v0.2.0 release in June 2026.

    What It Is

    cuTile Rust is a domain-specific language (DSL) and runtime library for authoring GPU kernels in Rust. Rather than exposing raw CUDA primitives, it models GPU work through a tile abstraction: mutable tensors are partitioned into disjoint pieces before launch, immutable tensors are shared, and generated launchers preserve Rust ownership semantics while GPU work is in flight. The #[cutile::module] macro captures a Rust AST for each kernel in the host binary; at runtime, cuTile Rust JIT-compiles that AST through CUDA Tile IR into a GPU cubin. The same model supports synchronous launches, asynchronous pipelines, and CUDA graph replay.
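The idea that a launcher keeps ownership of its arguments while work is in flight can be illustrated with a CPU-only analogy in standard Rust. This is a minimal sketch, not the cuTile Rust API: `launch_scale` is a hypothetical name, and a worker thread stands in for the GPU. The buffer is moved into the launch and only handed back when the work completes.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for a generated launcher (not a cuTile Rust function).
/// It takes the buffer by value, so the caller cannot read, alias, or drop it
/// while the work is running.
fn launch_scale(mut data: Vec<f32>, factor: f32) -> JoinHandle<Vec<f32>> {
    thread::spawn(move || {
        for x in data.iter_mut() {
            *x *= factor;
        }
        // Ownership returns to the caller only when the work has finished.
        data
    })
}

fn main() {
    let input = vec![1.0_f32, 2.0, 3.0];
    let in_flight = launch_scale(input, 2.0);
    // `input` has been moved; touching it here would be a compile error.
    let output = in_flight.join().expect("worker thread panicked");
    println!("{:?}", output);
}
```

Because the move is checked at compile time, premature use of the buffer is rejected before the program runs, which is the property the description above attributes to the generated launchers.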

    Safety Model and Architecture

    The core design extends Rust's borrow checker across the GPU launch boundary:

    • Mutable tensors are partitioned into disjoint chunks before launch, preventing data races at the type level.
    • Immutable tensors are shared across tiles as read-only inputs.
    • Generated launchers hold ownership of tensor arguments while GPU work is in flight, so the host cannot alias or free them prematurely.
    • Local opt-outs remain available when lower-level control is needed.
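
The partitioning rule in the list above has a close analogue in standard Rust on the CPU. The sketch below is an assumption-laden illustration rather than cuTile Rust code: `add_tiled` is a hypothetical helper, scoped threads stand in for tiles, `chunks_mut` plays the role of the disjoint partition, and the input slice is shared read-only.

```rust
use std::thread;

/// Hypothetical CPU analogy (not a cuTile Rust API): adds `input` into `out`,
/// processing one disjoint tile of `out` per scoped thread.
fn add_tiled(out: &mut [f32], input: &[f32], tile: usize) {
    assert!(tile > 0, "tile size must be non-zero");
    assert_eq!(out.len(), input.len(), "slices must have equal length");
    thread::scope(|s| {
        // `chunks_mut` yields non-overlapping mutable sub-slices, so no two
        // threads can ever write to the same element.
        for (t, chunk) in out.chunks_mut(tile).enumerate() {
            s.spawn(move || {
                let base = t * tile;
                for (i, x) in chunk.iter_mut().enumerate() {
                    // `input` is a shared, read-only borrow visible to every tile.
                    *x += input[base + i];
                }
            });
        }
    });
}

fn main() {
    let mut out = vec![1.0_f32, 2.0, 3.0, 4.0, 5.0];
    let input = vec![10.0_f32, 20.0, 30.0, 40.0, 50.0];
    add_tiled(&mut out, &input, 2);
    println!("{:?}", out);
}
```

Handing the whole of `out` mutably to two threads at once would fail to compile, which mirrors the claim that data races are ruled out at the type level.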

    The workspace is organized into layered crates: cutile (user-facing), cutile-compiler, cutile-ir (pure Rust Tile IR builder), cuda-async, cuda-core, and cuda-bindings (NVIDIA CUDA bindings under NVIDIA Software License).

    Performance and Paper

    The accompanying paper, Fearless Concurrency on the GPU (arXiv:2606.15991), reports that on NVIDIA B200, cuTile Rust reaches 7 TB/s for element-wise operations and 2 PFlop/s for GEMM, approximately 91% of peak memory bandwidth and 92% of dense f16 peak, respectively. The paper states that the GEMM result is competitive with cuBLAS and that safety-overhead microbenchmarks show no measurable runtime cost. It also evaluates Grout, a Qwen3 inference engine built with cuTile Rust in collaboration with Hugging Face, reporting 171 tokens/s for Qwen3-4B on RTX 5090 and 82 tokens/s for Qwen3-32B on B200.
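
As a quick sanity check on the reported percentages, the peak rates they imply can be back-computed from the achieved figures. The sketch below uses only the rounded numbers quoted above; `implied_peak` is a hypothetical helper, and the results are approximations, not hardware specifications.

```rust
/// Peak rate implied by an achieved rate and the fraction of peak it represents.
/// Hypothetical helper for illustration only.
fn implied_peak(achieved: f64, fraction_of_peak: f64) -> f64 {
    assert!(
        fraction_of_peak > 0.0 && fraction_of_peak <= 1.0,
        "fraction must be in (0, 1]"
    );
    achieved / fraction_of_peak
}

fn main() {
    // Figures as quoted from the paper: 7 TB/s at ~91%, 2 PFlop/s at ~92%.
    let bandwidth = implied_peak(7.0, 0.91);
    let gemm = implied_peak(2.0, 0.92);
    println!("implied peak bandwidth: {:.2} TB/s", bandwidth);
    println!("implied peak dense f16: {:.2} PFlop/s", gemm);
}
```

With these rounded inputs the implied peaks come out at roughly 7.69 TB/s and 2.17 PFlop/s. Because both the rates and the percentages are approximate, these values only indicate the order of magnitude the paper is measuring against.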

    Setup Requirements

    cuTile Rust has specific hardware and software requirements:

    • NVIDIA GPU with compute capability sm_80 or higher
    • CUDA 13.3 recommended (for sm_80+ coverage and Tile IR 13.3 features such as FP4 packing and block-scaled MMA)
    • Rust 1.89+
    • Linux (tested on Ubuntu 24.04)
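
The compute-capability requirement in the list above can be expressed as a small check. This is a sketch under assumptions: `parse_sm` and `meets_minimum` are hypothetical helpers, not part of cuTile Rust, and the target is assumed to be a string of the form `sm_NN` with an optional letter suffix. How the target string is obtained from the system is out of scope here.

```rust
/// Minimum compute capability listed in the requirements above.
const MIN_SM: u32 = 80;

/// Parses strings such as "sm_80" or "sm_90a" into their numeric part.
/// Returns `None` if the prefix or the digits are missing.
fn parse_sm(target: &str) -> Option<u32> {
    let rest = target.strip_prefix("sm_")?;
    let digits: String = rest.chars().take_while(|c| c.is_ascii_digit()).collect();
    digits.parse().ok()
}

/// True if the target parses and is at least `MIN_SM`.
fn meets_minimum(target: &str) -> bool {
    parse_sm(target).map_or(false, |sm| sm >= MIN_SM)
}

fn main() {
    for target in ["sm_75", "sm_80", "sm_90a", "sm_120"] {
        println!("{target}: supported = {}", meets_minimum(target));
    }
}
```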

    A Nix flake is provided for reproducible development environments. The flake automatically locates host NVIDIA driver libraries on both NixOS and non-NixOS systems.

    Update: v0.2.0

    Version 0.2.0 was published on June 16, 2026, and serves as the reference version for the paper evaluation benchmarks. The project README describes it as an early-stage research release under active development, with expected bugs, incomplete features, and API breakage ahead. The repository had 380 stars and 30 forks as of the last update. Related projects include cuTile Python, TileGym, and the Hugging Face Grout inference engine.

    Pricing

    Open Source

    Free to use, modify, and distribute under the Apache License 2.0.

    • Full source code access
    • Tile-based GPU kernel authoring in Rust
    • JIT compilation through CUDA Tile IR
    • Async and sync kernel launch support
    • CUDA graph replay

    Capabilities

    Key Features

    • Tile-based GPU kernel authoring in idiomatic Rust
    • Ownership-safe tensor partitioning across GPU launch boundary
    • #[cutile::module] macro for JIT kernel compilation
    • JIT compilation through CUDA Tile IR to GPU cubin
    • Synchronous and asynchronous kernel launch support
    • CUDA graph replay support
    • Tensor partition API for disjoint mutable access
    • Shared read-only tensor inputs
    • Local opt-outs for lower-level control
    • Nix flake for reproducible development environments
    • Reusable kernel library (cutile-kernels)
    • Async CUDA execution via async Rust

    Integrations

    CUDA Tile IR
    NVIDIA CUDA
    Hugging Face Grout
    cuBLAS
    Rust cargo ecosystem
    Nix flakes

    Developer

    NVlabs (NVIDIA Research)

    NVlabs is NVIDIA's research division, publishing open-source AI and deep learning frameworks. The team develops efficiency-oriented models and training infrastructure for computer vision, generative AI, and embodied intelligence. SANA is a flagship open-source project from NVlabs, combining academic research with production-ready deployment tooling. The lab collaborates with MIT Han Lab and other academic partners on model efficiency and quantization.

    Similar Tools

    turbovec

    A Rust vector index with Python bindings built on Google Research's TurboQuant algorithm, offering 2–4 bit compression and SIMD-accelerated search faster than FAISS.

    xmloxide

    xmloxide is an open-source Rust library for parsing and manipulating XML documents with a focus on performance and safety.

    KiteSQL

    A lightweight embedded relational database and native Rust data API, fully written in Rust, supporting SQL execution, typed ORM models, RocksDB/LMDB backends, WebAssembly, and Python bindings.

    Related Topics

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    210 tools

    AI Infrastructure

    Infrastructure designed for deploying and running AI models.

    279 tools

    Local Inference

    Tools and platforms for running AI inference locally without cloud dependence.

    129 tools