EveryDev.ai
    Daft

    Data Processing
    Featured

    Open-source, high-performance data engine for AI and multimodal workloads, enabling processing of images, audio, video, and structured data at any scale using a Python dataframe API.

    At a Glance

    Pricing
    Open Source

    Fully open-source under Apache License 2.0. Free to use, modify, and distribute.

    Available On

    CLI
    API
    SDK

    Resources

    Website
    Docs
    GitHub
    llms.txt

    Topics

    Data Processing
    AI Infrastructure
    AI Development Libraries

    Alternatives

    QuestDB
    xmloxide
    Extend UI

    Developer
    Eventual Inc.
    San Francisco, CA
    Est. 2022
    $30M raised

    Listed Jul 2026

    About Daft

    Daft is an open-source data engine built by Eventual Inc. for AI and multimodal data pipelines, licensed under Apache 2.0. Its core is written in Rust for performance, and it exposes a Python dataframe API familiar to Pandas and Spark users. The project has over 5,500 GitHub stars and is described by the vendor as being in production at organizations including Amazon and Essential AI.

    What It Is

    Daft is a distributed data processing framework designed specifically for the demands of AI workloads — particularly pipelines that mix structured metadata with unstructured multimodal data like images, video, audio, and embeddings. Unlike general-purpose dataframe libraries, Daft treats multimodal column types as first-class citizens and handles CPU/GPU scheduling within a single pipeline, eliminating the need for separate orchestration glue code.
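    To illustrate what a multimodal column as a first-class type means in practice, here is a plain-Python sketch that does not use Daft. A column of raw PNG bytes is decoded into (width, height) values, and the rows are then filtered on that derived column like any other. The table layout and the helpers `make_png` and `png_size` are hypothetical. Daft's own image operators run in its Rust core over Arrow data, and their real interface is described in the project documentation.

```python
import struct
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def make_png(width: int, height: int) -> bytes:
    """Hypothetical test data: PNG signature plus an IHDR chunk only.

    Enough to read the dimensions; not a complete, displayable image.
    """
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 2, 0, 0, 0)
    chunk = b"IHDR" + ihdr
    crc = struct.pack(">I", zlib.crc32(chunk) & 0xFFFFFFFF)
    return PNG_SIGNATURE + struct.pack(">I", len(ihdr)) + chunk + crc


def png_size(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk that must follow the signature."""
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


# A tiny column-oriented table: one structured column, one image-bytes column.
table = {
    "id": [1, 2, 3],
    "image": [make_png(64, 64), make_png(1024, 768), make_png(300, 200)],
}

# Derive a new column by decoding the image column, as with any other transform.
table["size"] = [png_size(img) for img in table["image"]]

# Filter every column on the derived one: keep images at least 256 px wide.
keep = [width >= 256 for width, _height in table["size"]]
filtered = {name: [v for v, k in zip(col, keep) if k] for name, col in table.items()}
print(filtered["id"], filtered["size"])  # [2, 3] [(1024, 768), (300, 200)]
```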

    Architecture and Performance

    Daft's core engine is written in Rust and uses Apache Arrow for zero-copy execution. Key architectural properties include:

    • Multimodal-native column types: Images, video, audio, text, and embeddings are native column types that can be decoded, transformed, and filtered like any other column.
    • CPU and GPU co-scheduling: GPU inference and embeddings run alongside CPU decode and filter operations in one pipeline; Daft handles batching and scheduling automatically.
    • Lower memory footprint: The vendor claims Daft runs the same queries with 5x less memory than alternatives, allowing jobs that would run out of memory (OOM) on Spark or Pandas to complete.
    • 20x faster start time: The vendor reports a 20x improvement in pipeline start time compared to alternatives.
    • Rust core: Decoding video, running transforms, and joining multimodal data at TB scale happen in Rust, without Python interpreter overhead.
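    The co-scheduling and memory bullets describe a common streaming pattern: CPU work (decoding) overlaps with accelerator work (batched inference), and a bounded buffer between the two stages caps how much decoded data is held at once. The sketch below shows that general pattern with two threads and a bounded queue. It is an illustration under stated assumptions, not Daft's scheduler; `fake_model`, `BATCH_SIZE`, and `max_in_flight` are hypothetical stand-ins, and no GPU is involved.

```python
import queue
import threading

BATCH_SIZE = 4          # hypothetical batch size for the inference stage
_SENTINEL = object()    # marks the end of the stream


def cpu_decode(raw_items, out_q):
    """CPU stage: 'decode' each raw item and push it downstream."""
    for raw in raw_items:
        out_q.put(raw * 2)  # stand-in for real decoding; blocks while the queue is full
    out_q.put(_SENTINEL)


def fake_model(batch):
    """Stand-in for a batched GPU model call."""
    return [x + 1 for x in batch]


def gpu_infer(in_q, results, batch_sizes):
    """Inference stage: group items into batches of BATCH_SIZE, flush the remainder at the end."""
    batch = []
    while True:
        item = in_q.get()
        if item is _SENTINEL:
            break
        batch.append(item)
        if len(batch) == BATCH_SIZE:
            batch_sizes.append(len(batch))
            results.extend(fake_model(batch))
            batch = []
    if batch:
        batch_sizes.append(len(batch))
        results.extend(fake_model(batch))


def run_pipeline(raw_items, max_in_flight=8):
    q = queue.Queue(maxsize=max_in_flight)  # bounded buffer caps decoded items held in memory
    results, batch_sizes = [], []
    producer = threading.Thread(target=cpu_decode, args=(raw_items, q))
    consumer = threading.Thread(target=gpu_infer, args=(q, results, batch_sizes))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results, batch_sizes


results, batch_sizes = run_pipeline(range(10))
print(results)      # [1, 3, 5, 7, 9, 11, 13, 15, 17, 19]
print(batch_sizes)  # [4, 4, 2]
```

    With one producer and one consumer the queue is FIFO, so output order matches input order. A production engine would run many workers per stage and size batches to the accelerator rather than using a fixed constant.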

    Ecosystem Integrations

    Daft integrates with a broad set of data infrastructure and ML tooling:

    • Table formats: Apache Iceberg, Delta Lake, Apache Hudi, Unity Catalog
    • Cloud storage: Amazon Web Services (S3), Azure, Google Cloud Storage
    • Compute: Ray (for distributed execution)
    • ML frameworks: PyTorch, Hugging Face
    • Dataframe interop: Pandas
    • Model providers: OpenAI, Hugging Face, and custom models via UDFs

    Use Cases

    The vendor highlights three primary use cases:

    1. AI Search — Using LLMs and embedding models, Daft extracts metadata, generates vectors, and writes them to a vector database.
    2. Data Enrichment — Enriching raw datasets with model-generated labels, captions, or structured outputs.
    3. Multimodal AI ETL — End-to-end pipelines from raw multimodal data to training-ready datasets.
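    The AI Search use case follows three steps: extract metadata, generate vectors, and write them to a vector store. A self-contained sketch of that flow, assuming toy stand-ins throughout: `extract_metadata` replaces an LLM call, `embed` is a hashed bag-of-words vector instead of a real embedding model, and `InMemoryVectorIndex` is a hypothetical substitute for a vector database. None of these are Daft APIs.

```python
import math
import re
import zlib

DIM = 256  # hypothetical embedding width


def extract_metadata(doc: str) -> dict:
    """Stand-in for LLM metadata extraction: title = first line, plus a word count."""
    lines = doc.strip().splitlines()
    return {"title": lines[0], "n_words": len(doc.split())}


def embed(text: str) -> list[float]:
    """Toy hashed bag-of-words embedding (stand-in for a model), L2-normalized."""
    vec = [0.0] * DIM
    for token in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(token.encode()) % DIM] += 1.0  # crc32 is stable across runs
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


class InMemoryVectorIndex:
    """Hypothetical stand-in for a vector database, searched by cosine similarity."""

    def __init__(self):
        self.rows = []

    def upsert(self, doc_id, vector, metadata):
        self.rows.append((doc_id, vector, metadata))

    def search(self, query_vec, k=1):
        # Vectors are unit length, so the dot product equals cosine similarity.
        scored = [(sum(a * b for a, b in zip(query_vec, vec)), doc_id, meta)
                  for doc_id, vec, meta in self.rows]
        scored.sort(key=lambda t: t[0], reverse=True)
        return scored[:k]


docs = {
    "a": "Video decoding\nDecode video frames on CPU before GPU inference.",
    "b": "Parquet catalogs\nManage large Parquet tables in object storage.",
}
index = InMemoryVectorIndex()
for doc_id, text in docs.items():
    index.upsert(doc_id, embed(text), extract_metadata(text))

best_score, best_id, best_meta = index.search(embed("parquet tables storage"), k=1)[0]
print(best_id, best_meta["title"], round(best_score, 3))
```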

    Adoption Signals

    The vendor publishes several user testimonials and case studies. According to the vendor, Amazon uses Daft to manage exabytes of Apache Parquet in its S3-based data catalog, with one engineer stating it improved efficiency of a critical data processing job by over 24%, saving over 40,000 years of Amazon EC2 vCPU computing time annually. Essential AI reportedly scaled a vLLM-inference pipeline to 32,000 sustained requests per second per VM using Daft. Together AI states Daft sped up fuzzy deduplication workloads by 10x on 100TB+ text data pipelines. The vendor also reports petabytes processed daily across its user base.
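    Fuzzy deduplication, as in the Together AI workload above, means dropping documents that are near-copies rather than exact copies. The listing does not say how Daft implements it. One widely used technique is MinHash over word shingles, sketched here in plain Python; the shingle size, the 64 hash functions, and the 0.5 similarity threshold are illustrative assumptions.

```python
import hashlib

NUM_HASHES = 64   # signature length (assumption)
SHINGLE_SIZE = 3  # words per shingle (assumption)


def shingles(text: str, k: int = SHINGLE_SIZE) -> set[str]:
    """Overlapping k-word shingles; short texts become a single shingle."""
    words = text.lower().split()
    return {" ".join(words[i:i + k]) for i in range(max(1, len(words) - k + 1))}


def _hash(seed: int, s: str) -> int:
    """One member of a seeded hash family, derived from SHA-1."""
    return int.from_bytes(hashlib.sha1(f"{seed}:{s}".encode()).digest()[:8], "big")


def minhash(text: str) -> list[int]:
    """Signature = minimum hash of the shingle set under each seeded hash."""
    sh = shingles(text)
    return [min(_hash(seed, s) for s in sh) for seed in range(NUM_HASHES)]


def estimated_jaccard(a: list[int], b: list[int]) -> float:
    """Fraction of agreeing positions estimates the Jaccard similarity of the shingle sets."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def dedup(docs: list[str], threshold: float = 0.5) -> list[int]:
    """Keep a doc unless it is a near-duplicate of an earlier kept doc; return kept indices."""
    kept, signatures = [], []
    for i, doc in enumerate(docs):
        sig = minhash(doc)
        if all(estimated_jaccard(sig, s) < threshold for s in signatures):
            kept.append(i)
            signatures.append(sig)
    return kept


docs = [
    "the quick brown fox jumps over the lazy dog near the river bank",
    "the quick brown fox jumps over the lazy dog near the river bank today",
    "daft processes images audio and video with a python dataframe api",
]
print(dedup(docs))  # [0, 2]
```

    This compares each new signature with every kept one, which is quadratic in the number of documents. Large-scale pipelines typically add locality-sensitive hashing (splitting signatures into bands and bucketing them) so that only candidate pairs are compared.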

    Update: v0.7.16

    The latest release is v0.7.16, published on June 26, 2026. At the time of listing, the repository's most recent push was on July 1, 2026, and it had 323 open issues and 502 forks. The project has been under continuous development since its creation in April 2022.

    Pricing

    Open Source

    • Full data engine for AI and multimodal workloads
    • Python dataframe API
    • Multimodal-native column types
    • CPU and GPU co-scheduling
    • Distributed execution via Ray

    Capabilities

    Key Features

    • Multimodal-native column types (images, video, audio, embeddings)
    • CPU and GPU co-scheduling in a single pipeline
    • Python dataframe API compatible with Pandas and Spark patterns
    • Managed UDF runtime with automatic batching, retries, and error handling
    • Zero-copy execution powered by Apache Arrow
    • Rust core for high-performance data processing
    • Local to production consistency — same code runs on laptop or cluster
    • 5x lower memory footprint vs. alternatives (vendor claim)
    • Native model operators for embeddings, LLM extraction, and structured outputs
    • Distributed execution via Ray integration
    • Support for Apache Iceberg, Delta Lake, Apache Hudi, Unity Catalog
    • Cloud-native I/O for AWS S3, Azure, Google Cloud Storage
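    The managed UDF runtime bullet bundles three behaviors: batching rows into one model call, retrying transient failures, and containing errors so that one bad batch does not fail the whole job. A minimal sketch of those semantics under stated assumptions; `run_batched_udf`, its parameters, and the `flaky_upper` example UDF are hypothetical and are not Daft's UDF interface, which is documented separately.

```python
import time
from typing import Callable, Sequence


def run_batched_udf(
    fn: Callable[[list], list],
    items: Sequence,
    batch_size: int = 32,
    max_retries: int = 2,
    backoff_s: float = 0.0,
) -> tuple[list, list[str]]:
    """Apply `fn` to `items` in batches, retrying failed batches.

    Rows in a batch that still fails after `max_retries` retries get None,
    and the error message is recorded instead of aborting the whole job.
    """
    outputs: list = []
    errors: list[str] = []
    for start in range(0, len(items), batch_size):
        batch = list(items[start:start + batch_size])
        for attempt in range(max_retries + 1):
            try:
                result = fn(batch)
                if len(result) != len(batch):
                    raise ValueError("UDF must return one output per input row")
                outputs.extend(result)
                break
            except Exception as exc:
                if attempt == max_retries:
                    # Give up on this batch: fill with None and keep going.
                    errors.append(f"rows {start}-{start + len(batch) - 1}: {exc}")
                    outputs.extend([None] * len(batch))
                elif backoff_s:
                    time.sleep(backoff_s * (2 ** attempt))  # exponential backoff


    return outputs, errors


calls = {"n": 0}


def flaky_upper(batch):
    """Hypothetical UDF that fails on its first call only."""
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("transient error")
    return [s.upper() for s in batch]


out, errs = run_batched_udf(flaky_upper, ["a", "b", "c", "d", "e"], batch_size=2)
print(out, errs)  # ['A', 'B', 'C', 'D', 'E'] []
```

    The first batch fails once and succeeds on retry, so all five rows come back and no errors are recorded. Filling a permanently failing batch with None is one possible policy; a real runtime might instead raise, skip rows, or route them to an error table.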

    Integrations

    Apache Iceberg
    Delta Lake
    Apache Hudi
    Unity Catalog
    Amazon Web Services (S3)
    Azure
    Google Cloud Storage
    Ray
    Pandas
    PyTorch
    Hugging Face
    OpenAI
    API Available

    Developer

    Eventual Inc.

    Eventual Inc. builds Daft, an open-source, high-performance data engine designed for AI and multimodal workloads. The company develops tools that let data and ML engineers process images, video, audio, and structured data at any scale using a familiar Python dataframe API. Daft's Rust-based core handles CPU/GPU co-scheduling, distributed execution via Ray, and integrations with major cloud storage and table formats. Eventual Inc. focuses on making production-grade multimodal data pipelines accessible without requiring new frameworks or infrastructure rewrites.

    Founded 2022
    San Francisco, CA
    $30M raised
    36 employees

    Used by

    Amazon
    Mobileye
    Together AI
    Similar Tools

    QuestDB

    QuestDB is an open-source, high-performance time-series database built for demanding workloads, offering ultra-low latency ingestion, SIMD-accelerated SQL queries, and a multi-tier storage engine with native Parquet support.

    xmloxide

    xmloxide is an open-source Rust library for parsing and manipulating XML documents with a focus on performance and safety.

    Extend UI

    Open-source React component library for building document agents, user-facing document flows, and internal tools with PDF, DOCX, and XLSX viewers.

    Related Topics

    Data Processing

    AI-enhanced ETL (Extract, Transform, Load) tools and data pipelines that automate the processing, cleaning, and transformation of large datasets with intelligent optimizations.

    116 tools

    AI Infrastructure

    Infrastructure designed for deploying and running AI models.

    302 tools

    AI Development Libraries

    Programming libraries and frameworks that provide machine learning capabilities, model integration, and AI functionality for developers.

    244 tools