Spiral

Name: Spiral
Availability: OnlineOnly
Author: Spiral

A data warehouse for pre-training that maximizes model FLOPs utilization with multimodal data support and GPU saturation.

At a Glance

Pricing

Paid

Enterprise: Custom/contact

Available On

Linux

Web

API

Spiral | New York, NY | Est. 2023 | $22M raised

Listed Feb 2026

About Spiral

Spiral is a data warehouse built for pre-training machine learning models. It aims to help teams maximize Model FLOPs Utilization (MFU) on multimodal data by providing scalable infrastructure for ingesting, processing, and enriching large datasets (tensors, audio, images, and video) without the I/O bottlenecks that typically slow GPU training pipelines.
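For readers unfamiliar with MFU, the sketch below shows one common way to estimate it. It is not taken from Spiral's documentation: it assumes the widely used approximation of about 6 FLOPs per parameter per token for a dense transformer's forward and backward pass, and the function name, the example run, and the peak-throughput figure are all illustrative assumptions.

```python
def model_flops_utilization(n_params, tokens_per_second, n_gpus, peak_flops_per_gpu):
    """Estimate MFU as achieved training FLOP/s divided by peak hardware FLOP/s.

    Assumes roughly 6 FLOPs per parameter per token (forward + backward),
    a common approximation for dense transformers.
    """
    achieved_flops_per_second = 6.0 * n_params * tokens_per_second
    peak_flops_per_second = n_gpus * peak_flops_per_gpu
    return achieved_flops_per_second / peak_flops_per_second


# Hypothetical run: 7B parameters, 8 GPUs, 80,000 tokens/s across the job.
# The peak value is an assumed dense BF16 figure; check the vendor datasheet
# for the accelerator and precision you actually use.
ASSUMED_PEAK_FLOPS_PER_GPU = 989e12

mfu = model_flops_utilization(7e9, 80_000, 8, ASSUMED_PEAK_FLOPS_PER_GPU)
print(f"Estimated MFU: {mfu:.1%}")
```

If the data loader cannot deliver tokens fast enough, `tokens_per_second` drops and MFU falls with it, which is the bottleneck this product is aimed at.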

Multimodal Data Ingestion: Quickly ingest any data type at any size, including tensors, audio, images, and video files, making it ideal for diverse pre-training datasets.
Flexible Schema Evolution: Append columns and rows without rewriting existing data, allowing datasets to evolve organically without costly migrations or upfront schema design.
GPU Saturation: According to Spiral, interactive queries load more bytes per second into an H100 than reading precomputed Parquet results from local disk, removing I/O as the bottleneck.
Selective and Parameterized Reads: Access data selectively with push-down predicates, reading only the data you need without custom data access layers.
Massive Scale Support: Scale to millions of columns without upfront schema design, accommodating the complex metadata requirements of modern ML datasets.
Built on Vortex: Powered by Vortex, an open-source columnar format donated to the Linux Foundation, which Spiral describes as Pareto-optimal and faster than Apache Parquet for virtually any workload.
Broad Ecosystem Integration: Works with popular tools including Spark, Dask, Modal, DuckDB, Polars, PyTorch, Pandas, Arrow, Iceberg, and Ray.
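Two of the ideas above, appending columns without rewriting existing data and push-down predicates, can be illustrated with a toy in-memory column store. This is a conceptual sketch only: `ToyTable`, `Chunk`, and their methods are hypothetical names, and nothing here reflects Spiral's or Vortex's actual API or storage layout.

```python
class Chunk:
    """A run of values from one column, with min/max statistics."""

    def __init__(self, values):
        self.values = list(values)
        self.lo = min(self.values)
        self.hi = max(self.values)


class ToyTable:
    """Hypothetical column store: each column is an independent list of chunks."""

    def __init__(self, chunk_rows):
        self.chunk_rows = chunk_rows
        self.columns = {}

    def add_column(self, name, values):
        # Only the new column's chunks are created; existing columns are untouched.
        n = self.chunk_rows
        self.columns[name] = [Chunk(values[i:i + n]) for i in range(0, len(values), n)]

    def scan(self, select, where, lo, hi):
        """Return `select` values for rows with lo <= `where` <= hi, and chunks read."""
        out, chunks_read = [], 0
        for idx, key_chunk in enumerate(self.columns[where]):
            if key_chunk.hi < lo or key_chunk.lo > hi:
                continue  # statistics prove no row matches, so skip the chunk
            chunks_read += 1
            wanted = self.columns[select][idx].values
            out += [v for k, v in zip(key_chunk.values, wanted) if lo <= k <= hi]
        return out, chunks_read


table = ToyTable(chunk_rows=4)
table.add_column("duration_s", list(range(12)))
original_chunks = list(table.columns["duration_s"])

# Schema evolution: adding a column leaves the existing chunks as the same objects.
table.add_column("caption", [f"clip-{i}" for i in range(12)])
unchanged = all(a is b for a, b in zip(original_chunks, table.columns["duration_s"]))

# Selective read: only chunks whose min/max range overlaps the predicate are read.
rows, chunks_read = table.scan(select="caption", where="duration_s", lo=5, hi=6)
print(unchanged, rows, chunks_read)
```

Production columnar formats apply the same principle with on-disk statistics and encodings, so a filtered query can skip most of the bytes instead of scanning everything.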

To get started with Spiral, request access through its website. The platform integrates with familiar data processing tools and standards, making adoption straightforward for teams already working with modern data stacks. Spiral is particularly suited to organizations building large-scale pre-training pipelines that need to efficiently manage and serve multimodal datasets to GPU clusters.

Pricing

Enterprise

Contact for access to the data warehouse for pre-training

Custom

Contact sales

Multimodal data ingestion
Schema evolution without rewriting
GPU saturation
Selective and parameterized reads
Scale to millions of columns
Tool integrations

Capabilities

Key Features

Multimodal data ingestion (tensors, audio, images, video)
Schema evolution without data rewriting
GPU saturation for maximum throughput
Selective and parameterized push-down reads
Scale to millions of columns
Built on Vortex columnar format
Pareto-optimal performance vs Parquet
Interoperable with existing data ecosystems

Integrations

Spark

Dask

Modal

DuckDB

Polars

PyTorch

Pandas

Arrow

Iceberg

Ray

API Available

Topics

AI Infrastructure, Data Processing, Database Tools

Alternatives

Vector AI DB Replicate Bright Data
