Modular
AI infrastructure platform with MAX framework, Mojo language, and Mammoth for GPU-portable GenAI serving across NVIDIA and AMD hardware.
At a Glance
Pricing: An open AI platform powered by MAX and Mojo - free for every developer
About Modular
Modular provides a unified AI infrastructure platform designed to deliver state-of-the-art performance for generative AI workloads across multiple GPU vendors. The platform combines MAX (a GenAI serving framework), Mojo (a high-performance programming language), and Mammoth (a Kubernetes-native control plane for large-scale distributed AI serving) to enable developers to build, optimize, and deploy AI systems with unprecedented hardware portability.
- MAX Framework is a GenAI serving framework that supports 500+ open models with customizable, open-source implementations portable across NVIDIA and AMD GPUs, delivering up to 70% faster inference than vanilla vLLM.
- Mojo Language provides Python-like syntax with systems-level performance, enabling developers to write high-performance GPU code without deep CUDA expertise while achieving speeds up to 12x faster than Python.
- Mammoth Orchestration scales AI workloads from a single GPU to unlimited nodes with a Kubernetes-native control plane designed for large-scale distributed AI serving.
- Hardware Portability eliminates vendor lock-in through compiler technology that automatically generates optimized kernels for any hardware target, supporting NVIDIA, AMD, and Apple Silicon.
- Tiny Containers delivers container images 90% smaller than vLLM's (under 700MB) with sub-second cold starts, reducing infrastructure costs and deployment complexity.
- Open Source Stack democratizes high-performance AI by open-sourcing the entire stack, including optimized kernels, enabling full customization down to the silicon level.
- Enterprise Support includes SOC 2 Type I certification, dedicated engineering contacts, custom SLAs, and flexible deployment options spanning cloud, on-premise, and hybrid configurations.
To get started, install the free Community Edition via Docker, pip, uv, pixi, or conda, then deploy GenAI models locally using the OpenAI-compatible API. Browse the model repository at builds.modular.com to find optimized models for your use case.
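A minimal sketch of what a local deployment call can look like, assuming a MAX server is already running and the `openai` Python package is installed; the port, base URL, and model name below are placeholder assumptions, not values confirmed by this page.

```python
# Query a locally served model through its OpenAI-compatible API.
# Assumptions: the server listens on http://localhost:8000/v1 and
# "your-model-name" is a model you have already deployed; both are
# placeholders, so check your server's startup output for real values.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # assumed local endpoint
    api_key="EMPTY",  # local servers typically ignore the key
)

response = client.chat.completions.create(
    model="your-model-name",  # placeholder model identifier
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
)
print(response.choices[0].message.content)
```

Because the API is OpenAI-compatible, existing OpenAI client code can be repointed at the local endpoint by changing only the base URL.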

Pricing
Free Plan Available
An open AI platform powered by MAX and Mojo - free for every developer
- SOTA GenAI serving performance
- Supports the latest AI models across current AI hardware
- Deploy MAX and Mojo yourself in any cloud environment
- Open source and a vibrant community of developers
- Community support through Discord and GitHub
Batch API Endpoint
Fully managed batch API endpoints that are 85% lower cost than competitors; see the sketch after this list
- Asynchronous large-scale batch inference endpoints
- Support the latest AI models - Qwen3, InternVL, GPT-OSS
- Lowest-cost endpoints to maximize ROI
- Turn around large batches in hours to days
- SOC 2 Type I certified and independently audited
- Dedicated customer support
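A hypothetical sketch of the asynchronous submit-and-poll pattern such a batch endpoint implies, modeled on the OpenAI Batch API since the platform advertises OpenAI API compatibility; the base URL, file name, and exact batch interface shape are assumptions, not documented specifics.

```python
# Hypothetical batch submission: upload a JSONL file of requests,
# create a batch job, and poll until it completes (hours to days).
# The endpoint URL and the files/batches interface are assumed here.
import time
from openai import OpenAI

client = OpenAI(
    base_url="https://batch.example.com/v1",  # placeholder endpoint
    api_key="YOUR_API_KEY",
)

# Each line of requests.jsonl is one chat-completion request.
batch_file = client.files.create(file=open("requests.jsonl", "rb"), purpose="batch")

batch = client.batches.create(
    input_file_id=batch_file.id,
    endpoint="/v1/chat/completions",
    completion_window="24h",
)

# Poll until the asynchronous job reaches a terminal state.
terminal = ("completed", "failed", "expired", "cancelled")
while (batch := client.batches.retrieve(batch.id)).status not in terminal:
    time.sleep(60)
print(batch.status, batch.output_file_id)
```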
Dedicated Endpoint
Fully managed, dedicated API endpoints for low-latency online inference
- Distributed, large-scale online inference endpoints
- Support the latest AI models - Qwen3, InternVL, GPT-OSS
- Highest-performance endpoints to maximize ROI
- Resilient, high-availability, large-scale services
- SOC 2 Type I certified and independently audited
- Dedicated customer support
Enterprise
Advanced deployments with full data control, CSP or Neocloud compute, or a hybrid approach
- Everything in Dedicated Endpoint
- Deployment in your cloud or on-premise environment
- Optimization of your custom pipelines and workloads
- Hybrid deployments designed for data sovereignty
- Tailored and flexible SLAs and SLOs for enterprise needs
- Roadmap prioritization
Capabilities
Key Features
- 500+ GenAI model support
- GPU portability across NVIDIA and AMD
- MAX GenAI serving framework
- Mojo programming language
- Mammoth distributed orchestration
- OpenAI API compatibility
- 90% smaller container sizes
- Sub-second cold starts
- Open source kernels
- Multi-cloud deployment
- SOC 2 Type I certified
- Custom kernel development
- Batch inference endpoints
- Dedicated inference endpoints
- Enterprise hybrid deployments