Aria is an open-source multimodal native mixture-of-experts model by Rhymes AI, capable of processing text, images, and video with state-of-the-art performance.
At a Glance
Pricing
Fully open-source model weights and code available at no cost for research and commercial use.
Listed Mar 2026
About Aria
Aria is an open-source multimodal native mixture-of-experts (MoE) AI model developed by Rhymes AI, designed to handle text, images, and video inputs natively. It delivers state-of-the-art performance across a wide range of language and vision benchmarks while remaining efficient through its sparse MoE architecture. The model is publicly available on GitHub and Hugging Face, making it accessible for researchers and developers who want to build or fine-tune multimodal AI applications.
- Multimodal Native Architecture: Aria handles text, images, and video within a single model rather than routing each modality to a separate specialist model, enabling richer cross-modal understanding.
- Mixture-of-Experts (MoE) Design: Uses sparse MoE routing so that only a subset of the model's parameters is activated for each token, delivering high capability at lower inference cost than a dense model of the same total size.
- Open-Source Access: The full model weights and code are released on GitHub and Hugging Face under the Apache 2.0 license, allowing anyone to download, run, and fine-tune the model.
- State-of-the-Art Benchmarks: Achieves competitive or leading results on standard language, vision-language, and video understanding benchmarks.
- Fine-Tuning Support: Includes scripts and documentation for supervised fine-tuning (SFT) on custom datasets, enabling domain-specific adaptation.
- Inference Recipes: Provides ready-to-use inference code and examples for running the model locally or on cloud GPU infrastructure.
- Community-Driven Development: Hosted on GitHub, the project welcomes contributions, issue reports, and pull requests from the broader AI research community.
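The sparse activation described in the Mixture-of-Experts bullet above can be illustrated with a small, generic top-k routing sketch. This is not Aria's actual router: the names `route_token`, `router_scores`, and `experts` are hypothetical, tokens are plain floats instead of vectors, and the router scores are passed in rather than learned. It only shows the general idea that each token runs through a few selected experts while the rest are skipped.

```python
import math
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-in for an expert network: maps one value to one value.
Expert = Callable[[float], float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_token(
    token: float,
    router_scores: Sequence[float],
    experts: Sequence[Expert],
    top_k: int = 2,
) -> Tuple[float, List[int]]:
    """Run one token through only the top_k highest-scoring experts.

    Returns the gate-weighted output and the indices of the experts that ran.
    """
    if len(router_scores) != len(experts):
        raise ValueError("need exactly one router score per expert")
    if not 1 <= top_k <= len(experts):
        raise ValueError("top_k must be between 1 and the number of experts")

    # Rank expert indices by router score, highest first, and keep the top k.
    ranked = sorted(range(len(experts)), key=lambda i: router_scores[i], reverse=True)
    chosen = ranked[:top_k]

    # Renormalize gate weights over the chosen experts only.
    weights = softmax([router_scores[i] for i in chosen])

    # Only the chosen experts are evaluated; all others are skipped.
    output = sum(w * experts[i](token) for w, i in zip(weights, chosen))
    return output, chosen
```

With four experts and `top_k=2`, only two expert functions are called per token, which is where the inference savings of a sparse design come from in this simplified picture.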
Pricing
Open Source
- Full model weights download
- Inference scripts
- Fine-tuning support
- Community support via GitHub Issues
Capabilities
Key Features
- Multimodal native model (text, image, video)
- Mixture-of-Experts (MoE) architecture
- Open-source model weights
- Fine-tuning support
- Inference scripts and examples
- State-of-the-art benchmark performance
- Hugging Face integration
