# Fireworks AI

> Fireworks AI is a high-performance inference cloud for open-source AI models, enabling developers and enterprises to build, fine-tune, and scale generative AI applications at blazing speed.

Fireworks AI is a cloud inference platform that provides fast, scalable access to open-source AI models optimized for production workloads. Founded by veterans of PyTorch, Meta, and Google, Fireworks delivers industry-leading throughput and latency across text, vision, speech, image, and embedding models. The platform supports the full AI model lifecycle — from serverless experimentation to fine-tuning and enterprise-grade on-demand GPU deployments — without requiring teams to manage infrastructure.

- **Serverless Inference** — *Sign up and start calling models instantly with per-token pricing, no cold starts, and $1 in free credits to get started.*
- **Model Library** — *Access a broad catalog of popular open-source models including DeepSeek, Qwen, Gemma, Kimi, Llama, Mistral, FLUX, and Whisper, all optimized for cost, speed, and quality.*
- **Fine-Tuning** — *Customize open models using LoRA or full-parameter SFT, DPO, and reinforcement fine-tuning (RFT) with minimal setup; fine-tuned models are served at base model prices.*
- **On-Demand GPU Deployments** — *Reserve A100, H100, H200, B200, or B300 GPUs billed per second for higher throughput, lower latency, and higher rate limits at scale.*
- **Multimodal Support** — *Run text, vision, speech-to-text (Whisper), and image generation (FLUX, SDXL) workloads through a unified API.*
- **Enterprise Security & Compliance** — *SOC2 Type 2, HIPAA, and GDPR compliant with zero data retention, data residency support, RBAC, and SSO (Google, OIDC, SAML).*
- **Bring Your Own Cloud** — *Deploy on Fireworks' globally distributed virtual cloud or bring your own cloud environment; available via AWS and GCP marketplaces.*
- **Batch Inference** — *Run batch jobs at 50% of serverless pricing for both input and output tokens, ideal for offline or high-volume workloads.*
- **Observability & Reliability** — *Built-in failover, load balancing, auto-scaling, and comprehensive metrics dashboards for production confidence.*
- **Developer Tooling** — *OpenAI-compatible API, CLI (firectl), SDKs, cookbooks, and detailed documentation to accelerate integration.*

## Features

- Serverless inference with per-token pricing
- On-demand GPU deployments (A100, H100, H200, B200, B300)
- Fine-tuning: LoRA SFT, LoRA DPO, Full Param SFT, Full Param DPO, RFT
- Support for text, vision, speech-to-text, image generation, and embeddings
- OpenAI-compatible API
- Batch inference at 50% discount
- Cached input token pricing
- Model library with 100+ open-source models
- SOC2 Type 2, HIPAA, GDPR compliance
- Zero data retention and data sovereignty
- RBAC and SSO (Google, OIDC, SAML)
- Bring your own cloud support
- Global distributed infrastructure
- Auto-scaling and load balancing
- CLI (firectl) and SDK support
- Cookbooks and documentation

## Integrations

AWS Marketplace, GCP Marketplace, Microsoft Azure (Foundry), PyTorch, NVIDIA GPUs, AMD GPUs, Whisper, FLUX, DeepSeek, Llama, Mistral, Qwen, Gemma, SDXL

## Platforms

WEB, API, CLI

## Pricing

Freemium — Free tier available with paid upgrades

## Links

- Website: https://fireworks.ai
- Documentation: https://docs.fireworks.ai/getting-started/introduction
- EveryDev.ai: https://www.everydev.ai/tools/fireworks-ai
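## Example

Because the API is OpenAI-compatible, calling a hosted model is a matter of sending a standard chat-completions payload to Fireworks' endpoint. The sketch below uses only the Python standard library; the model ID shown is an assumption for illustration — consult the model library for the exact identifier you want, and set `FIREWORKS_API_KEY` before calling.

```python
"""Minimal sketch of a chat completion against Fireworks' OpenAI-compatible API."""
import json
import os
import urllib.request

FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"


def build_chat_request(prompt: str, model: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str,
         model: str = "accounts/fireworks/models/llama-v3p1-8b-instruct") -> str:
    """Send the request and return the assistant's reply.

    The default model ID above is an assumed example; replace it with a
    current ID from the Fireworks model library.
    """
    req = urllib.request.Request(
        f"{FIREWORKS_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt, model)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request and response shapes match OpenAI's, the official OpenAI SDKs can also be pointed at the same base URL instead of hand-rolling HTTP as above.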