# Fireworks AI

> Fireworks AI is a high-performance inference cloud for open-source AI models, enabling developers and enterprises to build, fine-tune, and scale generative AI applications at blazing speed.

Fireworks AI is a cloud inference platform that provides fast, scalable access to open-source AI models optimized for production workloads. Founded by veterans of PyTorch, Meta, and Google, Fireworks delivers industry-leading throughput and latency across text, vision, speech, image, and embedding models. The platform supports the full AI model lifecycle — from serverless experimentation to fine-tuning and enterprise-grade on-demand GPU deployments — without requiring teams to manage infrastructure.

- **Serverless Inference** — *Sign up and start calling models instantly with per-token pricing, no cold starts, and $1 in free credits to get started.*
- **Model Library** — *Access a broad catalog of popular open-source models including DeepSeek, Qwen, Gemma, Kimi, Llama, Mistral, FLUX, and Whisper, all optimized for cost, speed, and quality.*
- **Fine-Tuning** — *Customize open models using LoRA or full-parameter SFT, DPO, and reinforcement fine-tuning (RFT) with minimal setup; fine-tuned models are served at base model prices.*
- **On-Demand GPU Deployments** — *Reserve A100, H100, H200, B200, or B300 GPUs billed per second for higher throughput, lower latency, and higher rate limits at scale.*
- **Multimodal Support** — *Run text, vision, speech-to-text (Whisper), and image generation (FLUX, SDXL) workloads through a unified API.*
- **Enterprise Security & Compliance** — *SOC2 Type 2, HIPAA, and GDPR compliant with zero data retention, data residency support, RBAC, and SSO (Google, OIDC, SAML).*
- **Bring Your Own Cloud** — *Deploy on Fireworks' globally distributed virtual cloud or bring your own cloud environment; available via AWS and GCP marketplaces.*
- **Batch Inference** — *Run batch jobs at 50% of serverless pricing for both input and output tokens, ideal for offline or high-volume workloads.*
- **Observability & Reliability** — *Built-in failover, load balancing, auto-scaling, and comprehensive metrics dashboards for production confidence.*
- **Developer Tooling** — *OpenAI-compatible API, CLI (firectl), SDKs, cookbooks, and detailed documentation to accelerate integration.*

## Features

- Serverless inference with per-token pricing
- On-demand GPU deployments (A100, H100, H200, B200, B300)
- Fine-tuning: LoRA SFT, LoRA DPO, Full Param SFT, Full Param DPO, RFT
- Support for text, vision, speech-to-text, image generation, and embeddings
- OpenAI-compatible API
- Batch inference at 50% discount
- Cached input token pricing
- Model library with 100+ open-source models
- SOC2 Type 2, HIPAA, GDPR compliance
- Zero data retention and data sovereignty
- RBAC and SSO (Google, OIDC, SAML)
- Bring your own cloud support
- Global distributed infrastructure
- Auto-scaling and load balancing
- CLI (firectl) and SDK support
- Cookbooks and documentation

## Integrations

AWS Marketplace, GCP Marketplace, Microsoft Azure (Foundry), PyTorch, NVIDIA GPUs, AMD GPUs, Whisper, FLUX, DeepSeek, Llama, Mistral, Qwen, Gemma, SDXL

## Platforms

WEB, API, CLI

## Pricing

Freemium — Free tier available with paid upgrades

## Links

- Website: https://fireworks.ai
- Documentation: https://docs.fireworks.ai/getting-started/introduction
- EveryDev.ai: https://www.everydev.ai/tools/fireworks-ai
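## Example

Because the API is OpenAI-compatible, calling a hosted model is a matter of sending a standard chat-completions payload to Fireworks' endpoint. The sketch below uses only the Python standard library; the model ID shown is an assumption for illustration — consult the model library for the exact identifier you want, and set `FIREWORKS_API_KEY` before calling.

```python
"""Minimal sketch of a chat completion against Fireworks' OpenAI-compatible API."""
import json
import os
import urllib.request

FIREWORKS_BASE_URL = "https://api.fireworks.ai/inference/v1"


def build_chat_request(prompt: str, model: str, max_tokens: int = 256) -> dict:
    """Build an OpenAI-style chat-completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
    }


def chat(prompt: str,
         model: str = "accounts/fireworks/models/llama-v3p1-8b-instruct") -> str:
    """Send the request and return the assistant's reply.

    The default model ID above is an assumed example; replace it with a
    current ID from the Fireworks model library.
    """
    req = urllib.request.Request(
        f"{FIREWORKS_BASE_URL}/chat/completions",
        data=json.dumps(build_chat_request(prompt, model)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["choices"][0]["message"]["content"]
```

Because the request and response shapes match OpenAI's, the official OpenAI SDKs can also be pointed at the same base URL instead of hand-rolling HTTP as above.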