Fireworks AI
Fireworks AI is a high-performance inference cloud for open-source AI models, enabling developers and enterprises to build, fine-tune, and scale generative AI applications at blazing speed.
At a Glance
Get started with $1 in free credits on serverless inference. No setup required.
Listed Apr 2026
About Fireworks AI
Fireworks AI is a cloud inference platform that provides fast, scalable access to open-source AI models optimized for production workloads. Founded by veterans of PyTorch, Meta, and Google, Fireworks delivers industry-leading throughput and latency across text, vision, speech, image, and embedding models. The platform supports the full AI model lifecycle — from serverless experimentation to fine-tuning and enterprise-grade on-demand GPU deployments — without requiring teams to manage infrastructure.
- Serverless Inference — Sign up and start calling models instantly with per-token pricing, no cold starts, and $1 in free credits to get started.
- Model Library — Access a broad catalog of popular open-source models including DeepSeek, Qwen, Gemma, Kimi, Llama, Mistral, FLUX, and Whisper, all optimized for cost, speed, and quality.
- Fine-Tuning — Customize open models using LoRA or full-parameter SFT, DPO, and reinforcement fine-tuning (RFT) with minimal setup; fine-tuned models are served at base model prices.
- On-Demand GPU Deployments — Reserve A100, H100, H200, B200, or B300 GPUs billed per second for higher throughput, lower latency, and higher rate limits at scale.
- Multimodal Support — Run text, vision, speech-to-text (Whisper), and image generation (FLUX, SDXL) workloads through a unified API.
- Enterprise Security & Compliance — SOC2 Type 2, HIPAA, and GDPR compliant with zero data retention, data residency support, RBAC, and SSO (Google, OIDC, SAML).
- Bring Your Own Cloud — Deploy on Fireworks' globally distributed virtual cloud or bring your own cloud environment; available via AWS and GCP marketplaces.
- Batch Inference — Run batch jobs at 50% of serverless pricing for both input and output tokens, ideal for offline or high-volume workloads.
- Observability & Reliability — Built-in failover, load balancing, auto-scaling, and comprehensive metrics dashboards for production confidence.
- Developer Tooling — OpenAI-compatible API, CLI (firectl), SDKs, cookbooks, and detailed documentation to accelerate integration.
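Because the API is OpenAI-compatible, a chat completion request is just a standard OpenAI-style JSON payload sent to Fireworks' inference endpoint. The sketch below builds such a request with only the Python standard library and sends it only if an API key is configured; the model slug is an assumption for illustration — check the Fireworks model library for current names.

```python
import json
import os
import urllib.request

# OpenAI-compatible chat completions endpoint on Fireworks.
API_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_chat_request(model: str, prompt: str, api_key: str) -> urllib.request.Request:
    """Build (but do not send) an OpenAI-style chat completion request."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Model slug is illustrative; substitute any model from the library.
key = os.environ.get("FIREWORKS_API_KEY")
req = build_chat_request(
    "accounts/fireworks/models/llama-v3p1-8b-instruct",
    "Say hello.",
    key or "dummy",
)
if key:  # only hit the network when a real key is present
    with urllib.request.urlopen(req) as r:
        print(json.load(r)["choices"][0]["message"]["content"])
```

The same payload works with the official OpenAI SDKs by pointing `base_url` at `https://api.fireworks.ai/inference/v1`.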
Pricing
Serverless Free Credits
Get started with $1 in free credits on serverless inference. No setup required.
- $1 in free credits
- Per-token pricing
- High rate limits
- Postpaid billing
- Access to full model library
Serverless Inference
Pay per token for text, vision, speech, image, and embedding models with no infrastructure management.
- Text & vision models from $0.10/1M tokens
- Speech-to-text from $0.0009/audio minute
- Image generation from $0.00013/step
- Embeddings from $0.008/1M tokens
- Batch inference at 50% discount
- Cached input tokens at 50% discount
- No cold starts
- High rate limits
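The per-token rates above combine multiplicatively with the two 50% discounts. A minimal cost estimator, assuming for simplicity a single per-1M-token rate for input and output (real rates vary per model, and input/output may be priced differently):

```python
def serverless_cost(input_tokens: int, output_tokens: int, price_per_m: float,
                    batch: bool = False, cached_input_tokens: int = 0) -> float:
    """Estimate serverless inference cost in dollars.

    price_per_m: $ per 1M tokens (illustrative single rate).
    Cached input tokens are billed at 50%; batch jobs are billed
    at 50% of serverless pricing (assumed to stack multiplicatively).
    """
    uncached = input_tokens - cached_input_tokens
    cost = (uncached + output_tokens) * price_per_m / 1e6
    cost += cached_input_tokens * price_per_m * 0.5 / 1e6
    if batch:
        cost *= 0.5
    return cost

# 1M in + 1M out at the $0.10/1M entry rate:
print(serverless_cost(1_000_000, 1_000_000, 0.10))              # $0.20
print(serverless_cost(1_000_000, 1_000_000, 0.10, batch=True))  # $0.10
```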
Fine-Tuning
Supervised, preference, and reinforcement fine-tuning priced per 1M training tokens. Fine-tuned models served at base model prices.
- LoRA SFT from $0.50/1M tokens (up to 16B params)
- LoRA DPO from $1.00/1M tokens
- Full Param SFT from $1.00/1M tokens
- Full Param DPO from $2.00/1M tokens
- Reinforcement fine-tuning (RFT) priced per GPU hour
- VLM supervised fine-tuning
- Fine-tuned models served at base model prices
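Since token-based fine-tuning is priced per 1M training tokens, and a training run typically passes over the dataset multiple times, estimated cost is rate × dataset tokens × epochs. A sketch using the entry rates listed above (per-model rates may differ; RFT is excluded since it bills per GPU hour):

```python
# Entry rates from the tier list above, $ per 1M training tokens.
TUNING_RATES = {
    "lora_sft": 0.50,   # up to 16B params
    "lora_dpo": 1.00,
    "full_sft": 1.00,
    "full_dpo": 2.00,
}

def tuning_cost(method: str, dataset_tokens: int, epochs: int = 1) -> float:
    """Estimate fine-tuning cost: training tokens = dataset tokens x epochs."""
    return TUNING_RATES[method] * dataset_tokens * epochs / 1e6

# 10M-token dataset, 2 epochs of LoRA SFT:
print(tuning_cost("lora_sft", 10_000_000, epochs=2))  # $10.00
```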
On-Demand GPU Deployments
Reserve dedicated GPUs billed per second for higher throughput, lower latency, and higher rate limits.
- A100 80GB GPU at $2.90/hour
- H100 80GB GPU at $6.00/hour
- H200 141GB GPU at $6.00/hour
- B200 180GB GPU at $9.00/hour
- B300 288GB GPU at $11.00/hour
- Billed per second
- No extra charges for start-up times
- Higher rate limits
- Faster speeds
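Per-second billing means the effective rate is the hourly price divided by 3600, multiplied by actual runtime and GPU count. A small estimator using the listed rates:

```python
# Hourly rates from the tier list above, $ per GPU per hour.
GPU_HOURLY = {
    "a100": 2.90,  # 80GB
    "h100": 6.00,  # 80GB
    "h200": 6.00,  # 141GB
    "b200": 9.00,  # 180GB
    "b300": 11.00, # 288GB
}

def deployment_cost(gpu: str, seconds: float, count: int = 1) -> float:
    """Per-second billing: (hourly rate / 3600) x seconds x GPU count."""
    return GPU_HOURLY[gpu] / 3600 * seconds * count

# One H100 for an hour, and two A100s for 30 minutes:
print(deployment_cost("h100", 3600))           # $6.00
print(deployment_cost("a100", 1800, count=2))  # $2.90
```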
Enterprise
Custom enterprise deployments with bring-your-own-cloud, compliance, SSO, RBAC, and dedicated support.
- SOC2 Type 2, HIPAA, GDPR compliant
- Bring your own cloud or run on Fireworks cloud
- Zero data retention and data sovereignty
- RBAC and SSO (Google, OIDC, SAML)
- AWS and GCP marketplace purchasing
- Dedicated Fireworks AI engineering support
- Custom rate limits and SLAs
- Observability and metrics dashboards
Capabilities
Key Features
- Serverless inference with per-token pricing
- On-demand GPU deployments (A100, H100, H200, B200, B300)
- Fine-tuning: LoRA SFT, LoRA DPO, Full Param SFT, Full Param DPO, RFT
- Support for text, vision, speech-to-text, image generation, and embeddings
- OpenAI-compatible API
- Batch inference at 50% discount
- Cached input token pricing
- Model library with 100+ open-source models
- SOC2 Type 2, HIPAA, GDPR compliance
- Zero data retention and data sovereignty
- RBAC and SSO (Google, OIDC, SAML)
- Bring your own cloud support
- Global distributed infrastructure
- Auto-scaling and load balancing
- CLI (firectl) and SDK support
- Cookbooks and documentation
