Fireworks AI
Fireworks AI is a production-grade inference platform for open-source and proprietary generative models. The Fireworks API hosts Llama, DeepSeek, Qwen, Mixtral, Stable Diffusion, and other models with serverless pay-per-token, on-demand dedicated GPU, and batch deployment options, plus managed fine-tuning.
APIs
Fireworks Chat Completions API
OpenAI-compatible chat completions across 100+ open-source and proprietary models including Llama, DeepSeek, Qwen, and Mixtral, with streaming, function calling, and structured outputs.
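Because the endpoint is OpenAI-compatible, a request is a plain JSON POST. A minimal sketch, assuming the serverless base URL `https://api.fireworks.ai/inference/v1` and an illustrative model slug (pick any chat model from the Fireworks catalog):

```python
import json
import os
import urllib.request

# Hedged sketch of a chat completions call. The model slug below is an
# assumption for illustration -- substitute a model from the catalog.
CHAT_URL = "https://api.fireworks.ai/inference/v1/chat/completions"

def build_chat_payload(prompt: str,
                       model: str = "accounts/fireworks/models/llama-v3p1-8b-instruct",
                       stream: bool = False) -> dict:
    """Assemble the JSON body for the chat completions endpoint."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,
    }

def chat(prompt: str) -> str:
    """POST the payload; expects FIREWORKS_API_KEY in the environment."""
    req = urllib.request.Request(
        CHAT_URL,
        data=json.dumps(build_chat_payload(prompt)).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {os.environ['FIREWORKS_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]
```

Since the wire format matches OpenAI's, the official `openai` Python client also works by pointing its `base_url` at the Fireworks endpoint.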
Fireworks Completions API
Legacy text completion endpoint, OpenAI-compatible.
Fireworks Vision API
Vision-language inference for image and document understanding through chat completions.
Fireworks Embeddings API
Generate dense vector embeddings for retrieval, RAG, and semantic search using nomic, Qwen3, BGE, and other open embedding models.
Fireworks Rerank API
Cross-encoder reranking of candidate passages for higher-quality retrieval and RAG pipelines.
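A rerank call typically takes a query plus a list of candidate passages and returns them scored by relevance. A sketch of such a request body; the field names and model slug here are assumptions, so verify the exact schema against the Fireworks rerank reference:

```python
# Hedged sketch of a rerank request body. Field names ("query",
# "documents", "top_n") follow common rerank API conventions and the
# model slug is a placeholder -- both are assumptions, not the
# confirmed Fireworks schema.
def build_rerank_payload(query: str,
                         documents: list[str],
                         model: str = "example-reranker",
                         top_n: int = 3) -> dict:
    """Score `documents` against `query`, keeping the top_n results."""
    return {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
    }

payload = build_rerank_payload(
    "when was the transistor invented?",
    ["The transistor was invented in 1947.", "Cats are mammals."],
)
```

In a RAG pipeline this sits between first-stage retrieval (e.g. embedding search) and the generation step, trading extra latency for precision.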
Fireworks Images API
Text-to-image and image-to-image generation across Stable Diffusion, FLUX, and other diffusion model families.
Fireworks Audio API
OpenAI-compatible audio transcription, translation, and TTS endpoints for Whisper and other audio models with low-latency streaming.
Fireworks Batch Inference API
Asynchronous batch inference at 50% of serverless rates for both input and output tokens.
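Batch inference consumes a file of independent requests, one JSON object per line (JSONL). A sketch of assembling such a file, assuming an OpenAI-style batch line format (`custom_id`, `method`, `url`, `body`) and an illustrative model slug; verify the exact schema against the batch API reference:

```python
import json

# Hedged sketch of building a batch input file. The line schema and
# model slug are assumptions modeled on OpenAI's batch format.
def batch_line(custom_id: str, prompt: str,
               model: str = "accounts/fireworks/models/llama-v3p1-8b-instruct") -> str:
    """One JSONL line: a self-contained chat completions request."""
    return json.dumps({
        "custom_id": custom_id,
        "method": "POST",
        "url": "/v1/chat/completions",
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    })

prompts = ["Summarize doc A.", "Summarize doc B."]
jsonl = "\n".join(batch_line(f"req-{i}", p) for i, p in enumerate(prompts))
# Upload `jsonl` via the Files API, then start a batch job; results are
# matched back to requests by custom_id.
```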
Fireworks Fine-Tuning API
Supervised fine-tuning (LoRA and full-parameter) and reinforcement fine-tuning, with one-click deployment of fine-tuned weights at the same per-token price as base models.
Fireworks Files API
Upload and manage training datasets, batch input files, and fine-tuning artifacts.
Fireworks Models API
Lists models, deployments, and metadata across the Fireworks catalog.
Fireworks Deployments API
Provision and autoscale on-demand dedicated GPU deployments (H100, H200, B200, B300) billed per GPU-second.
Fireworks Account API
Programmatic access to account, billing, usage, and team management.