Groq
Groq builds custom Language Processing Unit (LPU) silicon optimized for low-latency LLM inference. The GroqCloud API serves popular open models (Llama, GPT OSS, Whisper, Orpheus) at high tokens-per-second throughput through an OpenAI-compatible interface.
APIs
Groq Chat Completions API
OpenAI-compatible chat completions across Llama, GPT OSS, Mixtral, and Gemma models running on Groq LPU silicon, with streaming, tool use, and structured outputs. (Whisper-family models are served through the separate Speech-to-Text endpoint.)
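A minimal request-body sketch for the OpenAI-compatible chat completions endpoint. The model id and parameter values are illustrative assumptions; consult the Models API for the current catalog.

```python
import json

# Groq's OpenAI-compatible chat completions endpoint.
GROQ_CHAT_URL = "https://api.groq.com/openai/v1/chat/completions"

# Request body in the standard OpenAI chat format.
# The model id below is an assumed example, not a guaranteed current id.
payload = {
    "model": "llama-3.3-70b-versatile",
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize LPU inference in one sentence."},
    ],
    "stream": True,      # token-by-token streaming
    "temperature": 0.2,
}

# Send with any HTTP client, e.g.:
#   requests.post(GROQ_CHAT_URL, json=payload,
#                 headers={"Authorization": f"Bearer {GROQ_API_KEY}"})
print(json.dumps(payload, indent=2))
```

Because the interface mirrors OpenAI's, existing OpenAI SDK clients can typically be pointed at the Groq base URL unchanged.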
Groq Reasoning API
Reasoning-capable models with explicit chain-of-thought support, surfaced through the chat completions endpoint.
Groq Vision API
Image and document understanding plus OCR via vision-capable chat models.
Groq Speech-to-Text API
OpenAI-compatible audio transcription endpoint serving Whisper-family models on LPU hardware.
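A sketch of a transcription request against the OpenAI-compatible audio endpoint. The model id and form fields are assumptions based on the standard OpenAI transcription format.

```python
# Groq's OpenAI-compatible transcription endpoint.
GROQ_STT_URL = "https://api.groq.com/openai/v1/audio/transcriptions"

# Multipart form fields; the model id is an assumed example.
form_fields = {
    "model": "whisper-large-v3",
    "response_format": "json",
    "language": "en",   # optional language hint
}

# The audio file goes in the multipart body, e.g. with requests:
#   requests.post(GROQ_STT_URL, data=form_fields,
#                 files={"file": open("meeting.wav", "rb")},
#                 headers={"Authorization": f"Bearer {GROQ_API_KEY}"})
```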
Groq Text-to-Speech API
Speech synthesis using Orpheus and other TTS models, billed per million characters.
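Per-million-character billing makes TTS cost a simple function of script length. A back-of-envelope sketch, with a placeholder rate (real rates are on Groq's pricing page):

```python
# Assumed placeholder rate in $ per 1M characters, for illustration only.
tts_rate_per_m_chars = 50.00

script = "Welcome to the quarterly update. " * 40   # a 1,320-character script
cost = len(script) * tts_rate_per_m_chars / 1_000_000

print(f"{len(script)} chars -> ${cost:.4f}")
```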
Groq Content Moderation API
Safety classifier endpoint (Llama Guard) for input/output policy compliance.
Groq Batch API
Asynchronous batch inference at 50% off synchronous rates for non-realtime workloads.
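Batch inputs follow the OpenAI-compatible JSONL format: one request object per line, each carrying its own id, target endpoint, and body. The ids, model, and message below are illustrative assumptions.

```python
import json

# One line of a batch-input .jsonl file in the OpenAI-compatible format.
batch_line = {
    "custom_id": "req-0001",              # your correlation id
    "method": "POST",
    "url": "/v1/chat/completions",        # target endpoint per request
    "body": {
        "model": "llama-3.3-70b-versatile",   # assumed model id
        "messages": [{"role": "user", "content": "Classify this ticket."}],
    },
}
jsonl = json.dumps(batch_line)

# Upload the .jsonl via the Files API, create a batch referencing the
# file id, then poll for results; batch runs bill at 50% of sync rates.
print(jsonl)
```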
Groq Flex Processing API
Flexible service tier offering higher throughput at relaxed latency targets for cost-sensitive workloads.
Groq Files API
Upload and manage files for batch inputs and other workflows.
Groq Models API
Lists models available on GroqCloud with metadata, context length, and pricing tags.
Groq Tools API
Built-in tools (Web Search, Browser Automation, Code Execution, Wolfram Alpha) invocable from chat completions and billed per call or per hour.
Groq LoRA Inference API
Serves customer LoRA adapters on top of supported base models for low-latency custom inference.
Groq Prompt Caching
Automatic prompt caching with a 50% discount on cached input tokens and no extra caching fee.
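The 50% cached-input discount rewards putting a stable prefix (system prompt, few-shot examples) ahead of variable content. A back-of-envelope cost sketch with a placeholder per-token rate:

```python
# Assumed placeholder price in $ per 1M uncached input tokens.
price_per_m_input = 0.59

cached_tokens = 8_000   # stable system prompt served from the cache
fresh_tokens = 2_000    # new user content, billed at the full rate

cost = (
    cached_tokens * (price_per_m_input * 0.5) / 1_000_000   # 50% discount
    + fresh_tokens * price_per_m_input / 1_000_000
)
print(f"${cost:.6f} per request")
```

Since caching is automatic with no extra fee, the only lever is prompt layout: the larger the shared prefix, the closer input cost falls toward half price.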