Together AI
Together AI is an AI acceleration cloud delivering fast, scalable, and reliable generative-AI infrastructure. The Together API serves open-source and proprietary foundation models for chat, embeddings, vision, audio, image and video generation, fine-tuning, code execution, and dedicated GPU compute.
APIs
Together Chat Completions API
OpenAI-compatible chat completions across hundreds of open-source and proprietary models, including the Llama, Qwen, DeepSeek, GLM, Kimi, and Mistral families, with streaming and tool use.
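Because the endpoint is OpenAI-compatible, a request is a JSON POST with a model name and a list of messages. A minimal stdlib-only sketch (the model name below is one example from the catalog; swap in any model you have access to):

```python
import json
import os
import urllib.request

TOGETHER_BASE_URL = "https://api.together.xyz/v1"

def build_chat_request(model: str, user_message: str, stream: bool = False) -> dict:
    """Build an OpenAI-style chat completions payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
        "stream": stream,
    }

def send_chat_request(payload: dict) -> dict:
    """POST the payload to /chat/completions (requires TOGETHER_API_KEY)."""
    req = urllib.request.Request(
        f"{TOGETHER_BASE_URL}/chat/completions",
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {os.environ['TOGETHER_API_KEY']}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

# Example model name; the catalog lists many alternatives.
payload = build_chat_request("meta-llama/Llama-3.3-70B-Instruct-Turbo", "Hello!")
```

Because the shape matches OpenAI's API, existing OpenAI client libraries also work by pointing their base URL at the Together endpoint.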
Together Completions API
Legacy text-completion endpoint for non-chat models, OpenAI-compatible.
Together Embeddings API
Generates dense vector embeddings (e.g., multilingual-e5-large-instruct, BGE) for retrieval, RAG, and semantic-search workflows.
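An embeddings request follows the same OpenAI-style shape: a model name plus one or more input strings, POSTed to the /embeddings path. A minimal sketch using the multilingual-e5 model mentioned above:

```python
def build_embedding_request(model: str, texts: list[str]) -> dict:
    """Build an OpenAI-style embeddings payload for POST /embeddings."""
    return {"model": model, "input": texts}

payload = build_embedding_request(
    "intfloat/multilingual-e5-large-instruct",
    ["What is retrieval-augmented generation?", "RAG combines search with an LLM."],
)
# The response carries one dense vector per input string, in order,
# ready to store in a vector index for semantic search.
```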
Together Rerank API
Reranks candidate passages against a query using cross-encoder models for higher-quality retrieval and RAG.
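A rerank request pairs one query with a list of candidate documents and asks for the top-scoring matches. A sketch of the request body (the model name is illustrative; check the catalog for available rerank models):

```python
def build_rerank_request(
    model: str, query: str, documents: list[str], top_n: int = 3
) -> dict:
    """Build a rerank payload for POST /rerank: score each document against the query."""
    return {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,  # return only the top_n highest-scoring documents
    }

payload = build_rerank_request(
    "Salesforce/Llama-Rank-V1",  # example model name
    "How do I rotate an API key?",
    [
        "API keys can be rotated from the settings page.",
        "Our office is open Monday through Friday.",
        "Key rotation invalidates the old key immediately.",
    ],
    top_n=2,
)
```

A common pattern is to retrieve a broad candidate set with embeddings first, then rerank the top 50-100 hits with the cross-encoder before passing the best few into a RAG prompt.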
Together Images API
Text-to-image generation across FLUX.1, FLUX.2, Nano Banana Pro, Stable Diffusion, and Dreamshaper model families.
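An image-generation request is a prompt plus output parameters POSTed to /images/generations. A sketch of the request body (model name and step count are illustrative; fast distilled models like FLUX.1 schnell typically use very few steps):

```python
def build_image_request(
    model: str,
    prompt: str,
    width: int = 1024,
    height: int = 768,
    steps: int = 4,
    n: int = 1,
) -> dict:
    """Build a text-to-image payload for POST /images/generations."""
    return {
        "model": model,
        "prompt": prompt,
        "width": width,
        "height": height,
        "steps": steps,  # diffusion steps; fewer is faster, more can add detail
        "n": n,          # number of images to generate
    }

payload = build_image_request(
    "black-forest-labs/FLUX.1-schnell",  # example model name
    "A lighthouse on a cliff at sunset, watercolor style",
)
```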
Together Video API
Text-to-video and image-to-video generation across multiple quality and duration tiers.
Together Audio API
Text-to-speech (MiniMax Speech, Cartesia Sonic, Kokoro, Orpheus) with sub-250ms latency, and speech-to-text (Whisper Large v3, Parakeet) with support for 40+ languages.
Together Vision API
Multimodal vision and document understanding using models such as Qwen 3.5 (397B and 9B) and Kimi K2.5.
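Vision requests go through the same chat completions endpoint; the only difference is that a user message's content becomes a list mixing text parts and image parts, following the OpenAI multimodal message format. A minimal sketch:

```python
def build_vision_message(question: str, image_url: str) -> dict:
    """Build an OpenAI-style multimodal user message: text plus an image URL."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": question},
            {"type": "image_url", "image_url": {"url": image_url}},
        ],
    }

# The URL here is a placeholder; base64 data URLs also work for local files.
message = build_vision_message(
    "What is shown in this diagram?",
    "https://example.com/diagram.png",
)
```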
Together Fine-Tuning API
Supervised fine-tuning (LoRA and full) and DPO across the Together model catalog with managed training jobs and one-click deployment.
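Supervised fine-tuning jobs consume JSONL training files, with each line holding one example; for chat models the conversational format (a "messages" list, as in the Chat Completions API) is the usual shape. A sketch of serializing one training example:

```python
import json

def to_sft_jsonl_line(prompt: str, completion: str) -> str:
    """Serialize one conversational SFT example as a JSONL line."""
    record = {
        "messages": [
            {"role": "user", "content": prompt},
            {"role": "assistant", "content": completion},
        ]
    }
    return json.dumps(record)

line = to_sft_jsonl_line(
    "Summarize: the meeting moved to Tuesday.",
    "The meeting was rescheduled to Tuesday.",
)
# Write many such lines to a .jsonl file, upload it via the Files API,
# then reference the returned file ID when creating the fine-tuning job.
```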
Together Files API
Upload, list, retrieve, and delete training datasets and batch input files.
Together Models API
Lists hundreds of available models with metadata, capabilities, context window sizes, and pricing.
Together Batch API
Asynchronous batch inference with up to 50% discount over synchronous rates; fetch results when complete.
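Batch inference also works through JSONL input files: each line is one request tagged with a custom_id so results can be matched back after the batch completes. The exact line schema below is an assumption modeled on common batch-API formats; consult the batch documentation for the authoritative fields:

```python
import json

def to_batch_request_line(custom_id: str, model: str, user_message: str) -> str:
    """Serialize one batch chat request as a JSONL line (schema is an assumption)."""
    record = {
        "custom_id": custom_id,  # echoed in the output so results can be joined back
        "body": {
            "model": model,
            "messages": [{"role": "user", "content": user_message}],
        },
    }
    return json.dumps(record)

line = to_batch_request_line(
    "req-0001",
    "meta-llama/Llama-3.3-70B-Instruct-Turbo",  # example model name
    "Classify the sentiment: 'Great service!'",
)
# Upload the JSONL file via the Files API, create the batch job,
# then poll and download results when the batch finishes.
```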
Together Code Interpreter API
Sandboxed Python execution alongside model calls for tool-using agents and code workflows.
Together Evaluations API
LLM-as-judge evaluations with automated scoring and reports for model comparisons.
Together Dedicated Endpoints API
Provision and manage dedicated GPU-backed inference endpoints (H100, B200) with hourly billing for predictable performance and isolation.