Cartesia
Cartesia is a real-time multimodal AI platform built around the Sonic family of ultra-low-latency text-to-speech models and the Ink streaming speech-to-text models. Sonic models deliver the first audio byte in as little as 90 ms, support more than 40 languages, and can express laughter and emotion, making them well-suited to conversational AI, voice agents, dubbing, and avatar applications. Ink models add streaming transcription with native turn detection optimized for voice agents. Cartesia ships Python, JavaScript, and Go SDKs and exposes REST, server-sent events, and WebSocket interfaces for streaming audio. The platform is aligned with SOC 2 Type II, HIPAA, and PCI Level 1.
Cartesia publishes 2 APIs on the APIs.io network. Tagged areas include Voice, TTS, Text to Speech, STT, and Speech to Text.
Cartesia’s developer surface includes documentation, an engineering blog, pricing, and 7 more developer resources.
APIs
Cartesia Sonic Text-to-Speech API
The Sonic text-to-speech API converts text into ultra-low-latency, emotive speech with sub-100ms time-to-first-byte. It supports REST, server-sent events, and WebSocket streamin...
Cartesia Ink Speech-to-Text API
The Ink streaming speech-to-text API transcribes audio in real time with native turn detection tuned for voice agents and conversational systems.
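The latency figures quoted for Sonic are time-to-first-byte numbers, which a client can check for itself. The following is a minimal sketch, not Cartesia's SDK or API: it assumes only that the audio arrives as an iterable of byte chunks (for example the body of a streaming HTTP response or messages read from a WebSocket). The names time_to_first_byte and fake_stream are hypothetical, and fake_stream is a stand-in that yields silence after a fixed delay rather than calling any real service.

```python
import time
from typing import Callable, Iterable, Iterator, Optional, Tuple


def time_to_first_byte(
    chunks: Iterable[bytes],
    clock: Callable[[], float] = time.perf_counter,
) -> Tuple[Optional[float], int]:
    """Return (seconds until the first non-empty chunk, total bytes received).

    The timer starts when this function is called, so the iterable should be
    lazy: the request should be sent when iteration begins, not before.
    Returns None for the time if no non-empty chunk ever arrives.
    """
    start = clock()
    first: Optional[float] = None
    total = 0
    for chunk in chunks:
        # Ignore empty keep-alive chunks when timing the first audio byte.
        if chunk and first is None:
            first = clock() - start
        total += len(chunk)
    return first, total


def fake_stream(delay_s: float, n_chunks: int, chunk_size: int) -> Iterator[bytes]:
    """Hypothetical stand-in for a real audio stream: waits, then yields silence."""
    time.sleep(delay_s)
    for _ in range(n_chunks):
        yield b"\x00" * chunk_size


if __name__ == "__main__":
    ttfb, total = time_to_first_byte(fake_stream(0.05, 4, 1024))
    print(f"time to first byte: {ttfb * 1000:.1f} ms, {total} bytes received")
```

Measured this way, the result includes network round-trip time and client overhead, so it will generally be higher than a server-side figure. To measure a real stream, replace fake_stream with a generator that opens the connection and yields the audio chunks it receives.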