Cerebras
Cerebras Systems designs the wafer-scale WSE-3 chip and the CS-2/CS-3 AI systems built around it, and operates Cerebras Inference, a high-throughput cloud platform for running open-source large language models including Llama, Qwen, and DeepSeek families. Cerebras Inference is positioned as one of the fastest token-generation services in the market, with OpenAI-compatible REST endpoints, first-party Python and Node.js SDKs, and dedicated and on-prem deployment options for enterprise customers. The company partners with OpenAI, AWS, GSK, Mayo Clinic, and Notion, and maintains an active open source presence including its model garden and inference cookbook on GitHub.
Cerebras publishes 1 API on the APIs.io network. Tagged areas include AI Inference, Large Language Models, Wafer Scale, Hardware, and Cloud.
Cerebras’ developer surface includes documentation, pricing, engineering blog, status page, and 6 more developer resources.
APIs
Cerebras Inference API
The Cerebras Inference API exposes ultra-low-latency inference for open-weight large language models including Llama 3.1, Llama 4, Qwen, and other frontier open models. The API ...