Baseten
Baseten is a production inference platform for deploying and serving custom and pre-trained ML models. It offers a Model APIs catalog with OpenAI-compatible endpoints (DeepSeek, Qwen, GLM, Nemotron), dedicated deployments via Truss, autoscaling GPU compute, async/queued inference, training, Chains for multi-model workflows, and management APIs.
Tags: AI, ML, Inference, Deployment, MLOps, OpenAI Compatible, Anthropic Compatible, Truss
APIs
Baseten LLM Inference API
OpenAI-compatible chat completions for Baseten's Model APIs catalog (DeepSeek, Qwen, GLM, Nemotron, etc.). Pricing is per million tokens.
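Because the endpoint follows the OpenAI chat-completions convention, a request can be assembled with nothing but the standard library. This is a minimal sketch: the base URL, `Bearer` auth scheme, and model slug below are assumptions patterned on the OpenAI convention, so check Baseten's docs for the exact values.

```python
"""Sketch of an OpenAI-style chat.completions request to Baseten's Model APIs."""
import json
import urllib.request

BASE_URL = "https://inference.baseten.co/v1"  # assumed OpenAI-compatible base URL


def build_chat_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble (but do not send) an OpenAI-style chat-completions request."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # auth scheme is an assumption
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder model slug; pick one from the Model APIs catalog.
req = build_chat_request("YOUR_BASETEN_API_KEY", "deepseek-ai/DeepSeek-V3", "Hello")
# urllib.request.urlopen(req) would send it; the response follows the OpenAI schema.
```

The same payload works with the official `openai` Python SDK by pointing `base_url` at the Baseten endpoint.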
Baseten Anthropic-Compatible Messages API
Anthropic Messages-compatible inference for supported Model APIs models.
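Requests to this endpoint follow the Anthropic Messages shape rather than the OpenAI one. A minimal sketch, assuming the endpoint path, `x-api-key` header, and required `max_tokens` field carry over from Anthropic's Messages API; confirm the details against Baseten's documentation.

```python
"""Sketch of an Anthropic-style Messages request to Baseten."""
import json
import urllib.request


def build_messages_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble (but do not send) an Anthropic-style Messages request."""
    body = {
        "model": model,
        "max_tokens": 1024,  # required by the Anthropic Messages schema
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        url="https://inference.baseten.co/v1/messages",  # assumed endpoint path
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,  # auth header is an assumption
            "content-type": "application/json",
        },
        method="POST",
    )
```

The key practical difference from the chat-completions endpoint is that `max_tokens` is mandatory and the response body uses Anthropic's `content` block structure.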
Baseten Management & Async API
Deployment management, async/queued inference, chain calls (multi-model workflows), training, dedicated-deployment lifecycle, async result polling, and webhook delivery.
Resources
- Website
- Documentation
- Pricing
- Plans
- Rate Limits
- FinOps