BentoML
BentoML is an open-source unified inference platform for building, packaging, and deploying machine learning models as scalable REST API services. Developers define services using Python class decorators that automatically expose model inference logic as HTTP endpoints. BentoCloud, the managed cloud offering, provides autoscaling infrastructure, GPU instance provisioning, scale-to-zero cost optimization, and a control-plane API for programmatic deployment lifecycle management. The platform supports the major ML frameworks, including PyTorch, TensorFlow, Transformers, ONNX, XGBoost, and scikit-learn, and is licensed under Apache 2.0.
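In BentoML itself, a class is marked with `@bentoml.service` and each inference method with `@bentoml.api`. The toy, stdlib-only sketch below only mimics that decorator-to-endpoint idea so it can run without BentoML installed. The names `service`, `api`, `ROUTES`, `handle_post`, and `Echo` are hypothetical and are not BentoML's implementation; the assumption illustrated is that each marked method becomes a POST route named after the method, with JSON body fields passed as keyword arguments.

```python
import json
from typing import Callable, Dict, Tuple

# Hypothetical routing table: endpoint path -> (service class, method name).
ROUTES: Dict[str, Tuple[type, str]] = {}


def api(func: Callable) -> Callable:
    """Mark a method as an inference endpoint (toy stand-in for @bentoml.api)."""
    func._is_api = True
    return func


def service(cls: type) -> type:
    """Register every @api-marked method of cls as a POST route (toy stand-in for @bentoml.service)."""
    for name, attr in vars(cls).items():
        if getattr(attr, "_is_api", False):
            ROUTES[f"/{name}"] = (cls, name)
    return cls


def handle_post(path: str, body: str) -> str:
    """Dispatch a JSON request body to the registered method as keyword arguments."""
    cls, name = ROUTES[path]
    instance = cls()  # a real server would create the instance once, not per request
    result = getattr(instance, name)(**json.loads(body))
    return json.dumps(result)


@service
class Echo:
    @api
    def shout(self, text: str) -> str:
        return text.upper()
```

Calling `handle_post("/shout", '{"text": "hi"}')` returns the JSON string `"HI"`. A real server adds request validation, concurrency, and resource management on top of this basic dispatch.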
APIs
BentoCloud Deployment API
Python SDK and programmatic API for managing BentoCloud deployments. Provides operations to create, retrieve, list, update, apply, terminate, and delete inference deployments on...
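The real SDK operations (exposed, as far as we know, under a `bentoml.deployment` module; check the official reference for exact signatures) talk to BentoCloud over the network. The hypothetical in-memory model below only illustrates how the listed lifecycle operations relate. Two semantics are assumptions: `apply` is treated as declarative create-or-update, and `terminate` stops a deployment while keeping its record, whereas `delete` removes it entirely. `DeploymentStore` and `Deployment` are illustrative names, not BentoCloud types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Deployment:
    name: str
    bento: str
    status: str = "running"


@dataclass
class DeploymentStore:
    """Hypothetical in-memory model of a deployment lifecycle; not the BentoCloud API."""
    items: Dict[str, Deployment] = field(default_factory=dict)

    def create(self, name: str, bento: str) -> Deployment:
        if name in self.items:
            raise ValueError(f"deployment {name!r} already exists")
        self.items[name] = Deployment(name, bento)
        return self.items[name]

    def get(self, name: str) -> Deployment:
        return self.items[name]

    def list(self) -> List[Deployment]:
        return list(self.items.values())

    def update(self, name: str, bento: str) -> Deployment:
        dep = self.get(name)
        dep.bento = bento
        return dep

    def apply(self, name: str, bento: str) -> Deployment:
        # Assumed declarative semantics: create if missing, otherwise update in place.
        if name in self.items:
            return self.update(name, bento)
        return self.create(name, bento)

    def terminate(self, name: str) -> Deployment:
        # Assumed semantics: stop serving but keep the record for inspection or restart.
        dep = self.get(name)
        dep.status = "terminated"
        return dep

    def delete(self, name: str) -> None:
        # Remove the deployment record entirely.
        del self.items[name]
```

Because `apply` is idempotent here, a CI job could call it on every release without first checking whether the deployment exists.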
BentoML Service REST API
Auto-generated REST API endpoints produced when BentoML services are deployed. Each decorated service method becomes an HTTP POST endpoint. Supports custom routes, path prefixes...
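A minimal sketch of calling one of these endpoints with only the standard library, assuming a service started locally with `bentoml serve` on its default port 3000 and a hypothetical `summarize` method that takes a `text` parameter. The JSON body fields map to the method's parameter names. The helper `build_inference_request` is illustrative; BentoML also ships a higher-level `bentoml.SyncHTTPClient` for the same purpose.

```python
import json
import urllib.request
from typing import Any, Dict


def build_inference_request(base_url: str, endpoint: str, payload: Dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a JSON POST request for one service endpoint."""
    url = f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"
    data = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_inference_request(
    "http://localhost:3000",
    "summarize",  # hypothetical endpoint name, derived from the method name
    {"text": "BentoML turns Python methods into HTTP endpoints."},
)
# Against a running service, send it with: urllib.request.urlopen(req).read()
```

Building the request separately from sending it keeps the payload logic testable without a live server.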
BentoML Python SDK
Core Python SDK for packaging models as Bentos, managing the model store, building container images, and interacting with BentoML services programmatically including client-side...
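The model store addresses models by `name:version` tags, and a tag such as `iris_clf:latest` (used with `bentoml.models.get`) resolves to the most recently saved version. The toy store below sketches only that tag-resolution idea with the standard library; `ModelStore`, `parse_tag`, and the `iris_clf` name are hypothetical, and the real store also persists model files and metadata on disk.

```python
from typing import Dict, List, Tuple


def parse_tag(tag: str) -> Tuple[str, str]:
    """Split 'name:version' into parts; a bare name is treated as 'latest'."""
    name, sep, version = tag.partition(":")
    return name, (version if sep and version else "latest")


class ModelStore:
    """Hypothetical in-memory store keeping each model's versions in save order."""

    def __init__(self) -> None:
        self._versions: Dict[str, List[str]] = {}
        self._objects: Dict[Tuple[str, str], object] = {}

    def save(self, name: str, version: str, obj: object) -> str:
        if (name, version) in self._objects:
            raise ValueError(f"{name}:{version} already exists")
        self._versions.setdefault(name, []).append(version)
        self._objects[(name, version)] = obj
        return f"{name}:{version}"

    def get(self, tag: str) -> object:
        name, version = parse_tag(tag)
        if version == "latest":
            version = self._versions[name][-1]  # most recently saved version
        return self._objects[(name, version)]
```

Pinning an explicit version in production and using `latest` only during development is a common way to keep deployments reproducible.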
BentoCloud API Token Management
API for creating, listing, retrieving, and deleting API tokens used to authenticate with BentoCloud services. Supports scoped tokens with granular permissions including API acce...
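Once a token exists, a client has to send it with each request. The sketch below assumes the token is passed as a standard `Authorization: Bearer <token>` header, which should be verified against the BentoCloud documentation for your endpoint type. The environment variable name `BENTOCLOUD_API_TOKEN` and the helper `with_bearer_token` are hypothetical. Reading the token from the environment keeps it out of source code.

```python
import os
import urllib.request


def with_bearer_token(
    req: urllib.request.Request, env_var: str = "BENTOCLOUD_API_TOKEN"
) -> urllib.request.Request:
    """Attach an API token from the environment as a Bearer Authorization header."""
    token = os.environ.get(env_var)  # hypothetical variable name
    if not token:
        raise RuntimeError(f"set {env_var} to a BentoCloud API token")
    req.add_header("Authorization", f"Bearer {token}")
    return req
```

Failing fast when the variable is unset gives a clearer error than a rejected request from the server.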