ngrok AI Gateway
ngrok AI Gateway provides traffic management and security for AI APIs, including multi-provider routing, automatic failover, LLM prompt inspection, rate limiting, caching, observability, PII redaction, and access control. It enables teams to manage, secure, and monitor traffic to AI model providers (OpenAI, Anthropic, Google, DeepSeek) and to self-hosted model servers such as Ollama and vLLM through an OpenAI-compatible interface.
APIs
ngrok AI Gateway
ngrok AI Gateway exposes an OpenAI-compatible HTTP interface for routing requests across multiple AI providers and self-hosted models. Each AI Gateway instance has a unique base...
Features
Direct requests to AI providers including OpenAI, Anthropic, Google, and DeepSeek through a single gateway endpoint.
If one provider or model fails, the gateway automatically tries the next configured model.
Works with official and third-party OpenAI SDKs by changing only the base URL.
Route requests to local systems such as Ollama or vLLM alongside hosted providers.
Use ngrok/auto for intelligent model selection based on configured strategies.
Define custom routing logic using Common Expression Language expressions.
Direct traffic to the cheapest available model that meets the request's requirements.
Restrict which providers and models clients can use by API key, identity, or policy.
Inspect and modify content to remove personally identifiable information from prompts and responses.
Modify and filter responses before they reach clients.
Access OpenAI and Anthropic models without individual provider signup, using ngrok credits.
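Because the gateway exposes an OpenAI-compatible interface, a client only needs to point at the gateway's base URL instead of a provider's. The sketch below builds such a chat-completions request with Python's standard library; the base URL and API key are placeholders, and using `ngrok/auto` as the model name here is an assumption based on the feature list above, not a verified endpoint.

```python
import json
import urllib.request

# Hypothetical base URL -- each AI Gateway instance has its own.
GATEWAY_BASE_URL = "https://example.ngrok.app/v1"


def build_chat_request(model, messages,
                       base_url=GATEWAY_BASE_URL, api_key="YOUR_KEY"):
    """Build an OpenAI-compatible chat-completions request aimed at the gateway."""
    body = json.dumps({"model": model, "messages": messages}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
        method="POST",
    )


# "ngrok/auto" (per the feature list) lets the gateway pick the model.
req = build_chat_request("ngrok/auto",
                         [{"role": "user", "content": "Hello"}])
# urllib.request.urlopen(req) would send it; omitted here since it needs a
# live gateway instance and valid credentials.
```

The same swap works with official and third-party OpenAI SDKs: construct the client with the gateway's base URL and leave the rest of the calling code unchanged.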
Use Cases
Manage all AI provider traffic through a single gateway with unified observability and policy enforcement.
Route traffic to the most cost-effective model that meets quality requirements.
Enforce PII redaction and prompt inspection policies before requests leave the organization.
Failover automatically across providers to maintain AI service availability.
Route between hosted providers and self-hosted models such as Ollama or vLLM.
Integrations
Native OpenAI-compatible interface and routing to OpenAI models.
Routing and access to Anthropic Claude models through the gateway.
Routing to Google AI models.
Routing to DeepSeek models.
Routing to self-hosted Ollama instances.
Routing to self-hosted vLLM inference servers.