
vLLM

vLLM is a high-throughput, memory-efficient open-source inference and serving engine for LLMs. It provides an OpenAI-compatible REST server (`vllm serve`) plus a Python API. vLLM is Apache 2.0 licensed and runs on your own GPU infrastructure; there is no hosted vLLM SaaS from the project itself.
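As a minimal sketch of the Python API mentioned above, the snippet below runs offline batch inference with the `vllm` package. It assumes `pip install vllm`, a CUDA-capable GPU, and network access to download weights; the model ID is just an illustrative example.

```python
# Minimal offline-inference sketch using vLLM's Python API.
# Assumes `pip install vllm` and a CUDA-capable GPU; the model
# name is an arbitrary example, not a recommendation.
from vllm import LLM, SamplingParams

llm = LLM(model="facebook/opt-125m")  # any Hugging Face-compatible model ID
params = SamplingParams(temperature=0.8, max_tokens=64)

outputs = llm.generate(["The capital of France is"], params)
for output in outputs:
    print(output.outputs[0].text)  # first sampled completion per prompt
```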

LLM · Inference · Open Source · GPU · OpenAI Compatible · Self-Hosted

APIs

vLLM OpenAI-Compatible Server

OpenAI-compatible REST API exposed by `vllm serve`. Endpoints include /v1/chat/completions, /v1/completions, /v1/embeddings, /v1/score, /v1/audio/transcriptions, /v1/audio/translations, /v1/realtime (WebSocket), /tokenize, /detokenize, and /generative_scoring. Authentication is via the --api-key flag set on server start; clients can use the official OpenAI Python library unmodified, with vLLM-specific extensions passed via extra_body.
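For example, a client can talk to a locally running server with the official OpenAI Python library. A minimal sketch, assuming the server was started with `vllm serve <model> --api-key token-abc123`; the `top_k` value shown in `extra_body` is just one illustrative vLLM-specific parameter:

```python
# Chat completion against a local vLLM server via the official OpenAI client.
# Assumes the server was started with --api-key token-abc123; the model
# name must match whatever model the server is actually serving.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # vLLM's OpenAI-compatible endpoint
    api_key="token-abc123",               # matches the --api-key flag
)

response = client.chat.completions.create(
    model="facebook/opt-125m",  # example model ID
    messages=[{"role": "user", "content": "Say hello in one sentence."}],
    extra_body={"top_k": 20},   # vLLM-specific sampling extension
)
print(response.choices[0].message.content)
```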

Resources

- Website: https://docs.vllm.ai/
- Developer Portal: https://docs.vllm.ai/
- Open Source: https://github.com/vllm-project/vllm
- Plans: plans/vllm-plans-pricing.yml
- Rate Limits: rate-limits/vllm-rate-limits.yml
- FinOps: finops/vllm-finops.yml

Sources

apis.yml
aid: vllm
url: https://raw.githubusercontent.com/api-evangelist/vllm/refs/heads/main/apis.yml
name: vLLM
x-type: company
description: >-
  vLLM is a high-throughput, memory-efficient open-source inference and serving engine for LLMs. It
  provides an OpenAI-compatible REST server (vllm serve) plus a Python API. vLLM is Apache 2.0
  licensed and runs on your own GPU infrastructure; there is no hosted vLLM SaaS from the project itself.
image: https://kinlane-productions.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
  - LLM
  - Inference
  - Open Source
  - GPU
  - OpenAI Compatible
  - Self-Hosted
created: '2026-05-08'
modified: '2026-05-08'
specificationVersion: '0.19'
apis:
  - aid: vllm:openai-compatible
    name: vLLM OpenAI-Compatible Server
    description: >-
      OpenAI-compatible REST API exposed by `vllm serve`. Endpoints include /v1/chat/completions,
      /v1/completions, /v1/embeddings, /v1/score, /v1/audio/transcriptions, /v1/audio/translations,
      /v1/realtime (WebSocket), /tokenize, /detokenize, and /generative_scoring. Authentication via the
      --api-key flag set on server start; clients can use the official OpenAI Python library unmodified,
      with vLLM-specific extensions passed via extra_body.
    humanURL: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
    baseURL: http://localhost:8000/v1
    tags:
      - Chat
      - Completions
      - Embeddings
      - Audio
      - Score
      - OpenAI-Compatible
    properties:
      - type: Documentation
        url: https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html
      - type: GitHub
        url: https://github.com/vllm-project/vllm
      - type: OpenAICompat
        url: https://platform.openai.com/docs/api-reference
common:
  - type: Website
    url: https://docs.vllm.ai/
  - type: DeveloperPortal
    url: https://docs.vllm.ai/
  - type: OpenSource
    url: https://github.com/vllm-project/vllm
  - type: Plans
    url: plans/vllm-plans-pricing.yml
  - type: RateLimits
    url: rate-limits/vllm-rate-limits.yml
  - type: FinOps
    url: finops/vllm-finops.yml
maintainers:
  - FN: Kin Lane
    email: [email protected]