Vespa is an open-source AI search engine, big-data serving engine, and vector database originally developed inside Yahoo and spun out as Vespa.ai AS. Vespa combines vector search, text search (BM25), structured filtering, and machine-learned ranking — including native tensor inference — into a single distributed serving engine that scales to billions of documents with sub-100ms latency. Vespa Cloud is the fully managed commercial offering operated by the Vespa.ai team across AWS and GCP, with Startup, Basic, Commercial, and Enterprise plans plus a Self-Managed option for customers running the open-source engine on their own infrastructure. Vespa is widely used at Spotify, Perplexity, Yahoo, Farfetch, and Elicit for search, recommendation, personalization, and Retrieval-Augmented Generation (RAG).
Vespa publishes 2 APIs on the APIs.io network: Query API and Document API. Tagged areas include AI, Search, Vector Database, Big Data, and Machine Learning.
The Vespa catalog on APIs.io includes 3 machine-runnable capabilities, 1 JSON-LD context, and 1 Spectral governance ruleset.
Vespa’s developer surface includes documentation, getting-started guide, engineering blog, pricing, developer console, support, changelog, and 22 more developer resources.
The Vespa Query (Search) API executes structured and vector queries against a Vespa application using YQL (Vespa Query Language). It supports text search with BM25, approximate-...
The Vespa Document API (/document/v1) provides synchronous REST access to document operations against a Vespa content cluster. It supports Put, Get, Update (partial update with ...
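As a rough illustration of the document operations described above, the following Python sketch builds /document/v1 URLs and JSON bodies without sending anything. The endpoint, namespace, document type, field names, and helper names are all hypothetical, and the condition string is only an example of a test-and-set expression.

```python
import json
from urllib.parse import quote, urlencode


def document_url(endpoint, namespace, doctype, doc_id, condition=None):
    """Build a /document/v1 URL for one document (all names are placeholders)."""
    path = "/document/v1/{}/{}/docid/{}".format(
        quote(namespace, safe=""), quote(doctype, safe=""), quote(doc_id, safe="")
    )
    # A conditional write is expressed with the `condition` query parameter.
    query = "?" + urlencode({"condition": condition}) if condition else ""
    return endpoint.rstrip("/") + path + query


def put_body(fields):
    """Body for a Put: the full set of fields for the document."""
    return json.dumps({"fields": fields})


def assign_update_body(field, value):
    """Body for a partial Update that assigns one field."""
    return json.dumps({"fields": {field: {"assign": value}}})


if __name__ == "__main__":
    print(document_url("http://localhost:8080", "mynamespace", "music", "doc-1"))
    print(put_body({"title": "Example title"}))
    print(assign_update_body("title", "New title"))
```

In /document/v1, a Put is sent as POST, an Update as PUT, a Get as GET, and a Remove as DELETE against the same document URL.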
The Vespa Deploy API (/application/v2) manages application packages on a Vespa configuration server. It supports preparing, activating, and tearing down application packages, se...
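A deployment through this API could be prepared from Python roughly as follows. This is a sketch only, assuming a self-hosted config server on its default port 19071 and the `default` tenant. The helper names and the placeholder package contents are hypothetical, and the request is built but not sent.

```python
import io
import urllib.request
import zipfile


def zip_application(files):
    """Zip an in-memory application package given {relative path: text}."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        for path, content in files.items():
            archive.writestr(path, content)
    return buffer.getvalue()


def prepare_and_activate_request(config_server, package_zip, tenant="default"):
    """Build (but do not send) a combined prepare-and-activate request."""
    url = "{}/application/v2/tenant/{}/prepareandactivate".format(
        config_server.rstrip("/"), tenant
    )
    return urllib.request.Request(
        url,
        data=package_zip,
        headers={"Content-Type": "application/zip"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder content; a real package needs valid services.xml and schemas.
    package = zip_application({"services.xml": "<services version='1.0'/>"})
    request = prepare_and_activate_request("http://localhost:19071", package)
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would perform the deployment against a running config server.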
The Vespa Tenant API (/application/v2/tenant) manages tenants and applications hosted on a Vespa configuration server or Vespa Cloud control plane. It exposes operations for cre...
The Vespa Config API (/config/v2) lets services in a Vespa application retrieve their configuration from a Vespa configuration server using the config-server / config-proxy prot...
The Vespa Cluster Controller API (/cluster/v2) exposes runtime state and management endpoints for a Vespa content cluster — including node state queries, maintenance-mode transi...
The Vespa State API (/state/v1) exposes per-service health, version, and metrics endpoints for any Vespa node — used by orchestration tooling, monitoring agents, and load balanc...
Vespa exposes a family of metrics endpoints (/metrics/v1, /metrics/v2, /prometheus/v1) that publish Vespa engine and application metrics in JSON or Prometheus exposition format ...
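To show what consuming the Prometheus-format output might look like, here is a minimal parser for text exposition lines such as those served under /prometheus/v1/values. It is a sketch, not a full implementation of the exposition format. The metric names in the sample are made up for illustration and are not real Vespa metric names.

```python
import re

# name, optional {labels}, value, optional integer timestamp
SAMPLE_LINE = re.compile(
    r'^([A-Za-z_:][A-Za-z0-9_:]*)(\{.*\})?\s+(\S+)(?:\s+(-?\d+))?$'
)


def parse_samples(text):
    """Return (name, raw_labels, value) tuples, skipping comments and blanks."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # HELP/TYPE comments and empty lines carry no samples
        match = SAMPLE_LINE.match(line)
        if match is None:
            continue  # ignore anything this simple sketch cannot parse
        name, labels, value, _timestamp = match.groups()
        samples.append((name, labels or "", float(value)))
    return samples


if __name__ == "__main__":
    # Hypothetical payload; real metric names will differ.
    payload = (
        "# TYPE demo_queries_total counter\n"
        'demo_queries_total{cluster="default"} 12 1700000000000\n'
        "demo_up 1\n"
    )
    for sample in parse_samples(payload):
        print(sample)
```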
Vespa /document/v1 capability covering Put, Get, Update, Remove, and Visit operations against a Vespa content cluster. Self-contained Naftiko capability for the Vespa documents ...
Vespa Query (Search) API capability. Executes YQL / hybrid / vector queries against a Vespa application endpoint and returns ranked hits with relevance, fields, and coverage. Se...
Vespa /state/v1 capability exposing per-service health, version, and metrics for any Vespa node — consumed by orchestrators, load balancers, and monitoring agents.
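A load balancer or monitoring agent could interpret the health endpoint along these lines. The sketch assumes the /state/v1/health response is a JSON object with a `status.code` field whose value is `up` for a healthy service. The function names and the host and port are illustrative.

```python
import json


def health_url(host, port):
    """URL of the health endpoint for one service (host and port are placeholders)."""
    return "http://{}:{}/state/v1/health".format(host, port)


def is_up(health_json):
    """Return True when the health payload reports status code 'up'."""
    payload = json.loads(health_json)
    return payload.get("status", {}).get("code") == "up"


if __name__ == "__main__":
    print(health_url("localhost", 8080))
    print(is_up('{"status": {"code": "up"}}'))
```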
Hybrid Search
Combine BM25 text relevance with vector similarity and structured filters in a single query executed by Vespa's multi-phase ranking pipeline.
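A hybrid query of this kind might be expressed as a JSON body for the Query API, as in the sketch below. The field name `embedding`, the query tensor name `q`, the rank profile name `hybrid`, and the example filter are assumptions about a hypothetical schema, not part of any real application.

```python
import json


def hybrid_query_body(text, embedding, doc_filter=None, hits=10):
    """Build a JSON query body mixing text match, nearest neighbor, and a filter."""
    # Text match OR approximate nearest neighbor over a hypothetical tensor field.
    where = "(userQuery() or ({targetHits: 100}nearestNeighbor(embedding, q)))"
    if doc_filter:
        where += " and " + doc_filter  # structured filter, e.g. "price < 100"
    return {
        "yql": "select * from sources * where " + where,
        "query": text,                  # consumed by userQuery()
        "input.query(q)": embedding,    # query-side vector
        "ranking.profile": "hybrid",    # hypothetical rank profile name
        "hits": hits,
    }


if __name__ == "__main__":
    body = hybrid_query_body("running shoes", [0.1, 0.2, 0.3], "price < 100")
    print(json.dumps(body, indent=2))
```

The body would be sent as a POST to the application's /search/ endpoint. How BM25 and vector closeness are combined is decided by the rank profile, not by the query itself.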
Retrieval Augmented Generation
Serve grounded context to large language models by indexing documents, chunks, and embeddings in Vespa and retrieving them with hybrid search at sub-100ms latency.
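On the retrieval side, a RAG pipeline typically turns the ranked hits into a context string for the model. The sketch below assumes the usual Query API result shape (`root.children`, each hit carrying `relevance` and `fields`) and a hypothetical `text` field holding the chunk content.

```python
def context_from_response(response, text_field="text", max_chunks=3):
    """Join the text of the top-ranked hits into one context string."""
    hits = response.get("root", {}).get("children", [])
    # Hits arrive in rank order; keep only those that carry the text field.
    chunks = [
        hit["fields"][text_field]
        for hit in hits
        if text_field in hit.get("fields", {})
    ]
    return "\n\n".join(chunks[:max_chunks])


if __name__ == "__main__":
    # Hand-written sample response, not real Vespa output.
    sample = {
        "root": {
            "children": [
                {"relevance": 0.9, "fields": {"text": "First chunk."}},
                {"relevance": 0.4, "fields": {"text": "Second chunk."}},
            ]
        }
    }
    print(context_from_response(sample))
```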
Recommendation and Personalization
Power recommendation systems with machine-learned ranking, real-time feature updates, and tensor inference over user and item embeddings.
Ad Targeting and Real-Time Bidding
Match candidate ads against user context and serve ranked impressions within tight latency budgets using Vespa's distributed serving engine.
E-Commerce Search and Browse
Combine faceted navigation, structured filters, text relevance, and learned ranking for large product catalogs with frequent updates.
Streaming Search for Personal Data
Run "streaming search" mode that scans a user's personal corpus on demand — ideal for mail, messaging, and document search where each user has their own private index.
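In streaming mode, each user's documents are tied to a group, and queries name the group to scan. The sketch below shows how group-scoped document IDs, /document/v1 paths, and query strings could be formed. The namespace, document type, and user names are hypothetical.

```python
from urllib.parse import quote, urlencode


def group_document_id(namespace, doctype, group, local_id):
    """Document ID that places the document in a per-user group."""
    return "id:{}:{}:g={}:{}".format(namespace, doctype, group, local_id)


def group_document_path(namespace, doctype, group, local_id):
    """/document/v1 path addressing the same grouped document."""
    parts = [quote(p, safe="") for p in (namespace, doctype, group, local_id)]
    return "/document/v1/{}/{}/group/{}/{}".format(*parts)


def streaming_query_string(group, yql):
    """Query string that restricts a streaming search to one group's documents."""
    return urlencode({"yql": yql, "streaming.groupname": group})


if __name__ == "__main__":
    print(group_document_id("mailbox", "mail", "alice", "msg-1"))
    print(group_document_path("mailbox", "mail", "alice", "msg-1"))
    print(streaming_query_string("alice", "select * from mail where true"))
```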
aid: vespa-ai
name: Vespa
description: Vespa is an open-source AI search engine, big-data serving engine, and vector database originally developed
  inside Yahoo and spun out as Vespa.ai AS. Vespa combines vector search, text search (BM25), structured filtering, and
  machine-learned ranking — including native tensor inference — into a single distributed serving engine that scales to
  billions of documents with sub-100ms latency. Vespa Cloud is the fully managed commercial offering operated by the
  Vespa.ai team across AWS and GCP, with Startup, Basic, Commercial, and Enterprise plans plus a Self-Managed option for
  customers running the open-source engine on their own infrastructure. Vespa is widely used at Spotify, Perplexity,
  Yahoo, Farfetch, and Elicit for search, recommendation, personalization, and Retrieval-Augmented Generation (RAG).
type: Index
position: Provider
access: 3rd-Party
image: https://kinlane-productions.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- AI
- Search
- Vector Database
- Big Data
- Machine Learning
- Semantic Search
- Retrieval Augmented Generation
- Open Source
- Tensor
- Recommendations
url: https://raw.githubusercontent.com/api-evangelist/vespa-ai/refs/heads/main/apis.yml
created: '2026-05-25'
modified: '2026-05-25'
specificationVersion: '0.19'
apis:
- aid: vespa-ai:vespa-query-api
  name: Vespa Query API
  description: The Vespa Query (Search) API executes structured and vector queries against a Vespa application using
    YQL (Vespa Query Language). It supports text search with BM25, approximate-nearest-neighbor vector search over HNSW
    indexes, hybrid search, machine-learned ranking with multi-phase rank profiles, grouping/aggregation, pagination,
    result presentation, and tracing. Queries can be issued as GET requests with query-string parameters or POST
    requests with a JSON body for complex expressions.
  humanURL: https://docs.vespa.ai/en/query-api.html
  tags:
  - AI
  - Search
  - Query
  - YQL
  - Vector Search
  - Ranking
  - Hybrid Search
  properties:
  - url: openapi/vespa-query-api-openapi.yml
    type: OpenAPI
  - url: https://docs.vespa.ai/en/query-api.html
    type: Documentation
  - url: https://docs.vespa.ai/en/reference/api/query.html
    type: Documentation
  - url: https://docs.vespa.ai/en/getting-started.html
    type: GettingStarted
  - type: NaftikoCapability
    url: capabilities/vespa-query.yaml
- aid: vespa-ai:vespa-document-api
  name: Vespa Document API
  description: The Vespa Document API (/document/v1) provides synchronous REST access to document operations against a
    Vespa content cluster. It supports Put, Get, Update (partial update with assign/add/remove operators), Remove, and
    Visit (streaming visit, copy, delete-where, update-where) over JSON or JSON Lines, with conditional writes,
    multi-tenant namespaces, field-set projection, time-window selection, and pagination via continuation tokens.
  humanURL: https://docs.vespa.ai/en/reference/document-v1-api-reference.html
  tags:
  - Documents
  - CRUD
  - Indexing
  - Data
  - Streaming
  properties:
  - url: openapi/vespa-document-api-openapi.yml
    type: OpenAPI
  - url: https://docs.vespa.ai/en/reference/document-v1-api-reference.html
    type: Documentation
  - url: https://docs.vespa.ai/en/writing/document-v1-api-guide.html
    type: Documentation
  - url: https://docs.vespa.ai/en/reads-and-writes.html
    type: Documentation
  - type: NaftikoCapability
    url: capabilities/vespa-documents.yaml
- aid: vespa-ai:vespa-deploy-api
  name: Vespa Deploy API
  description: The Vespa Deploy API (/application/v2) manages application packages on a Vespa configuration server.
    It supports preparing, activating, and tearing down application packages, session-based deployments, schema
    validation, and zero-downtime updates of services, schemas, and rank profiles.
  humanURL: https://docs.vespa.ai/en/reference/deploy-rest-api-v2.html
  tags:
  - Deployment
  - Configuration
  - Application
  - DevOps
  properties:
  - url: https://docs.vespa.ai/en/reference/deploy-rest-api-v2.html
    type: Documentation
  - url: https://docs.vespa.ai/en/application-packages.html
    type: Documentation
- aid: vespa-ai:vespa-tenant-api
  name: Vespa Tenant and Application API
  description: The Vespa Tenant API (/application/v2/tenant) manages tenants and applications hosted on a Vespa
    configuration server or Vespa Cloud control plane. It exposes operations for creating tenants, listing
    applications, and binding application sessions to a tenant.
  humanURL: https://docs.vespa.ai/en/reference/application-v2-tenant.html
  tags:
  - Tenants
  - Applications
  - Multi-Tenancy
  - Administration
  properties:
  - url: https://docs.vespa.ai/en/reference/application-v2-tenant.html
    type: Documentation
- aid: vespa-ai:vespa-config-api
  name: Vespa Config API
  description: The Vespa Config API (/config/v2) lets services in a Vespa application retrieve their configuration
    from a Vespa configuration server using the config-server / config-proxy protocol. It is primarily used by Vespa
    services and tooling rather than end users, but is documented as a stable HTTP API.
  humanURL: https://docs.vespa.ai/en/reference/config-rest-api-v2.html
  tags:
  - Configuration
  - Internal
  properties:
  - url: https://docs.vespa.ai/en/reference/config-rest-api-v2.html
    type: Documentation
- aid: vespa-ai:vespa-cluster-api
  name: Vespa Cluster Controller API
  description: The Vespa Cluster Controller API (/cluster/v2) exposes runtime state and management endpoints for a
    Vespa content cluster — including node state queries, maintenance-mode transitions, and storage cluster orchestration.
  humanURL: https://docs.vespa.ai/en/reference/cluster-v2.html
  tags:
  - Cluster
  - Operations
  - Content
  - State
  properties:
  - url: https://docs.vespa.ai/en/reference/cluster-v2.html
    type: Documentation
- aid: vespa-ai:vespa-state-api
  name: Vespa State API
  description: The Vespa State API (/state/v1) exposes per-service health, version, and metrics endpoints for any
    Vespa node — used by orchestration tooling, monitoring agents, and load balancers to check liveness, readiness,
    and runtime metrics.
  humanURL: https://docs.vespa.ai/en/reference/state-v1.html
  tags:
  - Health
  - Monitoring
  - Metrics
  - Observability
  properties:
  - url: https://docs.vespa.ai/en/reference/state-v1.html
    type: Documentation
  - type: NaftikoCapability
    url: capabilities/vespa-state.yaml
- aid: vespa-ai:vespa-metrics-api
  name: Vespa Metrics API
  description: Vespa exposes a family of metrics endpoints (/metrics/v1, /metrics/v2, /prometheus/v1) that publish
    Vespa engine and application metrics in JSON or Prometheus exposition format for scraping by Prometheus,
    Grafana, or other observability stacks.
  humanURL: https://docs.vespa.ai/en/operations/metrics.html
  tags:
  - Metrics
  - Prometheus
  - Observability
  - Monitoring
  properties:
  - url: https://docs.vespa.ai/en/operations/metrics.html
    type: Documentation
  - url: https://docs.vespa.ai/en/reference/metrics-v1.html
    type: Documentation
  - url: https://docs.vespa.ai/en/reference/metrics-v2.html
    type: Documentation
  - url: https://docs.vespa.ai/en/reference/prometheus-v1.html
    type: Documentation
common:
- type: Website
  url: https://vespa.ai
- type: Documentation
  url: https://docs.vespa.ai/
- type: GettingStarted
  url: https://docs.vespa.ai/en/getting-started.html
- type: Tutorials
  url: https://docs.vespa.ai/en/learn/tutorials/
- type: GitHubOrganization
  url: https://github.com/vespa-engine
- type: GitHubRepository
  url: https://github.com/vespa-engine/vespa
- type: License
  url: https://github.com/vespa-engine/vespa/blob/master/LICENSE
- type: Blog
  url: https://blog.vespa.ai/
- type: BlogRSS
  url: https://blog.vespa.ai/feed.xml
- type: Pricing
  url: https://cloud.vespa.ai/pricing
- type: Console
  url: https://console.vespa-cloud.com/
- type: Slack
  url: https://slack.vespa.ai/
- type: Support
  url: https://github.com/vespa-engine/vespa/issues
- type: ChangeLog
  url: https://github.com/vespa-engine/vespa/releases
- type: SDK
  name: Vespa CLI (Go)
  url: https://github.com/vespa-engine/vespa/tree/master/client/go
- type: SDK
  name: pyvespa (Python)
  url: https://github.com/vespa-engine/pyvespa
- type: SDK
  name: pyvespa Documentation
  url: https://vespa-engine.github.io/pyvespa/
- type: SDK
  name: vespa-feed-client (Java)
  url: https://github.com/vespa-engine/vespa/tree/master/vespa-feed-client
- type: SDK
  name: vespa-search (JavaScript)
  url: https://github.com/vespa-engine/vespa-search
- type: SampleApps
  url: https://github.com/vespa-engine/sample-apps
- type: PrometheusExporter
  url: https://github.com/vespa-engine/vespa_exporter
- type: DockerImage
  url: https://github.com/vespa-engine/docker-image
- type: GitHubAction
  url: https://github.com/vespa-engine/setup-vespa-cli-action
- type: SpectralRules
  url: rules/vespa-ai-rules.yml
- type: Vocabulary
  url: vocabulary/vespa-ai-vocabulary.yml
- type: JSONLDContext
  url: json-ld/vespa-ai-context.jsonld
- type: Plans
  url: plans/vespa-ai-plans-pricing.yml
- type: RateLimits
  url: rate-limits/vespa-ai-rate-limits.yml
- type: FinOps
  url: finops/vespa-ai-finops.yml
- type: Features
  data:
  - Open-source under Apache 2.0
  - Vector search with HNSW indexes
  - BM25 text search and hybrid search
  - Native tensor and ML model inference at serving time
  - YQL (Vespa Query Language) for structured queries
  - Multi-phase ranking (match-phase, first-phase, second-phase, global-phase)
  - Document API with conditional writes, visits, and JSON Lines streaming
  - Multi-tenant namespaces and document groups
  - Real-time indexing with sub-100ms query latency
  - Distributed content clusters with automatic sharding and replication
  - Streaming search mode for personal/private corpora
  - Built-in machine learning inference (TensorFlow, ONNX, XGBoost, LightGBM)
  - Approximate nearest neighbor and exact nearest neighbor operators
  - Application packages with schemas, services.xml, and rank profiles
  - Container API for custom searchers, document processors, and handlers
  - Self-managed (Apache 2.0) or fully managed Vespa Cloud (AWS, GCP)
  - Vespa Cloud Startup plan from $0.05 / vCPU-hour, $0.005 / GiB-memory-hour
  - Vespa Cloud Commercial plan with 24/7 1-hour SLA support
  - Vespa Cloud Enterprise plan with $20k/month minimum and 15-minute SLA
  - Up to 50% volume discounts and 15% committed-spend discount
  sources:
  - https://cloud.vespa.ai/price-calculator.html
  - https://docs.vespa.ai/
  updated: '2026-05-25'
- type: UseCases
  data:
  - name: Hybrid Search
    description: Combine BM25 text relevance with vector similarity and structured filters in a single query
      executed by Vespa's multi-phase ranking pipeline.
  - name: Retrieval Augmented Generation
    description: Serve grounded context to large language models by indexing documents, chunks, and embeddings in
      Vespa and retrieving them with hybrid search at sub-100ms latency.
  - name: Recommendation and Personalization
    description: Power recommendation systems with machine-learned ranking, real-time feature updates, and tensor
      inference over user and item embeddings.
  - name: Ad Targeting and Real-Time Bidding
    description: Match candidate ads against user context and serve ranked impressions within tight latency
      budgets using Vespa's distributed serving engine.
  - name: E-Commerce Search and Browse
    description: Combine faceted navigation, structured filters, text relevance, and learned ranking for
      large product catalogs with frequent updates.
  - name: Streaming Search for Personal Data
    description: Run "streaming search" mode that scans a user's personal corpus on demand — ideal for mail,
      messaging, and document search where each user has their own private index.
- type: Integrations
  data:
  - name: AWS
  - name: Google Cloud
  - name: Prometheus
  - name: Grafana
  - name: TensorFlow
  - name: ONNX Runtime
  - name: XGBoost
  - name: LightGBM
  - name: Kubernetes
  - name: LangChain
  - name: LlamaIndex
  - name: Haystack
integrations:
- name: AWS
- name: Google Cloud
- name: LangChain
- name: LlamaIndex
- name: Haystack
- name: Prometheus
maintainers:
- FN: Kin Lane
  email: [email protected]