Apache Solr
Apache Solr is an open-source enterprise search platform built on Apache Lucene. It provides distributed indexing, replication, load-balanced querying, automated failover and recovery, and centralized configuration through SolrCloud. Solr exposes comprehensive REST/HTTP APIs for document indexing, full-text search with faceting and highlighting, schema management, collections management, and cluster operations. It is an Apache Software Foundation project used by major organizations for enterprise-scale search solutions.
APIs
Apache Solr Search API
The Solr Search API provides HTTP endpoints for full-text document search, including query parsers (Standard, DisMax, Extended DisMax), JSON Query DSL, faceting and JSON Facet A...
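As a minimal sketch of a search request, the snippet below builds a URL for the /select handler using the Extended DisMax parser. The host, the collection name "products", and the field names are assumptions made for illustration; only the URL is constructed, so no running Solr instance is needed.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_select_url(base_url, collection, query, query_fields=None, rows=10):
    """Return a URL for Solr's /select handler using the edismax parser."""
    params = {
        "q": query,            # the user's search terms
        "defType": "edismax",  # select the Extended DisMax query parser
        "rows": rows,          # number of documents to return
        "wt": "json",          # ask for a JSON response
    }
    if query_fields:
        # qf lists the fields edismax searches, separated by spaces
        params["qf"] = " ".join(query_fields)
    return f"{base_url.rstrip('/')}/solr/{collection}/select?{urlencode(params)}"


# Hypothetical host, collection, and fields
url = build_select_url(
    "http://localhost:8983", "products", "wireless mouse", ["name", "description"]
)
print(url)
```

The resulting URL can be fetched with any HTTP client, for example urllib.request.urlopen(url), once a Solr node with a matching collection is available.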
Apache Solr Indexing API
The Solr Indexing API provides HTTP endpoints for adding, updating, and deleting documents from the search index. It supports JSON, XML, CSV, and binary Solr formats via the /up...
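To illustrate the JSON form of an update, the following sketch prepares the URL, headers, and body for adding documents and for deleting one by ID. The host, the collection name "products", and the document fields are hypothetical, and the helper names are invented for this example.

```python
import json


def build_add_request(base_url, collection, docs, commit=True):
    """Prepare a JSON update that adds (or overwrites) a list of documents."""
    url = f"{base_url.rstrip('/')}/solr/{collection}/update"
    if commit:
        url += "?commit=true"  # make the change visible as soon as the request completes
    headers = {"Content-Type": "application/json"}
    body = json.dumps(docs).encode("utf-8")  # a JSON array of documents
    return url, headers, body


def build_delete_request(base_url, collection, doc_id, commit=True):
    """Prepare a JSON update that deletes a single document by its unique key."""
    url = f"{base_url.rstrip('/')}/solr/{collection}/update"
    if commit:
        url += "?commit=true"
    headers = {"Content-Type": "application/json"}
    body = json.dumps({"delete": {"id": doc_id}}).encode("utf-8")
    return url, headers, body


# Hypothetical documents and collection
docs = [{"id": "sku-1", "name": "Wireless mouse"}, {"id": "sku-2", "name": "Keyboard"}]
add_url, add_headers, add_body = build_add_request("http://localhost:8983", "products", docs)
del_url, del_headers, del_body = build_delete_request("http://localhost:8983", "products", "sku-1")
print(add_url)
print(del_body.decode("utf-8"))
```

Each tuple can be sent as a POST, for instance with urllib.request.Request(url, data=body, headers=headers). Committing on every request is convenient for a demonstration but is usually avoided under heavy indexing load.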
Apache Solr Schema API
The Solr Schema API provides REST endpoints for managing the schema of a Solr collection, including field types, fields, dynamic fields, and copy fields. The Managed Schema appr...
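A schema change is expressed as a JSON command posted to the collection's /schema endpoint. The sketch below builds an add-field command; the field name "description", the field type "text_general", and the collection name are assumptions, and the type must already exist in the target schema.

```python
import json


def schema_endpoint(base_url, collection):
    """Return the Schema API endpoint for a collection."""
    return f"{base_url.rstrip('/')}/solr/{collection}/schema"


def add_field_command(name, field_type, stored=True, indexed=True, multi_valued=False):
    """Build the JSON body for the Schema API's add-field command."""
    return {
        "add-field": {
            "name": name,
            "type": field_type,          # must name a field type defined in the schema
            "stored": stored,            # keep the original value for retrieval
            "indexed": indexed,          # make the field searchable
            "multiValued": multi_valued,
        }
    }


# Hypothetical collection and field
endpoint = schema_endpoint("http://localhost:8983", "products")
command = add_field_command("description", "text_general")
print(endpoint)
print(json.dumps(command, indent=2))
```

Posting the command with a Content-Type of application/json applies it when the collection uses a managed (mutable) schema.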
Apache Solr Collections API
The Solr Collections API provides REST endpoints for managing SolrCloud collections, shards, replicas, and aliases. It supports collection creation, deletion, modification, shar...
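Collection creation can be sketched as a single request against the admin endpoint with action=CREATE. The host, the collection name, and the shard and replica counts below are illustrative assumptions; the snippet only assembles the URL.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def build_create_collection_url(base_url, name, num_shards=1, replication_factor=1):
    """Return a Collections API URL that creates a SolrCloud collection."""
    params = {
        "action": "CREATE",
        "name": name,
        "numShards": num_shards,                 # number of logical slices of the index
        "replicationFactor": replication_factor,  # copies of each shard
    }
    return f"{base_url.rstrip('/')}/solr/admin/collections?{urlencode(params)}"


# Hypothetical cluster address and sizing
create_url = build_create_collection_url("http://localhost:8983", "products", 2, 2)
print(create_url)
```

With two shards and a replication factor of two, this request asks the cluster to place four replicas in total.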
Apache Solr Config API
The Solr Config API and Request Parameters API provide REST endpoints for managing Solr's solrconfig.xml settings at runtime without a server restart, including request handler co...
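One way to picture a runtime configuration change is a set-property command posted to the collection's /config endpoint. The sketch below sets the soft auto-commit interval; the collection name and the 5000 ms value are assumptions chosen for illustration.

```python
import json


def config_endpoint(base_url, collection):
    """Return the Config API endpoint for a collection."""
    return f"{base_url.rstrip('/')}/solr/{collection}/config"


def set_property_command(properties):
    """Build the JSON body for the Config API's set-property command."""
    return {"set-property": dict(properties)}


# Hypothetical collection; the interval is in milliseconds
endpoint = config_endpoint("http://localhost:8983", "products")
command = set_property_command({"updateHandler.autoSoftCommit.maxTime": 5000})
print(endpoint)
print(json.dumps(command))
```

The command is sent as a POST with a JSON content type, and Solr stores the change as an overlay on top of solrconfig.xml rather than editing the file itself.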
Features
Comprehensive full-text search with tokenization, stemming, synonyms, and relevance scoring.
Distributed search and indexing with automatic sharding, replication, and ZooKeeper coordination.
Dynamic faceting, including field facets, range facets, pivot facets, and the JSON Facet API.
SQL-like streaming expressions for distributed corpus analytics and aggregations.
Approximate nearest neighbor (ANN) search for AI/ML vector embeddings using the HNSW algorithm.
Machine learning-based relevancy tuning with custom feature extraction and model training.
Near-real-time document retrieval before documents are committed to the index.
Geographic and spatial search with distance filtering and bounding box queries.
SQL query language with JDBC support for analytics tools such as Apache Zeppelin and R.
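Two of the features above, vector search and JSON faceting, can be combined in one request body. The following sketch builds a kNN query string and wraps it in a JSON request with a terms facet. The field names "embedding" and "category" and the three-dimensional vector are hypothetical; a real vector must match the dimension configured for the dense vector field.

```python
import json


def knn_query(field, vector, top_k=10):
    """Build a query string for Solr's knn query parser."""
    vec = ", ".join(str(float(x)) for x in vector)
    # Doubled braces produce the literal { and } of the local-params syntax
    return f"{{!knn f={field} topK={top_k}}}[{vec}]"


def json_request(query, facet_field, facet_limit=5, rows=10):
    """Build a JSON request body with one terms facet."""
    return {
        "query": query,
        "limit": rows,
        "facet": {
            f"by_{facet_field}": {
                "type": "terms",       # bucket documents by field value
                "field": facet_field,
                "limit": facet_limit,  # number of buckets to return
            }
        },
    }


# Hypothetical field names and embedding
q = knn_query("embedding", [0.1, 0.2, 0.3], top_k=5)
body = json_request(q, "category")
print(q)
print(json.dumps(body, indent=2))
```

The body would be posted as JSON to a collection's search handler. The facet counts then describe how the nearest neighbors are distributed across categories.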
Use Cases
Unified enterprise search across documents, databases, web content, and file systems.
Product catalog search with faceting, filtering, and recommendation engines.
Log ingestion and search for operational intelligence and security analysis.
Semantic and similarity search using dense vector embeddings from AI models.
Full-text search backend for CMS platforms and digital asset management systems.
Integrations
ZooKeeper as the distributed coordination service for SolrCloud cluster management and configuration.
Stream ingestion via a Kafka connector for real-time document indexing.
Solr Kubernetes Operator for cloud-native deployment and management.
Metrics integration via the Prometheus exporter for Solr cluster monitoring.
Document parsing for indexing rich content like PDFs, Word documents, and HTML.
Data flow integration for automated document ingestion pipelines.
Natural language processing integration for text analysis and named entity recognition.