Apache Lucene
Apache Lucene is a high-performance, full-featured text search engine library written entirely in Java. It provides indexing and search technology, as well as spellchecking, hit highlighting, faceting, vector similarity search, and advanced analysis and tokenization capabilities. Lucene is the foundation for many popular search applications including Apache Solr.
APIs
Apache Lucene
Lucene provides a comprehensive Java API for full-text indexing, searching, faceting, hit highlighting, spatial search, vector nearest-neighbor search, and text analysis with su...
Features
High-performance full-text indexing with over 800GB/hour throughput on modern hardware with minimal RAM requirements.
Native support for approximate and exact k-nearest-neighbor vector similarity search alongside traditional keyword search.
Supports phrase queries, wildcard, proximity, range, fuzzy, and fielded queries with pluggable query parsers.
Built-in faceted search and result grouping capabilities for navigation and aggregation.
Highlights search keywords in result snippets using the Highlighter and UnifiedHighlighter modules.
Auto-suggest and spell-checking support via the Suggest module with multiple suggester implementations.
Extensive analyzer ecosystem supporting dozens of languages including ICU, Kuromoji (Japanese), Nori (Korean), OpenNLP, and more.
Supports Vector Space Model, Okapi BM25, and custom pluggable similarity implementations.
Geospatial search capabilities via the Spatial and Spatial3D modules.
Index replication support via the Replicator module for leader-follower architectures.
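The Okapi BM25 ranking named in the feature list can be sketched in plain Java as follows. This is a minimal illustration of the textbook formula, not Lucene's own implementation: the class name `Bm25Sketch`, the toy corpus, and the parameter values `K1 = 1.2` and `B = 0.75` are assumptions chosen for the example, and a real Lucene similarity may differ in constant factors and in how document length is stored.

```java
import java.util.List;

// Illustrative sketch only: textbook Okapi BM25 over tokenized in-memory documents.
// Bm25Sketch, K1, and B are hypothetical names/values, not part of the Lucene API.
public class Bm25Sketch {
    static final double K1 = 1.2;  // term-frequency saturation (assumed value)
    static final double B = 0.75;  // strength of length normalization (assumed value)

    // Inverse document frequency: rarer terms receive a higher weight.
    public static double idf(int docCount, int docFreq) {
        return Math.log(1.0 + (docCount - docFreq + 0.5) / (docFreq + 0.5));
    }

    // BM25 score of a single term for one document within a corpus.
    public static double score(String term, List<String> doc, List<List<String>> corpus) {
        long tf = doc.stream().filter(term::equals).count();
        if (tf == 0) {
            return 0.0;
        }
        int docFreq = (int) corpus.stream().filter(d -> d.contains(term)).count();
        double avgLen = corpus.stream().mapToInt(List::size).average().orElse(1.0);
        // Documents longer than average are penalized, shorter ones are boosted.
        double norm = K1 * (1.0 - B + B * doc.size() / avgLen);
        return idf(corpus.size(), docFreq) * tf * (K1 + 1.0) / (tf + norm);
    }

    public static void main(String[] args) {
        List<List<String>> corpus = List.of(
            List.of("lucene", "is", "a", "search", "library"),
            List.of("solr", "is", "built", "on", "lucene"),
            List.of("search", "search", "search", "engines", "rank", "documents"));
        for (List<String> doc : corpus) {
            System.out.println(score("search", doc, corpus) + "  " + doc);
        }
    }
}
```

In this toy corpus the third document scores highest for "search" because the term occurs three times, while the second document scores zero because the term is absent.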
Use Cases
Power full-text search across enterprise documents, emails, databases, and file systems.
Implement fast, relevant product search with facets, autocomplete, and spell correction.
Index and search structured and unstructured log data for observability and security analytics.
Combine keyword search with vector embeddings for hybrid semantic and lexical retrieval.
Build searchable knowledge bases and documentation portals with rich query capabilities.
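For the hybrid retrieval use case above, one common way to merge a keyword result list with a vector result list is reciprocal rank fusion (RRF). The listing does not say which merging method is intended, so the sketch below is an assumption: `HybridFusionSketch`, `fuse`, the constant `RRF_K = 60`, and the document IDs are all hypothetical, and the two input lists stand in for results that would come from separate keyword and nearest-neighbor searches.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch only: reciprocal rank fusion of two ranked ID lists.
// Nothing here is part of the Lucene API; all names are hypothetical.
public class HybridFusionSketch {
    static final int RRF_K = 60;  // damping constant (assumed, commonly used value)

    // Each document earns 1 / (RRF_K + rank) per list it appears in, with rank starting at 1.
    public static List<String> fuse(List<String> keywordHits, List<String> vectorHits) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> ranking : List.of(keywordHits, vectorHits)) {
            for (int rank = 0; rank < ranking.size(); rank++) {
                scores.merge(ranking.get(rank), 1.0 / (RRF_K + rank + 1), Double::sum);
            }
        }
        List<String> fused = new ArrayList<>(scores.keySet());
        // Sort by descending fused score; break ties by ID so the order is deterministic.
        fused.sort((a, b) -> {
            int byScore = Double.compare(scores.get(b), scores.get(a));
            return byScore != 0 ? byScore : a.compareTo(b);
        });
        return fused;
    }

    public static void main(String[] args) {
        List<String> keywordHits = List.of("doc1", "doc2", "doc3");
        List<String> vectorHits = List.of("doc3", "doc1", "doc4");
        System.out.println(fuse(keywordHits, vectorHits));  // prints [doc1, doc3, doc2, doc4]
    }
}
```

RRF uses only rank positions, so it sidesteps the problem that keyword scores and vector similarities are on different scales. Documents found by both searches rise above those found by only one.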
Integrations
Apache Solr is built on top of Lucene and adds distributed search, a REST API, and enterprise features.
Elasticsearch and OpenSearch use Lucene as their underlying search engine.
Lucene integrates with Hadoop for large-scale distributed indexing pipelines.
Apache Tika extracts text from thousands of file formats for indexing into Lucene.
OpenNLP provides NLP analysis capabilities integrated through Lucene analyzers.
Apache Nutch is a web crawler that stores and indexes content via Lucene.
Apache Lucene.NET is the official .NET port of Apache Lucene, maintained in the apache/lucenenet repository.