Apache Druid

Apache Druid is a high-performance, real-time analytics database governed by the Apache Software Foundation, designed for fast slice-and-dice OLAP queries on event-time data. It features a distributed, column-oriented storage engine with automatic rollup, supports both streaming (Kafka, Kinesis) and batch (S3, HDFS, local) data ingestion, and provides a SQL query interface plus a native JSON query API via REST. Druid is optimized for sub-second queries at petabyte scale with high concurrency.

Tags: Analytics, Apache, Database, Kafka, OLAP, Open Source, Real-Time, SQL, Time Series

APIs

Apache Druid REST API

Druid exposes REST APIs for Druid SQL (POST /druid/v2/sql), native JSON queries (POST /druid/v2), batch and streaming data ingestion tasks, supervisor management for Kafka/Kines...

Features

Sub-Second OLAP Queries

Columnar storage with bitmap indexes, dictionary encoding, and pre-aggregation (rollup) enables sub-second queries on billions of events.

Druid SQL API

REST endpoint for submitting standard SQL queries with ANSI SQL support, time-based filtering, and streaming response options.
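
As a minimal sketch of what a call to this endpoint might look like, the snippet below builds (but does not send) a POST request for `/druid/v2/sql` using only the Python standard library. The host and port, the `clickstream` datasource, and the helper name `build_sql_request` are assumptions for illustration, not part of the listing.

```python
import json
import urllib.request

# Assumed address of a Druid Router; adjust for your own cluster.
DRUID_URL = "http://localhost:8888"


def build_sql_request(sql: str, base_url: str = DRUID_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Druid SQL endpoint."""
    payload = {
        "query": sql,
        "resultFormat": "object",  # one JSON object per result row
    }
    return urllib.request.Request(
        url=f"{base_url}/druid/v2/sql",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource "clickstream", filtered on Druid's __time column.
request = build_sql_request(
    "SELECT COUNT(*) AS events FROM clickstream "
    "WHERE __time >= CURRENT_TIMESTAMP - INTERVAL '1' HOUR"
)

# To run it against a live cluster:
# with urllib.request.urlopen(request) as response:
#     rows = json.load(response)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before it reaches the cluster.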

Native JSON Query API

Druid-native query format (Timeseries, TopN, GroupBy, Scan, Search) for maximum control and performance.
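
For comparison with SQL, here is a hedged sketch of a native Timeseries query body as it could be sent to `POST /druid/v2`. The datasource `clickstream`, the metric `clicks`, and the helper `timeseries_query` are hypothetical names chosen for the example.

```python
import json


def timeseries_query(datasource, interval, metric, granularity="hour"):
    """Return a native Timeseries query that sums one metric per time bucket."""
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": granularity,
        "intervals": [interval],  # ISO 8601 interval, "start/end"
        "aggregations": [
            {"type": "longSum", "name": f"total_{metric}", "fieldName": metric}
        ],
    }


# Hypothetical datasource and metric names.
query = timeseries_query("clickstream", "2024-01-01/2024-01-02", "clicks")
body = json.dumps(query)  # JSON string to use as the POST body
```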

Streaming Ingestion

Real-time data ingestion from Apache Kafka and Amazon Kinesis with supervisor-managed offset tracking and exactly-once semantics.
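
A Kafka supervisor is defined by a JSON spec submitted to the Overlord (`POST /druid/indexer/v1/supervisor`). The sketch below assembles a minimal spec as a Python dict. The column names (`timestamp`, `page`, `country`), the topic, the broker address, and the helper name are assumptions for illustration, and a production spec would normally carry more tuning options.

```python
def kafka_supervisor_spec(datasource, topic, bootstrap_servers):
    """Assemble a minimal Kafka supervisor spec (illustrative field values)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumed event layout: ISO timestamp plus two string dimensions.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["page", "country"]},
                "granularitySpec": {
                    "segmentGranularity": "hour",
                    "queryGranularity": "none",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                "useEarliestOffset": True,  # start from the oldest available offsets
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical datasource, topic, and broker address.
spec = kafka_supervisor_spec("clickstream", "clicks-topic", "localhost:9092")
```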

Batch Ingestion

Parallel batch indexing tasks from local files, S3, GCS, HDFS, and other external storage systems.
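
A parallel batch task differs from a streaming supervisor mainly in its `inputSource`. The following sketch builds an `index_parallel` task spec that reads local JSON files; it would be submitted to `POST /druid/indexer/v1/task`. The directory, file filter, column names, and the subtask count of 4 are assumed values, not recommendations.

```python
def parallel_index_task(datasource, base_dir, file_filter):
    """Assemble a minimal index_parallel task spec reading local files."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": datasource,
                # Assumed event layout for the example.
                "timestampSpec": {"column": "timestamp", "format": "iso"},
                "dimensionsSpec": {"dimensions": ["page", "country"]},
                "granularitySpec": {"segmentGranularity": "day"},
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {
                    "type": "local",
                    "baseDir": base_dir,
                    "filter": file_filter,
                },
                "inputFormat": {"type": "json"},
            },
            "tuningConfig": {
                "type": "index_parallel",
                "maxNumConcurrentSubTasks": 4,  # assumed degree of parallelism
            },
        },
    }


# Hypothetical directory and file pattern.
task = parallel_index_task("clickstream", "/data/events", "*.json")
```

Swapping the `inputSource` block (for example to an `s3` type with a list of URIs) is the usual way to point the same task shape at object storage.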

Automatic Rollup

Pre-aggregates metrics at ingestion time to reduce storage and query time, configurable per datasource.
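
To make the idea concrete, the following pure-Python sketch mimics hourly rollup: events that share the same truncated hour and dimension values collapse into one row with summed metrics. It is a conceptual illustration only, not Druid's implementation, and the field names are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def rollup(events, dimensions, metric):
    """Sum `metric` per (hour, dimension values), mimicking hourly rollup."""
    totals = defaultdict(int)
    for event in events:
        # Truncate the event time to the start of its hour.
        hour = datetime.fromisoformat(event["timestamp"]).replace(
            minute=0, second=0, microsecond=0
        )
        key = (hour.isoformat(), *(event[d] for d in dimensions))
        totals[key] += event[metric]
    return dict(totals)


events = [
    {"timestamp": "2024-01-01T10:05:00", "page": "/home", "clicks": 1},
    {"timestamp": "2024-01-01T10:40:00", "page": "/home", "clicks": 2},
    {"timestamp": "2024-01-01T11:15:00", "page": "/home", "clicks": 4},
]
rolled = rollup(events, ["page"], "clicks")
# Three raw events collapse into two stored rows (the 10:00 and 11:00 hours).
```

The trade-off is that individual events inside a bucket can no longer be distinguished, so the chosen granularity bounds the finest resolution queries can see.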

Time-Based Partitioning

All data is partitioned by time interval (segments), enabling efficient time-range query pruning.
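
The pruning effect can be illustrated with a small simulation: given day-sized segments, only those whose interval overlaps the query's time range need to be scanned. This is a conceptual sketch with made-up dates and helper names, not Druid's actual segment-selection code.

```python
from datetime import datetime, timedelta


def day_segments(start, days):
    """Return consecutive one-day [start, end) intervals."""
    return [
        (start + timedelta(days=i), start + timedelta(days=i + 1))
        for i in range(days)
    ]


def prune(segments, query_start, query_end):
    """Keep only segments whose [start, end) overlaps the query interval."""
    return [(s, e) for s, e in segments if s < query_end and e > query_start]


segments = day_segments(datetime(2024, 1, 1), 7)
# A query spanning midday Jan 3 to early Jan 4 touches two of seven segments.
hits = prune(segments, datetime(2024, 1, 3, 12), datetime(2024, 1, 4, 6))
```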

Multi-Tenancy

Query isolation and resource management via query lanes, scheduler priorities, and row-level access control.
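
Priorities and lanes are typically set per query through the query context. The sketch below attaches a `priority` and an optional `lane` to a SQL request body. The lane name `reporting` and the datasource are hypothetical, and whether a client-supplied `lane` takes effect depends on the laning strategy configured on the Broker (an assumption to verify for your cluster).

```python
def with_context(sql, priority, lane=None):
    """Return a Druid SQL request body carrying scheduling hints in its context."""
    context = {"priority": priority}
    if lane is not None:
        context["lane"] = lane
    return {"query": sql, "context": context}


# A background report runs at low priority in a hypothetical "reporting" lane.
report = with_context("SELECT COUNT(*) FROM clickstream", priority=-10, lane="reporting")

# An interactive dashboard query keeps the default priority and no lane.
dashboard = with_context("SELECT COUNT(*) FROM clickstream", priority=0)
```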

Use Cases

Real-Time Event Analytics

Analyze click streams, IoT events, application logs, and user behavior data with sub-second query latency.

Business Intelligence Dashboards

Power interactive BI dashboards with high-concurrency low-latency queries backed by Druid's columnar engine.

Network and Security Monitoring

Ingest and analyze network flow data and security events in real time for threat detection and capacity planning.

Ad Tech Analytics

Process advertising impression, click, and conversion events at high volume with real-time aggregation.

Operational Analytics

Monitor application performance metrics and operational data with drilldown and filtering capabilities.

Integrations

Apache Kafka

KafkaSupervisor for real-time continuous ingestion from Kafka topics into Druid datasources.

Amazon Kinesis

KinesisSupervisor for real-time data ingestion from AWS Kinesis data streams.

Apache Hadoop / HDFS

Hadoop-based batch indexing task for bulk loading data from HDFS or MapReduce job outputs.

Amazon S3 / GCS

Batch ingestion from object storage (S3, GCS, Azure Blob) using index tasks.

Apache Hive

Druid-Hive integration for querying Druid datasources from HiveQL and performing joins.

Kubernetes

Community-maintained Kubernetes operator for deploying and managing Druid clusters on Kubernetes.

Imply (Commercial)

Imply provides a commercial managed Druid service with additional features and enterprise support.

Semantic Vocabularies

Apache Druid Context

5 classes · 32 properties

JSON-LD

Resources

Portal
Documentation
Getting Started
Blog
GitHub Organization
GitHub Repository
Stack Overflow
Vocabulary