Apache Doris
Apache Doris is a high-performance, real-time analytical database built on an MPP (Massively Parallel Processing) architecture and governed by the Apache Software Foundation. It provides MySQL-protocol-compatible SQL queries, sub-second query latency on large-scale data, columnar storage with vectorized execution, real-time ingestion and upsert via the Stream Load and Routine Load APIs, and federated querying over data lakes (Hive, Iceberg, Hudi). It supports both shared-nothing and storage/compute-separated deployment modes.
APIs
Apache Doris
Apache Doris provides a MySQL-compatible protocol for SQL queries, a REST API for cluster management and monitoring, a Stream Load HTTP API for real-time bulk data ingestion, Rout...
Features
Massively parallel processing with columnar storage and a vectorized execution engine for high-concurrency, sub-second analytical queries.
HTTP-based bulk data ingestion API that loads CSV, JSON, and Parquet data in real time with transactional guarantees.
Fully MySQL-wire-protocol compatible, enabling use of standard MySQL clients, drivers, and BI tools without modification.
Query external data in Hive, Iceberg, Hudi, and Delta Lake tables without data movement using Multi-Catalog.
Primary-key-based upsert model supports real-time CDC data ingestion with low-latency row-level updates.
Continuous data ingestion from Apache Kafka topics with automatic offset management and exactly-once semantics.
Hot/warm/cold data tiering with remote storage (S3-compatible object storage, HDFS) for cost-optimized storage at scale.
Model Context Protocol (MCP) server enabling AI agents to query Doris databases through natural language.
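As a rough illustration of the HTTP-based ingestion feature listed above, the sketch below builds (but does not send) a Stream Load request with Python's standard library. It assumes the commonly documented endpoint path `/api/{db}/{table}/_stream_load`, the `label`, `format`, and `column_separator` request headers, and the default FE HTTP port 8030. The host, database, table, label, and credentials are placeholders, and the helper name `build_stream_load_request` is hypothetical.

```python
import base64
import urllib.request


def build_stream_load_request(fe_host, db, table, csv_bytes, label,
                              user="root", password="", http_port=8030):
    """Build (without sending) an HTTP PUT request for a Doris Stream Load.

    All argument values are caller-supplied placeholders; nothing is
    validated or escaped here.
    """
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    # Stream Load authenticates with HTTP Basic auth.
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        "label": label,            # unique label, used to deduplicate retried loads
        "format": "csv",
        "column_separator": ",",
    }
    return urllib.request.Request(url, data=csv_bytes, headers=headers, method="PUT")


if __name__ == "__main__":
    req = build_stream_load_request(
        "127.0.0.1", "demo_db", "events", b"1,click\n2,view\n",
        label="events_batch_001",
    )
    print(req.get_method(), req.full_url)
```

Actually sending the request is left out on purpose: the frontend typically redirects the load to a backend node, so a real client needs to follow that redirect while resending the body and credentials, which `urllib` does not do for PUT requests by default.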
Use Cases
Power business intelligence dashboards with sub-second query latency on live data updated continuously.
Ingest and analyze high-volume log, metric, and event data in real time using inverted indexes and full-text search.
Consolidate customer behavioral and transactional data from multiple sources for real-time segmentation and analytics.
Federate queries across data lakes (Hive, Iceberg) and operational databases without ETL data movement.
Enable data analysts to run complex exploratory SQL queries on petabyte-scale datasets with fast response times.
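To make the federated-query use case above more concrete, here is a minimal sketch that renders the two SQL statements such a setup usually involves: registering a Hive Metastore catalog and joining an external table with an internal one. It assumes the `CREATE CATALOG ... PROPERTIES ('type' = 'hms', ...)` form and the built-in catalog name `internal`. The catalog, database, table, and column names are hypothetical, and both helper functions are illustrative rather than part of any Doris client library.

```python
def create_hms_catalog_sql(catalog, metastore_uri):
    """Render a CREATE CATALOG statement for a Hive Metastore-backed catalog."""
    return (
        f"CREATE CATALOG {catalog} PROPERTIES (\n"
        f"    'type' = 'hms',\n"
        f"    'hive.metastore.uris' = '{metastore_uri}'\n"
        f");"
    )


def federated_join_sql(catalog, lake_db, lake_table,
                       internal_db, internal_table, key):
    """Render a join between an external table and an internal Doris table.

    Tables are addressed with fully qualified catalog.database.table names.
    Identifiers are interpolated as-is, so they must come from trusted input.
    """
    return (
        f"SELECT o.*, c.*\n"
        f"FROM {catalog}.{lake_db}.{lake_table} AS o\n"
        f"JOIN internal.{internal_db}.{internal_table} AS c\n"
        f"  ON o.{key} = c.{key};"
    )


if __name__ == "__main__":
    print(create_hms_catalog_sql("hive_lake", "thrift://metastore.example:9083"))
    print(federated_join_sql("hive_lake", "sales", "orders",
                             "crm", "customers", "customer_id"))
```

The rendered statements can be run from any MySQL-compatible client connected to the Doris frontend.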
Integrations
Official Flink Connector for reading from and writing to Doris in real-time Flink streaming pipelines.
Official Spark Connector for batch ETL and analytics workflows using Apache Spark.
Kafka Connector and Routine Load for continuous real-time data ingestion from Kafka topics.
Multi-Catalog feature enables federated queries over Iceberg, Hudi, and Hive Metastore data lakes.
Official Kubernetes Operator for automated Doris cluster lifecycle management.
OpenTelemetry demo integration for observability and tracing in Doris deployments.
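For the Kafka integration listed above, the following sketch renders a `CREATE ROUTINE LOAD` statement of the kind used to start a continuous ingestion job. It assumes the `FROM KAFKA` clause with the `kafka_broker_list`, `kafka_topic`, `property.group.id`, and `property.kafka_default_offsets` properties and the `desired_concurrent_number` job property. The broker addresses, topic, job, and table names are placeholders, and `routine_load_sql` is a hypothetical helper.

```python
def _render_props(props):
    """Format a dict as the indented "key" = "value" list used in property clauses."""
    return ",\n".join(f'    "{k}" = "{v}"' for k, v in props.items())


def routine_load_sql(db, job, table, brokers, topic, group_id):
    """Render a CREATE ROUTINE LOAD statement that reads CSV rows from Kafka.

    Values are interpolated as-is, so they must come from trusted input.
    """
    job_props = {"desired_concurrent_number": "1"}
    kafka_props = {
        "kafka_broker_list": ",".join(brokers),
        "kafka_topic": topic,
        "property.group.id": group_id,
        # Start from the earliest available offset when none is stored.
        "property.kafka_default_offsets": "OFFSET_BEGINNING",
    }
    return (
        f"CREATE ROUTINE LOAD {db}.{job} ON {table}\n"
        'COLUMNS TERMINATED BY ","\n'
        f"PROPERTIES (\n{_render_props(job_props)}\n)\n"
        f"FROM KAFKA (\n{_render_props(kafka_props)}\n);"
    )


if __name__ == "__main__":
    print(routine_load_sql("demo_db", "events_job", "events",
                           ["broker1:9092", "broker2:9092"],
                           "events", "doris_events"))
```

Once such a job is created, Doris tracks the consumed offsets itself, which is what the automatic offset management mentioned in the feature list refers to.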