Apache Doris

Apache Doris is a high-performance, real-time analytical database based on MPP (Massively Parallel Processing) architecture, governed by the Apache Software Foundation. It provides MySQL-protocol-compatible SQL queries, sub-second query latency on large-scale data, columnar storage with vectorized execution, real-time upsert via Stream Load and Routine Load APIs, and federated querying over data lakes (Hive, Iceberg, Hudi). It supports both shared-nothing and storage/compute-separated deployment modes.

Tags: Analytics, Apache, Database, Lakehouse, MPP, OLAP, Open Source, Real-Time, SQL

APIs

Apache Doris

Apache Doris provides a MySQL-compatible protocol for SQL queries, a REST API for cluster management and monitoring, Stream Load HTTP API for real-time bulk data ingestion, Rout...

Features

MPP Columnar Analytics

Massively parallel processing with columnar storage and vectorized execution engine for high-concurrency sub-second analytical queries.

Stream Load API

HTTP-based bulk data ingestion API that loads CSV, JSON, and Parquet data in real time with transactional guarantees.
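
As a rough illustration of the Stream Load API, the sketch below builds (but does not send) an HTTP PUT request using only the Python standard library. The host, database, table, and label values are hypothetical, and the port 8030 and the `root` user with an empty password are assumed defaults that may differ in a given deployment. The endpoint path and the `label`, `format`, and `column_separator` headers follow the commonly documented Stream Load conventions.

```python
import base64
import urllib.request


def build_stream_load_request(fe_host, db, table, payload, label,
                              user="root", password="", http_port=8030):
    """Build an HTTP PUT request for a CSV Stream Load without sending it.

    All argument values are illustrative; http_port=8030 is an assumed
    default for the frontend (FE) HTTP port.
    """
    url = f"http://{fe_host}:{http_port}/api/{db}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",
        # A unique label lets the server reject a batch that is retried
        # after it was already loaded.
        "label": label,
        "format": "csv",
        "column_separator": ",",
    }
    return urllib.request.Request(url, data=payload, headers=headers, method="PUT")


# Hypothetical database, table, and payload.
req = build_stream_load_request(
    "127.0.0.1", "example_db", "events",
    payload=b"1,click\n2,view\n",
    label="events_batch_0001",
)
print(req.get_method(), req.full_url)
```

Sending is left out on purpose. The frontend typically redirects the request to a backend node, so a real client has to follow that redirect and resend the credentials, which is what `curl --location-trusted` does.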

MySQL Protocol Compatibility

Compatible with the MySQL wire protocol, so standard MySQL clients, drivers, and BI tools can connect without modification.
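
Because the wire protocol is MySQL-compatible, an ordinary MySQL driver should be enough to run queries. The sketch below assumes the third-party `pymysql` package and an assumed default query port of 9030; the host and database names are hypothetical. The driver is imported lazily so the connection settings can be inspected without it installed.

```python
def doris_connection_kwargs(fe_host, database, user="root", password="",
                            query_port=9030):
    """Return keyword arguments for a MySQL driver's connect() call.

    query_port=9030 is an assumed default for the frontend's MySQL
    protocol port; adjust it to match the cluster configuration.
    """
    return {
        "host": fe_host,
        "port": query_port,
        "user": user,
        "password": password,
        "database": database,
    }


def run_query(sql, **conn_kwargs):
    """Run one statement through pymysql and return all rows."""
    import pymysql  # third-party driver, assumed to be installed

    conn = pymysql.connect(**conn_kwargs)
    try:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
    finally:
        conn.close()


# Hypothetical host and database.
kwargs = doris_connection_kwargs("127.0.0.1", "example_db")
# rows = run_query("SELECT 1", **kwargs)  # needs a reachable cluster
```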

Federated Data Lakehouse Queries

Query external data in Hive, Iceberg, Hudi, and Delta Lake tables without data movement using Multi-Catalog.
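
A minimal sketch of how a Multi-Catalog query might be set up: one statement registers an external Hive Metastore as a catalog, and later queries address its tables with a three-part `catalog.database.table` name. The catalog name, metastore URI, and table names are hypothetical, and the helpers only build SQL text to be sent over a MySQL-protocol connection.

```python
def create_hive_catalog_sql(catalog, metastore_uri):
    """Build a CREATE CATALOG statement for a Hive Metastore-backed catalog."""
    return (
        f"CREATE CATALOG {catalog} PROPERTIES (\n"
        f"    'type' = 'hms',\n"
        f"    'hive.metastore.uris' = '{metastore_uri}'\n"
        f");"
    )


def qualified_table(catalog, database, table):
    """Return the three-part name used to address an external table."""
    return f"{catalog}.{database}.{table}"


# Hypothetical catalog, metastore address, and table.
ddl = create_hive_catalog_sql("hive_lake", "thrift://metastore.example:9083")
query = f"SELECT COUNT(*) FROM {qualified_table('hive_lake', 'sales', 'orders')};"
print(ddl)
print(query)
```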

Real-Time Upsert (Unique Key Model)

Primary-key-based upsert model that supports real-time CDC data ingestion with low-latency row-level updates.
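
The upsert semantics can be pictured with a small in-memory model: each incoming row replaces any existing row with the same key, so the latest version wins. This is only a conceptual sketch of the behaviour, not how Doris stores or merges data, and the `id` key and `status` column are hypothetical.

```python
def apply_upserts(table, rows, key="id"):
    """Apply rows in arrival order; a row with an existing key replaces it."""
    for row in rows:
        table[row[key]] = dict(row)  # whole-row replacement, last write wins
    return table


# A hypothetical CDC stream in which order 1 is updated after creation.
changes = [
    {"id": 1, "status": "new"},
    {"id": 2, "status": "new"},
    {"id": 1, "status": "paid"},
]
state = apply_upserts({}, changes)
print(state)
```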

Routine Load from Kafka

Continuous data ingestion from Apache Kafka topics with automatic offset management and exactly-once semantics.
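
As an illustration, a Routine Load job is usually created with a single SQL statement that names the target table and the Kafka source. The helper below only builds that statement as text; the database, job, table, broker, and topic names are hypothetical, and the properties shown are a small assumed subset of what a real job might need.

```python
def create_routine_load_sql(db, job, table, brokers, topic):
    """Build a CREATE ROUTINE LOAD statement for comma-separated Kafka messages."""
    return (
        f'CREATE ROUTINE LOAD {db}.{job} ON {table}\n'
        f'COLUMNS TERMINATED BY ","\n'
        f'PROPERTIES (\n'
        f'    "desired_concurrent_number" = "1"\n'
        f')\n'
        f'FROM KAFKA (\n'
        f'    "kafka_broker_list" = "{brokers}",\n'
        f'    "kafka_topic" = "{topic}",\n'
        f'    "property.kafka_default_offsets" = "OFFSET_BEGINNING"\n'
        f');'
    )


# Hypothetical job definition.
routine_load = create_routine_load_sql(
    "example_db", "events_job", "events", "broker1:9092", "events_topic"
)
print(routine_load)
```

Once such a job is created, the cluster tracks consumed offsets itself, so the client does not poll Kafka.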

Tiered Storage

Hot/warm/cold data tiering that moves colder data to remote storage, such as S3-compatible object storage or HDFS, for cost-optimized storage at scale.

MCP Server

Model Context Protocol (MCP) server enabling AI agents to query Doris databases through natural language.

Use Cases

Real-Time Dashboards and Reporting

Power business intelligence dashboards with sub-second query latency on live data updated continuously.

Log and Event Analytics

Ingest and analyze high-volume log, metric, and event data in real time using inverted indexes and full-text search.

Customer Data Platform

Consolidate customer behavioral and transactional data from multiple sources for real-time segmentation and analytics.

Data Lakehouse Analytics

Federate queries across data lakes (Hive, Iceberg) and operational databases without moving data through ETL.

Ad-Hoc Analytics

Enable data analysts to run complex exploratory SQL queries on petabyte-scale datasets with fast response times.

Integrations

Apache Flink

Official Flink Connector for reading from and writing to Doris in real-time Flink streaming pipelines.

Apache Spark

Official Spark Connector for batch ETL and analytics workflows using Apache Spark.

Apache Kafka

Kafka Connector and Routine Load for continuous real-time data ingestion from Kafka topics.

Apache Iceberg / Hudi / Hive

Multi-Catalog feature enables federated queries over Iceberg, Hudi, and Hive Metastore data lakes.

Kubernetes

Official Kubernetes Operator for automated Doris cluster lifecycle management.

OpenTelemetry

OpenTelemetry demo integration for observability and tracing in Doris deployments.

Semantic Vocabularies

Apache Doris Context

4 classes · 38 properties

JSON-LD

Resources

Portal
Documentation
Getting Started
Blog
GitHub Organization
GitHub Repository
Stack Overflow
Vocabulary