Apache Calcite
Apache Calcite is a dynamic data management framework developed at the Apache Software Foundation that provides SQL parsing, query planning and optimization, and data federation capabilities. It serves as the SQL parsing and query optimization layer for many big data systems, including Apache Hive, Druid, Flink, Beam, and Kylin. Calcite provides a Java API for embedding SQL capabilities into applications, a JDBC adapter for connecting heterogeneous data sources, and an extensible relational algebra framework for building custom query optimizers.
APIs
Apache Calcite Java API
The Apache Calcite Java API provides SQL parsing, validation, query planning, and optimization capabilities for embedding in JVM applications. It exposes a relational algebra fr...
Apache Calcite JDBC API
The Apache Calcite JDBC adapter provides a standard JDBC interface over heterogeneous data sources. Applications use it to issue SQL queries across multiple data formats and sto...
Apache Calcite Avatica API
Apache Calcite Avatica is a framework for building database drivers, maintained as a sub-project of Apache Calcite. It provides a JSON/Protobuf-over-HTTP remote protocol for JDBC clients to connect to Cal...
Features
Parse and validate SQL queries using an extensible SQL grammar with support for SQL:2003 and beyond.
Cost-based and rule-based query optimization using a Volcano-style planner with pluggable optimization rules.
Extensible relational algebra framework for representing and transforming query plans as expression trees.
Federate queries across heterogeneous data sources including CSV, JSON, JDBC databases, and Elasticsearch.
Pluggable adapter API for connecting new data sources to the Calcite SQL engine.
Automatic materialized view recognition and query rewriting for query acceleration.
SQL extensions for querying streaming data sources with window functions and temporal predicates.
Summary table recommendation and query rewriting using lattice structures for OLAP workloads.
Standard JDBC driver for issuing SQL queries against Calcite-connected data sources.
JSON/Protobuf-over-HTTP remote JDBC protocol for connecting clients to Calcite-based query servers.
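The optimizer-related features above (query plans as expression trees, pluggable rewrite rules, cost-based selection) can be illustrated with a toy model. The sketch below is plain Java and does not use Calcite's API: `Rel`, `Scan`, `Filter`, `Project`, `pushFilterBelowProject`, and the row-count cost model are hypothetical names and simplifications chosen for illustration. A real Volcano-style planner explores many equivalent plans, whereas this sketch only compares an original plan with one rewritten alternative.

```java
import java.util.Arrays;
import java.util.List;

public class PlanRewriteSketch {
    /** Toy relational expression node. This is NOT Calcite's RelNode. */
    interface Rel {
        double rows();  // estimated number of output rows
        double cost();  // cumulative rows processed by this subtree (assumed cost model)
    }

    static final class Scan implements Rel {
        final String table;
        final double tableRows;
        Scan(String table, double tableRows) {
            this.table = table;
            this.tableRows = tableRows;
        }
        public double rows() { return tableRows; }
        public double cost() { return tableRows; }
    }

    static final class Filter implements Rel {
        final Rel input;
        final String column;
        final double selectivity;  // fraction of input rows that pass the filter
        Filter(Rel input, String column, double selectivity) {
            this.input = input;
            this.column = column;
            this.selectivity = selectivity;
        }
        public double rows() { return input.rows() * selectivity; }
        public double cost() { return input.cost() + input.rows(); }
    }

    static final class Project implements Rel {
        final Rel input;
        final List<String> columns;
        Project(Rel input, List<String> columns) {
            this.input = input;
            this.columns = columns;
        }
        public double rows() { return input.rows(); }
        public double cost() { return input.cost() + input.rows(); }
    }

    /** Rule: Filter(Project(x)) becomes Project(Filter(x)) when the filtered column is projected. */
    static Rel pushFilterBelowProject(Rel rel) {
        if (rel instanceof Filter && ((Filter) rel).input instanceof Project) {
            Filter f = (Filter) rel;
            Project p = (Project) f.input;
            if (p.columns.contains(f.column)) {
                return new Project(new Filter(p.input, f.column, f.selectivity), p.columns);
            }
        }
        return rel;  // rule does not match: leave the plan unchanged
    }

    /** Applies the rule and keeps whichever plan has the lower estimated cost. */
    static Rel optimize(Rel original) {
        Rel rewritten = pushFilterBelowProject(original);
        return rewritten.cost() < original.cost() ? rewritten : original;
    }

    public static void main(String[] args) {
        Rel plan = new Filter(
            new Project(new Scan("emp", 1000), Arrays.asList("ename", "deptno")),
            "deptno", 0.25);
        Rel best = optimize(plan);
        System.out.println(plan.cost() + " -> " + best.cost());
    }
}
```

Running `main` prints `3000.0 -> 2250.0`: filtering before projecting means the projection handles 250 rows instead of 1000, so the rewritten plan is kept.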
Use Cases
Add SQL querying capability to Java applications using the Calcite Java API without a full database.
Use Calcite as the SQL parsing and optimization layer in custom query engine implementations.
Federate queries across multiple heterogeneous data sources using Calcite adapters.
Accelerate analytical queries using materialized view rewriting and lattice-based summary tables.
Parse SQL in one dialect and transpile it to another using Calcite's SQL generation framework.
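The last use case, parsing SQL and regenerating it for another dialect, might look roughly like the following sketch. It assumes `calcite-core` is on the classpath; the class name `TranspileSketch`, the method `toPostgres`, and the sample query are illustrative only. The parser here runs with Calcite's default configuration, so a real application would likely need to configure the lexical rules of its source dialect.

```java
import org.apache.calcite.sql.SqlNode;
import org.apache.calcite.sql.dialect.PostgresqlSqlDialect;
import org.apache.calcite.sql.parser.SqlParseException;
import org.apache.calcite.sql.parser.SqlParser;

public class TranspileSketch {
    /** Parses a query with Calcite's default parser settings and unparses it as PostgreSQL. */
    static String toPostgres(String sql) {
        try {
            // Parse the text into Calcite's SQL syntax tree (no validation against a schema).
            SqlNode ast = SqlParser.create(sql).parseQuery();
            // Regenerate SQL text using the quoting and syntax rules of the target dialect.
            return ast.toSqlString(PostgresqlSqlDialect.DEFAULT).getSql();
        } catch (SqlParseException e) {
            throw new IllegalArgumentException("Could not parse: " + sql, e);
        }
    }

    public static void main(String[] args) {
        // Sample query and table names are hypothetical.
        System.out.println(toPostgres("select ename from emp where deptno = 10"));
    }
}
```

Because this works purely on the syntax tree, it does not check that tables or columns exist. Dialect-specific functions may need additional handling beyond what this sketch shows.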
Integrations
Flink uses Calcite for SQL parsing and query optimization in Flink SQL and the Table API.
Hive uses Calcite for cost-based query optimization in HiveQL query planning.
Druid uses Calcite for SQL query parsing and planning against its time-series data store.
ksqlDB and Kafka Streams use Calcite for SQL stream processing query planning.
Beam SQL uses Calcite for query planning on PCollection-based streaming and batch pipelines.
Calcite provides a SQL adapter for querying Elasticsearch indices using standard SQL.
Kylin uses Calcite as its SQL engine for OLAP cube query planning and execution.