Apache Iceberg
Apache Iceberg is an open table format for large analytic datasets that provides ACID transactions, schema evolution, hidden partitioning, and time travel. It works with Spark, Flink, Hive, Presto, Trino, DuckDB, ClickHouse, and many other compute engines. The project is governed by the Apache Software Foundation and released under the Apache License 2.0.
APIs
Apache Iceberg REST Catalog API
The Iceberg REST Catalog API defines the specification for catalog server implementations, enabling table discovery, creation, metadata management, namespace management, and mul...
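The REST Catalog spec exposes namespaces and tables as resources under a versioned path. A minimal sketch of how a client might construct those routes (path shapes follow the published REST catalog spec; the `warehouse` prefix value is an illustrative assumption, normally returned by the server's /v1/config endpoint):

```python
from urllib.parse import quote

# Iceberg REST Catalog route builder (sketch).
# Multi-level namespaces are joined with the 0x1F unit separator
# in URL paths, per the REST catalog specification.
NS_SEP = "\x1f"

def namespace_path(prefix: str, namespace: list[str]) -> str:
    ns = quote(NS_SEP.join(namespace), safe="")
    return f"/v1/{prefix}/namespaces/{ns}"

def table_path(prefix: str, namespace: list[str], table: str) -> str:
    return f"{namespace_path(prefix, namespace)}/tables/{quote(table, safe='')}"

print(namespace_path("warehouse", ["analytics", "sales"]))
# /v1/warehouse/namespaces/analytics%1Fsales
print(table_path("warehouse", ["analytics"], "orders"))
# /v1/warehouse/namespaces/analytics/tables/orders
```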
Apache Iceberg Java API
The Iceberg Java API provides programmatic access to table operations, schema management, partition management, and catalog implementations. It is the primary library for integr...
PyIceberg Python API
PyIceberg is the official Python implementation of the Apache Iceberg table specification. It provides programmatic access to Iceberg table metadata and data, with integrations ...
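Under the hood, every Iceberg table is described by a metadata JSON file that clients such as PyIceberg read first. The sketch below parses a simplified metadata document and resolves the current snapshot (field names follow the Iceberg table spec; the values are made up for illustration):

```python
import json

# Simplified Iceberg table metadata (illustrative values; field names
# follow the table spec: format-version, current-snapshot-id, snapshots).
metadata = json.loads("""
{
  "format-version": 2,
  "location": "s3://bucket/warehouse/db/orders",
  "current-snapshot-id": 3055729675574597004,
  "snapshots": [
    {"snapshot-id": 3051729675574597004, "timestamp-ms": 1515100955770},
    {"snapshot-id": 3055729675574597004, "timestamp-ms": 1555100955770}
  ]
}
""")

def current_snapshot(meta: dict) -> dict:
    """Look up the snapshot record matching current-snapshot-id."""
    sid = meta["current-snapshot-id"]
    return next(s for s in meta["snapshots"] if s["snapshot-id"] == sid)

print(current_snapshot(metadata)["snapshot-id"])  # 3055729675574597004
```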
Capabilities
Apache Iceberg Catalog Management
Workflow capability for data engineers and lakehouse architects to manage namespaces, tables, and views in Apache Iceberg catalogs via the REST Catalog API.
Features
Full ACID transaction support with serializable isolation for concurrent readers and writers.
Add, drop, update, or rename columns without rewriting existing data files.
Hidden partitioning computes partition values automatically, preventing the common user mistakes that lead to silently incorrect query results.
Change partition layout over time without rewriting existing data.
Query historical snapshots of tables and roll back to any prior version.
Supports upserts, deletes, and updates at the row level via merge-on-read and copy-on-write modes.
Works with Spark, Flink, Hive, Trino, Presto, Impala, DuckDB, ClickHouse, and more.
Native support for S3, ADLS, GCS, and HDFS with no filesystem dependencies.
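Hidden partitioning works by deriving partition values from column values with declared transforms. A rough sketch of two of the spec's transforms, `truncate` and `day` (simplified: null handling and the spec's exact per-type rules are omitted):

```python
from datetime import date

def truncate(width: int, value: int) -> int:
    # Iceberg truncate transform for integers: value minus (value mod width).
    # Python's floor-mod matches the spec's behavior for negative values.
    return value - (value % width)

def day(value: date) -> int:
    # Iceberg day transform: days since the Unix epoch (1970-01-01).
    return (value - date(1970, 1, 1)).days

print(truncate(10, 37))        # 30
print(day(date(1970, 1, 11)))  # 10
```

Because the engine applies these transforms itself, queries filter on the raw column (e.g. an order timestamp) and Iceberg prunes partitions automatically; users never write the partition column into predicates by hand.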
Use Cases
Build open lakehouse architectures with ACID guarantees across petabyte-scale datasets.
Stream data into Iceberg tables via Flink or Kafka Connect with exactly-once semantics.
Use time travel to audit historical data states and implement regulatory compliance.
Query the same Iceberg tables from multiple engines (Spark, Trino, DuckDB) without data duplication.
Migrate on-premises Hive workloads to cloud-native Iceberg tables with full compatibility.
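Time travel resolves an "as of" timestamp against the table's snapshot history. A minimal sketch of that lookup, with a list of (timestamp-ms, snapshot-id) pairs standing in for the real snapshot log:

```python
def snapshot_as_of(snapshot_log, ts_ms):
    """Return the id of the latest snapshot committed at or before ts_ms,
    or None if the table had no snapshot yet at that time."""
    best = None
    for entry_ts, snapshot_id in sorted(snapshot_log):
        if entry_ts <= ts_ms:
            best = snapshot_id
    return best

log = [(1000, 101), (2000, 102), (3000, 103)]
print(snapshot_as_of(log, 2500))  # 102
print(snapshot_as_of(log, 500))   # None
```

Rolling back is the same idea in reverse: the catalog simply repoints the table's current snapshot at an earlier id, with no data rewritten.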
Integrations
Full read/write support for Iceberg tables in Spark batch and streaming workloads.
Flink streaming and batch integration with exactly-once write support.
Read and write Iceberg tables from Hive queries using the Iceberg Hive integration.
Query Iceberg tables from Trino with full partition pruning and predicate pushdown.
Use AWS Glue as the Iceberg catalog backend with full metadata management.
Query Iceberg tables stored in S3 using Amazon Athena.
Git-like catalog branching and versioning via Nessie catalog integration.
Local analytics on Iceberg tables via the DuckDB Iceberg extension.
Query Iceberg tables from ClickHouse via the ClickHouse Iceberg integration.
Access Iceberg tables managed in Snowflake's Polaris catalog.
Use BigQuery as a compute engine over Iceberg tables with BigLake Metastore.
Create and query Iceberg tables on Databricks using Unity Catalog.