Apache Iceberg

Apache Iceberg is an open table format for large analytic datasets that provides ACID transactions, schema evolution, hidden partitioning, and time travel. It works with Spark, Flink, Hive, Presto, Trino, DuckDB, ClickHouse, and many other compute engines. The project is governed by the Apache Software Foundation and released under the Apache License 2.0.

3 APIs · 1 Capability · 8 Features
ACID · Analytics · Apache · Data Lake · Lakehouse · Open Source · Table Format

APIs

Apache Iceberg REST Catalog API

The Iceberg REST Catalog API defines the specification for catalog server implementations, enabling table discovery, creation, metadata management, namespace management, and mul...
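As a rough illustration of how a client addresses namespaces and tables, the sketch below builds resource URLs in the `/v1/{prefix}/namespaces/...` shape used by the REST Catalog specification. The host name, the `prod` prefix, and the helper function names are hypothetical, and the unit-separator encoding for multi-level namespaces reflects the specification's default as I understand it; check the OpenAPI document of the catalog you target before relying on it.

```python
from urllib.parse import quote

# Multi-level namespaces are joined with the unit separator (0x1F)
# before URL-encoding. Assumption: the catalog uses this default.
NAMESPACE_SEPARATOR = "\x1f"


def _root(base_url: str, prefix: str) -> str:
    """Return '<base>/v1' or '<base>/v1/<prefix>' when a prefix is configured."""
    parts = [base_url.rstrip("/"), "v1"]
    if prefix:
        parts.append(prefix.strip("/"))
    return "/".join(parts)


def namespace_path(base_url: str, prefix: str, namespace: tuple) -> str:
    """URL of a single namespace resource."""
    encoded = quote(NAMESPACE_SEPARATOR.join(namespace), safe="")
    return f"{_root(base_url, prefix)}/namespaces/{encoded}"


def tables_path(base_url: str, prefix: str, namespace: tuple) -> str:
    """URL used to list or create tables in a namespace."""
    return namespace_path(base_url, prefix, namespace) + "/tables"


def table_path(base_url: str, prefix: str, namespace: tuple, table: str) -> str:
    """URL of a single table resource."""
    return tables_path(base_url, prefix, namespace) + "/" + quote(table, safe="")
```

A client would typically read the prefix from the catalog's configuration endpoint first, then issue authenticated HTTP requests against these paths.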

Apache Iceberg Java API

The Iceberg Java API provides programmatic access to table operations, schema management, partition management, and catalog implementations. It is the primary library for integr...

PyIceberg Python API

PyIceberg is the official Python implementation of the Apache Iceberg table specification. It provides programmatic access to Iceberg table metadata and data, with integrations ...

Capabilities

Apache Iceberg Catalog Management

Workflow capability for data engineers and lakehouse architects to manage namespaces, tables, and views in Apache Iceberg catalogs via the REST Catalog API.

Features

ACID Transactions

Full ACID transaction support with serializable isolation for concurrent readers and writers.
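Iceberg commits are generally described as optimistic: a writer prepares new table metadata, then atomically swaps the table's current-metadata pointer, retrying if another writer got there first. The toy model below illustrates that idea only. `ToyCatalog`, `CommitConflict`, and `commit_with_retry` are hypothetical names, and a real catalog performs the swap in its backing store rather than in process memory.

```python
import threading


class CommitConflict(Exception):
    """Raised when another writer committed first."""


class ToyCatalog:
    """Holds a single pointer to the table's current metadata version."""

    def __init__(self, current: str = "v0") -> None:
        self.current = current
        self._lock = threading.Lock()

    def swap(self, expected: str, new: str) -> None:
        # Atomic compare-and-swap: succeed only if no one has committed
        # since `expected` was read.
        with self._lock:
            if self.current != expected:
                raise CommitConflict(f"expected {expected}, found {self.current}")
            self.current = new


def commit_with_retry(catalog: ToyCatalog, make_new_version, max_attempts: int = 3) -> str:
    """Re-read the base version and retry when a concurrent commit wins."""
    for _ in range(max_attempts):
        base = catalog.current
        new = make_new_version(base)
        try:
            catalog.swap(base, new)
            return new
        except CommitConflict:
            continue
    raise CommitConflict("gave up after retries")
```

Readers never see a half-applied change in this model, because they only ever follow the pointer to a complete metadata version.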

Schema Evolution

Add, drop, update, or rename columns without rewriting existing data files.
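One way to see why such changes need no rewrite: Iceberg tracks columns by unique field IDs rather than by name or position. The sketch below is a simplified model of that idea, not Iceberg's reader; the sample rows, schemas, and `read_rows` helper are made up for illustration.

```python
# Stored rows are keyed by field ID, not by column name.
data_file_rows = [{1: 101, 2: "alice"}, {1: 102, 2: "bob"}]

# Schemas map field ID -> current column name.
schema_v1 = {1: "id", 2: "name"}
# Evolved schema: field 2 renamed, field 3 added. The data file is untouched.
schema_v2 = {1: "id", 2: "full_name", 3: "email"}


def read_rows(rows, schema):
    """Project stored rows through a schema; fields absent from the file read as None."""
    return [
        {name: row.get(field_id) for field_id, name in schema.items()}
        for row in rows
    ]
```

Reading the same rows through `schema_v2` yields the renamed `full_name` column and a null `email`, while the stored data stays byte-for-byte the same.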

Hidden Partitioning

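Partition values are derived automatically from column values, so queries filter on the source columns and never reference partition columns directly. This prevents common user mistakes that produce silently incorrect results.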
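The sketch below models the idea with a day transform: the partition value is computed from a timestamp column, and a filter on that timestamp is translated into a partition range for file pruning. The file names and the `prune` helper are hypothetical, and real engines plan scans from manifest metadata rather than from a dictionary.

```python
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def day_transform(ts: datetime) -> int:
    """Days since the Unix epoch, the value a day-partitioned table would store."""
    return (ts - EPOCH).days


# Hypothetical data files, each tagged with its derived day partition value.
files = {
    "a.parquet": day_transform(datetime(2024, 3, 1, 8, tzinfo=timezone.utc)),
    "b.parquet": day_transform(datetime(2024, 3, 2, 9, tzinfo=timezone.utc)),
    "c.parquet": day_transform(datetime(2024, 3, 5, 10, tzinfo=timezone.utc)),
}


def prune(files: dict, start: datetime, end: datetime) -> list:
    """Return files whose day partition may hold rows with start <= ts < end.

    The upper bound is inclusive, which is conservative: it may keep a
    file that turns out to contain no matching rows, but never drops one.
    """
    lo, hi = day_transform(start), day_transform(end)
    return sorted(name for name, day in files.items() if lo <= day <= hi)
```

The caller only ever supplies timestamps; the partition layout stays an implementation detail of the table.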

Partition Evolution

Change partition layout over time without rewriting existing data.

Time Travel

Query historical snapshots of a table and roll back to any prior snapshot that is still retained.
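Conceptually, a time-travel read picks the latest snapshot committed at or before a requested time. The following is a minimal sketch of that lookup over a made-up snapshot log; the `snapshot_log` values and `snapshot_as_of` helper are illustrative and not part of any Iceberg library.

```python
from bisect import bisect_right

# (commit time in milliseconds, snapshot id), ordered by commit time.
snapshot_log = [(1000, "s1"), (2000, "s2"), (3000, "s3")]


def snapshot_as_of(log, timestamp_ms: int) -> str:
    """Return the id of the latest snapshot committed at or before timestamp_ms."""
    commit_times = [commit_time for commit_time, _ in log]
    idx = bisect_right(commit_times, timestamp_ms)
    if idx == 0:
        raise ValueError("no snapshot exists at or before that time")
    return log[idx - 1][1]
```

In this model a rollback amounts to pointing the table's current snapshot at an earlier id from the log.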

Row-Level Updates

Supports upserts, deletes, and updates at the row level via merge-on-read and copy-on-write modes.
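The two modes trade write cost against read cost. The toy functions below contrast them for a delete: copy-on-write rewrites the data file without the deleted rows, while merge-on-read records the deleted positions separately and filters them at read time. The list-based "files" and the function names are simplifications for illustration only.

```python
def copy_on_write_delete(data_file: list, positions) -> list:
    """Produce a rewritten data file that omits the deleted rows."""
    doomed = set(positions)
    return [row for pos, row in enumerate(data_file) if pos not in doomed]


def merge_on_read_delete(delete_file: set, positions) -> set:
    """Record deleted row positions; the data file itself is left untouched."""
    return delete_file | set(positions)


def read_merged(data_file: list, delete_file: set) -> list:
    """Apply recorded position deletes while reading."""
    return [row for pos, row in enumerate(data_file) if pos not in delete_file]
```

Both paths return the same rows to the reader; copy-on-write pays at write time, and merge-on-read pays on every read until the deletes are compacted away.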

Multi-Engine Support

Works with Spark, Flink, Hive, Trino, Presto, Impala, DuckDB, ClickHouse, and more.

Cloud-Native Storage

Works with S3, ADLS, GCS, and HDFS without depending on filesystem operations such as directory listing or atomic rename.

Use Cases

Lakehouse Analytics

Build open lakehouse architectures with ACID guarantees across petabyte-scale datasets.

Real-Time Data Pipelines

Stream data into Iceberg tables via Flink or Kafka Connect with exactly-once semantics.

Data Versioning and Auditing

Use time travel to audit historical data states and implement regulatory compliance.

Multi-Engine Query Federation

Query the same Iceberg tables from multiple engines (Spark, Trino, DuckDB) without data duplication.

Cloud Data Migration

Migrate on-premises Hive workloads to cloud-native Iceberg tables with full compatibility.

Integrations

Apache Spark

Full read/write support for Iceberg tables in Spark batch and streaming workloads.

Apache Flink

Streaming and batch integration with exactly-once write support.

Apache Hive

Read and write Iceberg tables from Hive queries using the Iceberg Hive integration.

Trino

Query Iceberg tables from Trino with full partition pruning and predicate pushdown.

AWS Glue Catalog

Use AWS Glue as the Iceberg catalog backend with full metadata management.

AWS Athena

Query Iceberg tables stored in S3 using Amazon Athena.

Project Nessie

Git-like catalog branching and versioning via Nessie catalog integration.

DuckDB

Local analytics on Iceberg tables via the DuckDB Iceberg extension.

ClickHouse

Query Iceberg tables from ClickHouse via the ClickHouse Iceberg integration.

Snowflake

Access Iceberg tables managed in Snowflake's Polaris catalog.

Google BigQuery

Use BigQuery as a compute engine over Iceberg tables with BigLake Metastore.

Databricks

Create and query Iceberg tables on Databricks using Unity Catalog.

Semantic Vocabularies

Apache Iceberg REST Catalog OpenAPI Context

119 classes · 190 properties

JSON-LD

API Governance Rules

Apache Iceberg API Rules

28 rules · 11 errors · 12 warnings · 5 info

SPECTRAL

Resources

GitHub Organization
GitHub Repository
Documentation
Terms of Service
Blog
YouTube
Versioning
Release Notes
Spectral Rules
Vocabulary
Naftiko Capability