Apache ORC

Apache ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads. It provides high compression ratios and fast reads for large-scale data processing, and it supports complex data types such as structs, lists, maps, and unions.

Tags: Big Data, Columnar Storage, Compression, File Format, Hadoop, Apache, Open Source

APIs

Apache ORC

ORC provides Java and C++ APIs for reading and writing ORC columnar files, with support for predicate pushdown, column projection, compression codecs, and integration with Hive,...

Capabilities

Apache ORC File Processing Workflow

Workflow capability for reading, writing, converting, and analyzing Apache ORC columnar files.

Features

Columnar Storage

Stores data by column for efficient compression and query performance
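To illustrate why a column-oriented layout tends to compress well, here is a minimal sketch in plain Python. It is not the ORC encoder: the sample rows and the helper names `to_columns` and `run_length_encode` are hypothetical, and run-length encoding stands in for the richer encodings a real columnar format applies.

```python
from itertools import groupby

# Hypothetical sample rows, used only for illustration.
rows = [
    {"id": 1, "country": "US", "amount": 10.0},
    {"id": 2, "country": "US", "amount": 12.5},
    {"id": 3, "country": "US", "amount": 7.0},
    {"id": 4, "country": "DE", "amount": 3.0},
]


def to_columns(rows):
    """Pivot a list of row dicts into a dict of column lists."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse consecutive repeats into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


columns = to_columns(rows)
# Values of one column sit next to each other, so repeats become runs.
encoded_country = run_length_encode(columns["country"])
print(encoded_country)
```

Because similar values end up adjacent, a column with few distinct values shrinks to a handful of runs, which a row-oriented layout cannot exploit as easily.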

Predicate Pushdown

Skip reading data that does not match query predicates
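The idea behind predicate pushdown can be sketched with per-block minimum and maximum statistics. The example below is a simplified model in plain Python, not the ORC reader API: the `Stripe` class, `make_stripes`, and `scan_greater_than` are hypothetical names, and only a single "greater than" predicate on one integer column is handled.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    """A hypothetical block of values with precomputed min/max statistics."""
    values: list
    minimum: int
    maximum: int


def make_stripes(values, stripe_size):
    """Split values into fixed-size blocks and record statistics per block."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append(Stripe(chunk, min(chunk), max(chunk)))
    return stripes


def scan_greater_than(stripes, threshold):
    """Return matching values and the number of stripes actually read."""
    matches, stripes_read = [], 0
    for stripe in stripes:
        if stripe.maximum <= threshold:
            continue  # statistics prove no value can match, so skip the block
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


stripes = make_stripes(list(range(100)), stripe_size=10)
matches, stripes_read = scan_greater_than(stripes, threshold=84)
print(len(stripes), stripes_read, len(matches))
```

In this sketch only the blocks whose maximum exceeds the threshold are scanned; the rest are rejected from their statistics alone, which is the effect the feature describes.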

Column Projection

Read only the columns needed for a query
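Column projection relies on each column being stored in its own addressable region, so a reader can seek straight to what it needs. The following is a rough sketch under that assumption, using JSON payloads and an offset index in plain Python. The layout and the names `write_columns` and `read_projection` are hypothetical and do not reflect the actual ORC file layout.

```python
import io
import json


def write_columns(columns):
    """Serialize each column to its own byte range; return (data, index)."""
    buffer = io.BytesIO()
    index = {}
    for name, values in columns.items():
        payload = json.dumps(values).encode("utf-8")
        index[name] = (buffer.tell(), len(payload))  # offset and length
        buffer.write(payload)
    return buffer.getvalue(), index


def read_projection(data, index, wanted):
    """Read only the requested columns and report how many bytes were read."""
    stream = io.BytesIO(data)
    result, bytes_read = {}, 0
    for name in wanted:
        offset, length = index[name]
        stream.seek(offset)
        result[name] = json.loads(stream.read(length).decode("utf-8"))
        bytes_read += length
    return result, bytes_read


# Hypothetical sample columns.
columns = {
    "id": [1, 2, 3, 4],
    "country": ["US", "US", "US", "DE"],
    "amount": [10.0, 12.5, 7.0, 3.0],
}
data, index = write_columns(columns)
projection, bytes_read = read_projection(data, index, ["amount"])
print(projection, bytes_read, len(data))
```

A query that touches one column reads only that column's byte range, so the cost scales with the columns requested rather than with the full row width.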

ACID Support

Full ACID transactional support when used with Apache Hive

Schema Evolution

Add, rename, and remove columns while preserving backward compatibility
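One way to picture schema evolution is a reader that reconciles its expected schema with the columns an older file actually contains. The sketch below assumes simple name-based matching in plain Python; the function `read_with_schema` is hypothetical, and real ORC readers follow their own resolution rules, which this example does not attempt to reproduce.

```python
def read_with_schema(file_columns, reader_schema, row_count):
    """Resolve a file's columns against the schema the reader expects.

    Columns missing from the file are filled with None, and columns the
    reader no longer asks for are ignored.
    """
    result = {}
    for name in reader_schema:
        if name in file_columns:
            result[name] = list(file_columns[name])
        else:
            result[name] = [None] * row_count  # column added after the file was written
    return result


# Hypothetical file written with an older schema.
old_file = {"id": [1, 2], "legacy_flag": [True, False]}

# Hypothetical newer schema: "legacy_flag" removed, "email" added.
new_schema = ["id", "email"]

resolved = read_with_schema(old_file, new_schema, row_count=2)
print(resolved)
```

Older files remain readable because the reader supplies defaults for columns it cannot find and drops those it no longer needs.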

Compression

Supports ZLIB, Snappy, LZO, LZ4, and ZSTD compression codecs
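As a small illustration of codec-based compression on column data, the sketch below compresses a repetitive column with Python's standard `zlib` module and restores it. It only demonstrates the general idea of a lossless codec applied to a column; it does not produce ORC-compatible output, and the sample column and newline-joined serialization are assumptions made for the example.

```python
import zlib

# Hypothetical low-cardinality column, typical of the data that compresses well.
column = ["US"] * 5000 + ["DE"] * 5000

# Serialize the column to bytes (an assumed format, one value per line).
raw = "\n".join(column).encode("utf-8")

# Compress with DEFLATE at compression level 6, then restore.
compressed = zlib.compress(raw, 6)
restored = zlib.decompress(compressed).decode("utf-8").split("\n")

ratio = len(raw) / len(compressed)
print(len(raw), len(compressed), round(ratio, 1))
```

Codec choice is a trade-off between compression ratio and CPU cost, so the best option depends on whether a workload is bound by storage, I/O, or compute.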

Use Cases

Hive Data Warehousing

Store Hive tables in highly efficient ORC format

Spark Analytics

Process large ORC datasets with Apache Spark SQL

Presto/Trino Queries

Fast analytical queries over ORC files with Presto or Trino

Data Lake Storage

Efficient columnar storage for data lake architectures

Integrations

Apache Hive

Native ORC support in Hive, where ORC originated and is the format required for full ACID tables

Apache Spark

ORC data source support in Spark SQL

Presto/Trino

Fast ORC reading with native vectorized reader

Apache Flink

ORC file format support for batch and streaming

Apache Arrow

ORC to Arrow conversion for in-memory analytics

Semantic Vocabularies

Apache ORC Context

12 classes · 37 properties

JSON-LD

API Governance Rules

Apache ORC API Rules

8 rules · 4 errors · 3 warnings · 1 info

SPECTRAL

Resources

GitHub Organization
Documentation
Spectral Rules
Vocabulary
Naftiko Capability
JSON-LD