
Apache ORC

Apache ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads. It provides high compression ratios and fast read performance for large-scale data processing with support for complex data types.

1 API · 3 Capabilities · 6 Features · 43.9 / 100 (thin)
Big Data · Columnar Storage · Compression · File Format · Hadoop · Apache · Open Source

API Rating

43.9 / 100 (thin)
Scored 2026-05-20 · rubric v0.3
Discoverability: 80.0
Contract Quality: 63.2
Governance: 47.4
Operational Transparency: 36.8
Developer Ergonomics: 8.7
Commercial Clarity: 39.5

APIs

Apache ORC

ORC provides Java and C++ APIs for reading and writing ORC columnar files, with support for predicate pushdown, column projection, compression codecs, and integration with Hive,...

Capabilities

Apache ORC Tools API — Conversion

1 operation. Lead operation: Apache ORC Convert File to ORC. Self-contained Naftiko capability covering one Apache ORC business surface.

Apache ORC Tools API — Files

4 operations. Lead operation: Apache ORC List ORC Files. Self-contained Naftiko capability covering one Apache ORC business surface.

Apache ORC Tools API — Operations

1 operation. Lead operation: Apache ORC Merge ORC Files. Self-contained Naftiko capability covering one Apache ORC business surface.

Features

Columnar Storage

Stores data by column for efficient compression and query performance

Predicate Pushdown

Skip reading data that does not match query predicates
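
As a rough illustration of the idea, the sketch below keeps minimum and maximum values alongside each chunk of a column and uses them to skip chunks that cannot contain a match. It is a conceptual model in plain Python, not the ORC reader API: `Stripe`, `make_stripe`, and `scan_greater_than` are hypothetical names introduced only for this example, and the data is made up.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for one chunk of a single integer column."""
    values: List[int]
    min_value: int  # statistics recorded when the chunk was written
    max_value: int


def make_stripe(values: List[int]) -> Stripe:
    """Build a chunk and record its min/max statistics up front."""
    return Stripe(values=values, min_value=min(values), max_value=max(values))


def scan_greater_than(stripes: List[Stripe], threshold: int) -> Tuple[List[int], int]:
    """Return values greater than threshold and the number of chunks read."""
    matches: List[int] = []
    stripes_read = 0
    for stripe in stripes:
        # If even the largest value fails the predicate, no row in this
        # chunk can match, so it is skipped without touching its values.
        if stripe.max_value <= threshold:
            continue
        stripes_read += 1
        matches.extend(v for v in stripe.values if v > threshold)
    return matches, stripes_read


stripes = [make_stripe([1, 5, 9]), make_stripe([10, 14, 19]), make_stripe([20, 25, 30])]
matches, stripes_read = scan_greater_than(stripes, 18)
print(matches, stripes_read)
```

In this toy run the first chunk is never scanned, because its recorded maximum already rules it out. Skipping pays off most when the data is sorted or clustered on the filtered column, since the value ranges of different chunks then overlap less.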

Column Projection

Read only the columns needed for a query
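
A minimal sketch of why a columnar layout makes this cheap, assuming a toy in-memory table with one list per column. The `table` contents and the `read_columns` helper are hypothetical and do not correspond to any ORC library call; the point is only that unrequested columns are never touched.

```python
from typing import Any, Dict, List

# Hypothetical columnar table: each column is stored as its own list.
table: Dict[str, List[Any]] = {
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "score": [0.5, 0.7, 0.9],
}


def read_columns(columnar_table: Dict[str, List[Any]], wanted: List[str]) -> List[Dict[str, Any]]:
    """Assemble rows from the requested columns only."""
    # Only the requested column lists are looked up; the rest stay unread.
    selected = [columnar_table[name] for name in wanted]
    return [dict(zip(wanted, row)) for row in zip(*selected)]


rows = read_columns(table, ["id", "score"])
print(rows)
```

Here the `name` column is never accessed. In a file-based columnar format the same principle means the bytes for unneeded columns do not have to be read from storage.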

ACID Support

Full ACID transactional support when used with Apache Hive

Schema Evolution

Add, rename, and remove columns while preserving backward compatibility

Compression

Supports ZLIB, Snappy, LZO, LZ4, and ZSTD compression codecs
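
To give a feel for why column-wise data tends to compress well, the sketch below runs Python's standard `zlib` module over a repetitive, made-up column of status values. It only illustrates a general-purpose codec applied to one column's bytes; it does not reproduce ORC's on-disk layout, and ORC also applies its own column encodings (such as run-length and dictionary encoding) before a codec like ZLIB is used.

```python
import zlib

# Made-up low-cardinality column, e.g. a status field with two distinct values.
column = ["ACTIVE"] * 900 + ["INACTIVE"] * 100

# Serialize the column contiguously, as a columnar layout would.
raw = "\n".join(column).encode("utf-8")

# Compress with zlib at level 6 and verify the round trip.
compressed = zlib.compress(raw, 6)
restored = zlib.decompress(compressed).decode("utf-8").split("\n")

print(len(raw), len(compressed))
```

Because similar values sit next to each other, the compressed form is far smaller than the raw bytes. Which codec to pick is a trade-off between compression ratio and CPU cost that depends on the workload.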

Use Cases

Hive Data Warehousing

Store Hive tables in highly efficient ORC format

Spark Analytics

Process large ORC datasets with Apache Spark SQL

Presto/Trino Queries

Fast analytical queries over ORC files with Presto or Trino

Data Lake Storage

Efficient columnar storage for data lake architectures

Integrations

Apache Hive

Native ORC support as a Hive storage format

Apache Spark

ORC data source support in Spark SQL

Presto/Trino

Fast ORC reading with native vectorized reader

Apache Flink

ORC file format support for batch and streaming

Apache Arrow

ORC to Arrow conversion for in-memory analytics

Semantic Vocabularies

Apache ORC Context

12 classes · 37 properties

JSON-LD

API Governance Rules

Apache ORC API Rules

8 rules · 4 errors 3 warnings 1 info

SPECTRAL

Resources

GitHubOrganization
Documentation
SpectralRules
Vocabulary
JSONLD

Sources

aid: apache-orc
name: Apache ORC
description: Apache ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads. It provides high
  compression ratios and fast read performance for large-scale data processing with support for complex data types.
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- Big Data
- Columnar Storage
- Compression
- File Format
- Hadoop
- Apache
- Open Source
created: '2026-03-16'
modified: '2026-05-19'
url: https://raw.githubusercontent.com/api-evangelist/apache-orc/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
- aid: apache-orc:apache-orc
  name: Apache ORC
  description: ORC provides Java and C++ APIs for reading and writing ORC columnar files, with support for predicate pushdown,
    column projection, compression codecs, and integration with Hive, Spark, Presto, and other query engines.
  humanURL: https://orc.apache.org/docs/
  tags:
  - C++
  - Columnar Format
  - Java
  - Apache
  - Open Source
  - Big Data
  properties:
  - type: Documentation
    url: https://orc.apache.org/docs/
  - type: OpenAPI
    url: openapi/apache-orc-tools-api.yaml
  - type: NaftikoCapability
    url: capabilities/tools-conversion.yaml
  - type: NaftikoCapability
    url: capabilities/tools-files.yaml
  - type: NaftikoCapability
    url: capabilities/tools-operations.yaml
maintainers:
- FN: Kin Lane
  email: [email protected]
common:
- type: GitHubOrganization
  url: https://github.com/apache/orc
- type: Documentation
  url: https://orc.apache.org/
- type: SpectralRules
  url: rules/apache-orc-spectral-rules.yml
- type: Vocabulary
  url: vocabulary/apache-orc-vocabulary.yaml
- type: JSONLD
  url: json-ld/apache-orc-context.jsonld
- type: Features
  data:
  - name: Columnar Storage
    description: Stores data by column for efficient compression and query performance
  - name: Predicate Pushdown
    description: Skip reading data that does not match query predicates
  - name: Column Projection
    description: Read only the columns needed for a query
  - name: ACID Support
    description: Full ACID transactional support when used with Apache Hive
  - name: Schema Evolution
    description: Add, rename, and remove columns while preserving backward compatibility
  - name: Compression
    description: Supports ZLIB, Snappy, LZO, LZ4, and ZSTD compression codecs
- type: UseCases
  data:
  - name: Hive Data Warehousing
    description: Store Hive tables in highly efficient ORC format
  - name: Spark Analytics
    description: Process large ORC datasets with Apache Spark SQL
  - name: Presto/Trino Queries
    description: Fast analytical queries over ORC files with Presto or Trino
  - name: Data Lake Storage
    description: Efficient columnar storage for data lake architectures
- type: Integrations
  data:
  - name: Apache Hive
    description: Native ORC support as default Hive storage format
  - name: Apache Spark
    description: ORC data source support in Spark SQL
  - name: Presto/Trino
    description: Fast ORC reading with native vectorized reader
  - name: Apache Flink
    description: ORC file format support for batch and streaming
  - name: Apache Arrow
    description: ORC to Arrow conversion for in-memory analytics