Apache ORC
Apache ORC is a self-describing, type-aware columnar file format designed for Hadoop workloads. It provides high compression ratios and fast read performance for large-scale data processing, and it supports complex data types such as structs, lists, maps, and unions.
APIs
Apache ORC
ORC provides Java and C++ APIs for reading and writing ORC columnar files, with support for predicate pushdown, column projection, compression codecs, and integration with Hive,...
Capabilities
Apache ORC Tools API — Conversion
1 operation. Lead operation: Apache ORC Convert File to ORC. Self-contained Naftiko capability covering one Apache ORC business surface.
Apache ORC Tools API — Files
4 operations. Lead operation: Apache ORC List ORC Files. Self-contained Naftiko capability covering one Apache ORC business surface.
Apache ORC Tools API — Operations
1 operation. Lead operation: Apache ORC Merge ORC Files. Self-contained Naftiko capability covering one Apache ORC business surface.
Features
Stores data by column for efficient compression and query performance
Skips reading data that does not match query predicates
Reads only the columns needed for a query
Provides full ACID transactional support when used with Apache Hive
Allows columns to be added, renamed, and removed while preserving backward compatibility
Supports ZLIB, Snappy, LZO, LZ4, and ZSTD compression codecs
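The first three features above (column-wise storage, predicate-based skipping, and column projection) can be illustrated with a small toy model. The sketch below is plain Python and does not use the ORC library or its file layout; the names Stripe, write_stripes, and scan are hypothetical and exist only for illustration. It assumes rows are grouped into fixed-size stripes, each holding one value list per column plus per-column min/max statistics, which is roughly the idea behind how an ORC reader decides what it can skip.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class Stripe:
    """Hypothetical stand-in for a stripe: one value list per column."""
    columns: Dict[str, List[Any]]
    stats: Dict[str, Tuple[Any, Any]]  # per-column (min, max)


def write_stripes(rows: List[Dict[str, Any]], stripe_size: int) -> List[Stripe]:
    """Pivot row dicts into column lists, stripe_size rows at a time."""
    stripes = []
    for start in range(0, len(rows), stripe_size):
        chunk = rows[start:start + stripe_size]
        columns = {name: [row[name] for row in chunk] for name in chunk[0]}
        stats = {name: (min(vals), max(vals)) for name, vals in columns.items()}
        stripes.append(Stripe(columns, stats))
    return stripes


def scan(stripes: List[Stripe], select: List[str], column: str, lower: Any):
    """Return (rows, stripes_read) for rows where row[column] >= lower.

    Only the columns named in `select` are materialized (column projection).
    """
    out = []
    stripes_read = 0
    for stripe in stripes:
        _, col_max = stripe.stats[column]
        if col_max < lower:
            continue  # no row in this stripe can match, so skip it entirely
        stripes_read += 1
        keep = [i for i, v in enumerate(stripe.columns[column]) if v >= lower]
        out.extend({name: stripe.columns[name][i] for name in select} for i in keep)
    return out, stripes_read


# Nine sample rows split into three stripes of three rows each.
rows = [{"id": i, "city": "Oslo" if i % 2 else "Lima", "amount": i * 10}
        for i in range(9)]
stripes = write_stripes(rows, stripe_size=3)

# Ask for id and city where amount >= 60; only the last stripe can match.
result, stripes_read = scan(stripes, select=["id", "city"], column="amount", lower=60)
print(result)
print("stripes read:", stripes_read, "of", len(stripes))
```

In this toy run the first two stripes are rejected from their statistics alone, and the amount column is consulted only for filtering, never copied into the output. Real ORC files keep comparable statistics at the file, stripe, and row-group level, but the actual encoding and reader logic are considerably more involved than this sketch.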
Use Cases
Store Hive tables in highly efficient ORC format
Process large ORC datasets with Apache Spark SQL
Fast analytical queries over ORC files with Presto or Trino
Efficient columnar storage for data lake architectures
Integrations
Native ORC support as a built-in Hive storage format
ORC data source support in Spark SQL
Fast ORC reading with a native vectorized reader
ORC file format support for batch and streaming
ORC to Arrow conversion for in-memory analytics