Apache Arrow
Apache Arrow is a cross-language development platform for in-memory analytics developed by the Apache Software Foundation. It specifies a standardized, language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware including CPUs and GPUs. Arrow provides computational libraries in C++, Java, Python (PyArrow), R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift, along with zero-copy streaming messaging via IPC and a high-performance data transfer framework called Arrow Flight (built on gRPC).
3 APIs
10 Features
Tags: Analytics, Apache, Columnar Format, Data, gRPC, In-Memory, IPC, Open Source, Python
Arrow Flight is a high-performance RPC framework built on gRPC for transferring large datasets using the Arrow columnar format. It enables efficient bulk data transport between ...
Arrow provides native libraries in C++, Java, Python (PyArrow), R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift for reading, writing, and processing columnar data in the Arr...
The Apache Arrow columnar format specification defines the binary layout for in-memory columnar data, including the IPC format for streaming and file-based data exchange. It cov...
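The binary layout that the format specification describes can be illustrated without any Arrow library. The following is a minimal sketch in standard-library Python, assuming a nullable int32 array and a nullable UTF-8 string array. The helper names (build_validity_bitmap, build_int32_values, build_utf8_buffers) are hypothetical and not part of any Arrow API, and the buffer padding that the specification recommends is omitted for brevity.

```python
import struct


def build_validity_bitmap(values):
    """Pack one bit per slot, least-significant bit first; 1 means non-null."""
    bitmap = bytearray((len(values) + 7) // 8)
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)
    return bytes(bitmap)


def build_int32_values(values):
    """Little-endian int32 slots; null slots hold a placeholder (0 here)."""
    slots = [0 if v is None else v for v in values]
    return struct.pack(f"<{len(slots)}i", *slots)


def build_utf8_buffers(strings):
    """Return an offsets buffer (n + 1 int32 entries) and one contiguous data buffer."""
    offsets = [0]
    data = bytearray()
    for s in strings:
        if s is not None:
            data += s.encode("utf-8")
        offsets.append(len(data))  # a null slot repeats the previous offset
    return struct.pack(f"<{len(offsets)}i", *offsets), bytes(data)


# Fixed-width column: validity bitmap plus values buffer.
ints = [1, None, 3, 4]
int_validity = build_validity_bitmap(ints)
int_values = build_int32_values(ints)

# Variable-length column: validity bitmap, offsets buffer, and data buffer.
words = ["ab", None, "cde"]
str_validity = build_validity_bitmap(words)
str_offsets, str_data = build_utf8_buffers(words)
```

Because every value of a column sits in one contiguous buffer, a reader can locate slot i with simple offset arithmetic, which is what makes the layout suitable for zero-copy access.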
Columnar In-Memory Format
Standardized language-independent columnar memory layout for efficient analytic operations with zero-copy access.
Arrow Flight RPC
High-performance gRPC-based framework for transferring large Arrow datasets between services with minimal serialization overhead.
Flight SQL
Extension of Arrow Flight providing a SQL query execution interface over the Arrow Flight protocol.
Zero-Copy IPC
Inter-process communication via shared memory and memory-mapped files, enabling zero-copy data sharing across process boundaries.
Multi-Language Support
Native libraries for C++, Java, Python, R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift.
Vectorized Computation
SIMD-optimized compute functions for analytical operations on Arrow arrays and tables.
Parquet Integration
First-class support for reading and writing Apache Parquet files via the Arrow columnar format.
Dataset API
Unified Dataset API for reading partitioned datasets from local filesystems, S3, GCS, and HDFS.
GPU Support
CUDA integration for allocating Arrow buffers in GPU device memory and sharing them between processes without copying through host memory.
Extension Types
Custom extension types for encoding domain-specific data using the Arrow format.
Analytics Data Exchange
Share large analytical datasets between Python, R, Java, and other runtimes without serialization overhead.
Database Query Results
Return database query results in Arrow format so that clients can analyze them without row-by-row deserialization.
Data Pipeline Acceleration
Accelerate ETL and data processing pipelines using columnar computation and SIMD optimizations.
Machine Learning Feature Stores
Store and serve ML features in Arrow format for efficient batch and real-time feature retrieval.
High-Throughput Data Services
Build high-throughput data microservices using Arrow Flight for efficient bulk data transfer over gRPC.
Cross-Language Data Sharing
Share in-memory data between Python pandas/polars, Java, and Rust applications with zero-copy semantics.
Apache Parquet
Native read/write support for Parquet, a widely used columnar file format for big data storage.
Apache Spark
Spark uses Arrow (via PyArrow) to exchange data between the JVM and Python for pandas UDFs and pandas-on-Spark operations.
pandas
Deep integration with pandas DataFrames via PyArrow's to_pandas() and from_pandas() conversions.
DuckDB
DuckDB can query Arrow tables directly and export query results as Arrow data with minimal copying.
Polars
The Polars DataFrame library is built on the Arrow memory format and supports zero-copy interop with Arrow arrays.
ADBC (Arrow Database Connectivity)
Arrow Database Connectivity provides an Arrow-native database driver interface analogous to ODBC/JDBC.
Delta Lake
Delta Lake integrates with Arrow for reading and writing Delta table data in columnar format.
Ray
The Ray distributed computing framework uses Arrow for columnar data exchange between workers; its shared-memory object store originated as Arrow's Plasma store.
aid: apache-arrow
name: Apache Arrow
description: >-
Apache Arrow is a cross-language development platform for in-memory analytics developed by the Apache Software Foundation. It specifies a standardized, language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware including CPUs and GPUs. Arrow provides computational libraries in C++, Java, Python (PyArrow), R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift, along with zero-copy streaming messaging via IPC and a high-performance data transfer framework called Arrow Flight (built on gRPC).
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- Analytics
- Apache
- Columnar Format
- Data
- gRPC
- In-Memory
- IPC
- Open Source
- Python
created: '2026-03-16'
modified: '2026-04-19'
url: >-
https://raw.githubusercontent.com/api-evangelist/apache-arrow/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
- aid: apache-arrow:apache-arrow-flight-rpc
name: Apache Arrow Flight RPC
description: >-
Arrow Flight is a high-performance RPC framework built on gRPC for transferring large datasets using the Arrow columnar format. It enables efficient bulk data transport between services with client libraries available in C++, Java, Python, R, Go, and Rust. Flight SQL extends Flight with a SQL-over-Arrow interface for database query execution.
humanURL: https://arrow.apache.org/docs/format/Flight.html
tags:
- Data Transfer
- gRPC
- RPC
properties:
- type: Documentation
url: https://arrow.apache.org/docs/format/Flight.html
- type: APIReference
url: https://arrow.apache.org/docs/format/FlightSql.html
- aid: apache-arrow:apache-arrow-libraries
name: Apache Arrow Libraries
description: >-
Arrow provides native libraries in C++, Java, Python (PyArrow), R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift for reading, writing, and processing columnar data in the Arrow in-memory format. Libraries enable zero-copy data sharing between processes and language runtimes.
humanURL: https://arrow.apache.org/docs/
tags:
- Data Processing
- Libraries
- SDK
properties:
- type: Documentation
url: https://arrow.apache.org/docs/
- type: APIReference
url: https://arrow.apache.org/docs/python/api.html
- aid: apache-arrow:apache-arrow-format
name: Apache Arrow Format Specification
description: >-
The Apache Arrow columnar format specification defines the binary layout for in-memory columnar data, including the IPC format for streaming and file-based data exchange. It covers flat arrays, nested structures, dictionaries, and extension types.
humanURL: https://arrow.apache.org/docs/format/Columnar.html
tags:
- Format
- IPC
- Specification
properties:
- type: Documentation
url: https://arrow.apache.org/docs/format/Columnar.html
- type: Specification
url: https://arrow.apache.org/docs/format/Versioning.html
common:
- type: GitHubOrganization
url: https://github.com/apache
- type: GitHubRepository
url: https://github.com/apache/arrow
- type: GitHubRepository
url: https://github.com/apache/arrow-rs
- type: GitHubRepository
url: https://github.com/apache/arrow-java
- type: GitHubRepository
url: https://github.com/apache/arrow-go
- type: GitHubRepository
url: https://github.com/apache/arrow-js
- type: Documentation
url: https://arrow.apache.org/
- type: GettingStarted
url: https://arrow.apache.org/docs/python/getstarted.html
- type: Support
url: https://arrow.apache.org/community/
- type: TermsOfService
url: https://www.apache.org/licenses/
- type: ChangeLog
url: https://arrow.apache.org/blog/
- type: SDK
url: https://pypi.org/project/pyarrow/
title: PyArrow (Python)
- type: SDK
url: https://search.maven.org/artifact/org.apache.arrow/arrow-vector
title: Apache Arrow Java (Maven)
- type: SDK
url: https://crates.io/crates/arrow
title: Arrow-rs (Rust, crates.io)
- type: SDK
url: https://pkg.go.dev/github.com/apache/arrow/go/v15
title: Arrow Go
- type: SDK
url: https://www.npmjs.com/package/apache-arrow
title: Apache Arrow JavaScript (npm)
- type: Features
data:
- name: Columnar In-Memory Format
description: Standardized language-independent columnar memory layout for efficient analytic operations with zero-copy access.
- name: Arrow Flight RPC
description: High-performance gRPC-based framework for transferring large Arrow datasets between services with minimal serialization overhead.
- name: Flight SQL
description: Extension of Arrow Flight providing a SQL query execution interface over the Arrow Flight protocol.
- name: Zero-Copy IPC
description: Inter-process communication via shared memory and memory-mapped files, enabling zero-copy data sharing across process boundaries.
- name: Multi-Language Support
description: Native libraries for C++, Java, Python, R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift.
- name: Vectorized Computation
description: SIMD-optimized compute functions for analytical operations on Arrow arrays and tables.
- name: Parquet Integration
description: First-class support for reading and writing Apache Parquet files via the Arrow columnar format.
- name: Dataset API
description: Unified Dataset API for reading partitioned datasets from local filesystems, S3, GCS, and HDFS.
- name: GPU Support
description: CUDA integration for allocating Arrow buffers in GPU device memory and sharing them between processes without copying through host memory.
- name: Extension Types
description: Custom extension types for encoding domain-specific data using the Arrow format.
- type: UseCases
data:
- name: Analytics Data Exchange
description: Share large analytical datasets between Python, R, Java, and other runtimes without serialization overhead.
- name: Database Query Results
description: Return database query results in Arrow format so that clients can analyze them without row-by-row deserialization.
- name: Data Pipeline Acceleration
description: Accelerate ETL and data processing pipelines using columnar computation and SIMD optimizations.
- name: Machine Learning Feature Stores
description: Store and serve ML features in Arrow format for efficient batch and real-time feature retrieval.
- name: High-Throughput Data Services
description: Build high-throughput data microservices using Arrow Flight for efficient bulk data transfer over gRPC.
- name: Cross-Language Data Sharing
description: Share in-memory data between Python pandas/polars, Java, and Rust applications with zero-copy semantics.
- type: Integrations
data:
- name: Apache Parquet
description: Native read/write support for Parquet, a widely used columnar file format for big data storage.
- name: Apache Spark
description: Spark uses Arrow (via PyArrow) to exchange data between the JVM and Python for pandas UDFs and pandas-on-Spark operations.
- name: pandas
description: Deep integration with pandas DataFrames via PyArrow's to_pandas() and from_pandas() conversions.
- name: DuckDB
description: DuckDB can query Arrow tables directly and export query results as Arrow data with minimal copying.
- name: Polars
description: The Polars DataFrame library is built on the Arrow memory format and supports zero-copy interop with Arrow arrays.
- name: ADBC (Arrow Database Connectivity)
description: Arrow Database Connectivity provides an Arrow-native database driver interface analogous to ODBC/JDBC.
- name: Delta Lake
description: Delta Lake integrates with Arrow for reading and writing Delta table data in columnar format.
- name: Ray
description: The Ray distributed computing framework uses Arrow for columnar data exchange between workers; its shared-memory object store originated as Arrow's Plasma store.
maintainers:
- FN: Kin Lane
email: [email protected]