Apache Arrow

Apache Arrow is a cross-language development platform for in-memory analytics developed by the Apache Software Foundation. It specifies a standardized, language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware including CPUs and GPUs. Arrow provides computational libraries in C++, Java, Python (PyArrow), R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift, along with zero-copy streaming messaging via IPC and a high-performance data transfer framework called Arrow Flight (built on gRPC).

Analytics, Apache, Columnar Format, Data, gRPC, In-Memory, IPC, Open Source, Python

APIs

Apache Arrow Flight RPC

Arrow Flight is a high-performance RPC framework built on gRPC for transferring large datasets using the Arrow columnar format. It enables efficient bulk data transport between ...

Apache Arrow Libraries

Arrow provides native libraries in C++, Java, Python (PyArrow), R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift for reading, writing, and processing columnar data in the Arr...

Apache Arrow Format Specification

The Apache Arrow columnar format specification defines the binary layout for in-memory columnar data, including the IPC format for streaming and file-based data exchange. It cov...

Features

Columnar In-Memory Format

Standardized language-independent columnar memory layout for efficient analytic operations with zero-copy access.
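
As a rough illustration of what a columnar layout means in practice, the sketch below encodes a nullable string column into three buffers (a validity bitmap, 32-bit offsets, and contiguous UTF-8 data) using only the Python standard library. The helper names `build_string_column` and `get_value` are hypothetical, and the buffer padding and alignment that a real Arrow implementation applies are omitted.

```python
import struct

def build_string_column(values):
    """Encode a list of str/None into validity, offsets, and data buffers."""
    validity = bytearray((len(values) + 7) // 8)  # one bit per slot
    offsets = [0]                                 # n + 1 entries for n slots
    data = bytearray()
    for i, value in enumerate(values):
        if value is not None:
            validity[i // 8] |= 1 << (i % 8)      # least-significant-bit numbering
            data += value.encode("utf-8")
        offsets.append(len(data))                 # a null slot repeats the previous offset
    offsets_buf = struct.pack(f"<{len(offsets)}i", *offsets)
    return bytes(validity), offsets_buf, bytes(data)

def get_value(column, i):
    """Read slot i back without touching any other slot's bytes."""
    validity, offsets_buf, data = column
    if not (validity[i // 8] >> (i % 8)) & 1:
        return None
    start, end = struct.unpack_from("<2i", offsets_buf, 4 * i)
    return data[start:end].decode("utf-8")

column = build_string_column(["arrow", None, "flight", ""])
```

Because every value of the column lives in the same few contiguous buffers, a reader can locate any slot with simple offset arithmetic, which is what makes sharing the buffers between runtimes practical.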

Arrow Flight RPC

High-performance gRPC-based framework for transferring large Arrow datasets between services with minimal serialization overhead.

Flight SQL

Extension of Arrow Flight providing a SQL query execution interface over the Arrow Flight protocol.

Zero-Copy IPC

Inter-process communication via shared memory and memory-mapped files, enabling zero-copy data sharing across process boundaries.
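
The idea behind memory-mapped, zero-copy access can be sketched with the standard library alone: one process writes a contiguous buffer of values to a file, and a reader maps that file and views the bytes as typed values without copying them into Python objects first. This is not the Arrow IPC format itself; the file name `column.bin` and the helper `sum_mapped_int32` are hypothetical, and the sketch assumes a little-endian host with 4-byte C ints.

```python
import mmap
import os
import struct
import tempfile

# Writer side: persist a column of little-endian int32 values as one contiguous buffer.
values = [10, 20, 30, 40]
path = os.path.join(tempfile.mkdtemp(), "column.bin")
with open(path, "wb") as f:
    f.write(struct.pack(f"<{len(values)}i", *values))

def sum_mapped_int32(path):
    """Map the file and read it as int32 values; returns (sum, third value)."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            # memoryview exposes the mapped pages directly; cast("i") reinterprets
            # the same bytes as native ints rather than building a new buffer.
            with memoryview(mapped) as raw:
                with raw.cast("i") as column:
                    return sum(column), column[2]
```

The views are released before the mapping is closed, since Python refuses to close a mapping that still has exported buffers.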

Multi-Language Support

Native libraries for C++, Java, Python, R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift.

Vectorized Computation

SIMD-optimized compute functions for analytical operations on Arrow arrays and tables.

Parquet Integration

First-class support for reading and writing Apache Parquet files via the Arrow columnar format.

Dataset API

Unified Dataset API for reading partitioned datasets from local filesystems, S3, GCS, and HDFS.
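
One common way to lay out a partitioned dataset is Hive-style `key=value` directories. The sketch below assumes that layout (the text above does not specify one) and shows how partition keys can be recovered from file paths and used to skip files before any data is read. The paths and the helpers `partition_keys` and `prune` are hypothetical and are not part of any Arrow library.

```python
from pathlib import PurePosixPath

def partition_keys(path):
    """Extract key=value pairs from the directory segments of a path."""
    keys = {}
    for segment in PurePosixPath(path).parts[:-1]:  # skip the file name itself
        name, sep, value = segment.partition("=")
        if sep:                                     # ignore segments without '='
            keys[name] = value
    return keys

def prune(paths, **wanted):
    """Keep only the files whose partition keys match every requested value."""
    return [
        p for p in paths
        if all(partition_keys(p).get(k) == v for k, v in wanted.items())
    ]

files = [
    "sales/year=2023/month=12/part-0.parquet",
    "sales/year=2024/month=01/part-0.parquet",
    "sales/year=2024/month=02/part-0.parquet",
]
```

For example, `prune(files, year="2024")` keeps only the two 2024 files, so a reader never has to open the 2023 partition.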

GPU Support

CUDA integration for allocating Arrow buffers in GPU memory, copying data between host and device, and sharing device buffers between processes.

Extension Types

Custom extension types for encoding domain-specific data using the Arrow format.
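
An extension type can be thought of as an ordinary storage type plus field metadata that names the logical type. The sketch below models a UUID column stored as 16-byte fixed-width values, using a plain dictionary for the field. The metadata keys `ARROW:extension:name` and `ARROW:extension:metadata` come from the Arrow format specification; the type name `example.uuid`, the dictionary shape, and the helper functions are assumptions made for illustration.

```python
import uuid

# Hypothetical field description: storage type plus extension metadata.
field = {
    "name": "id",
    "storage_type": "fixed_size_binary[16]",
    "metadata": {
        "ARROW:extension:name": "example.uuid",  # illustrative type name
        "ARROW:extension:metadata": "",          # no type parameters needed
    },
}

def encode_uuids(values):
    """Pack UUIDs into one contiguous buffer of 16-byte slots."""
    return b"".join(v.bytes for v in values)

def decode_uuid(buffer, i):
    """Read slot i of the buffer back as a UUID."""
    return uuid.UUID(bytes=buffer[16 * i : 16 * (i + 1)])

ids = [uuid.UUID(int=1), uuid.UUID(int=2)]
buffer = encode_uuids(ids)
```

A consumer that does not recognize the extension name can still read the column as plain 16-byte binary values, which is the main design benefit of layering extension types on storage types.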

Use Cases

Analytics Data Exchange

Share large analytical datasets between Python, R, Java, and other runtimes without serialization overhead.

Database Query Results

Return query results from databases in Arrow format so clients can analyze them without row-by-row deserialization into Python or Java objects.

Data Pipeline Acceleration

Accelerate ETL and data processing pipelines using columnar computation and SIMD optimizations.

Machine Learning Feature Stores

Store and serve ML features in Arrow format for efficient batch and real-time feature retrieval.

High-Throughput Data Services

Build high-throughput data microservices using Arrow Flight for efficient bulk data transfer over gRPC.

Cross-Language Data Sharing

Share in-memory data between Python (pandas, Polars), Java, and Rust applications with zero-copy semantics.

Integrations

Apache Parquet

Native read/write support for the Parquet columnar file format, a widely used storage format for big data.

Apache Spark

Spark uses Arrow for Python UDF execution and pandas-on-Spark operations via PyArrow.

pandas

Deep integration with pandas DataFrames via PyArrow's Table.from_pandas() and Table.to_pandas() conversions.

DuckDB

DuckDB keeps its own internal vector format but can scan Arrow data and export query results as Arrow, with zero-copy exchange where possible.

Polars

The Polars DataFrame library uses the Arrow columnar memory format and supports mostly zero-copy interop with Arrow arrays.

ADBC (Arrow Database Connectivity)

Arrow Database Connectivity provides an Arrow-native database driver interface analogous to ODBC/JDBC.

Delta Lake

Delta Lake integrates with Arrow for reading and writing Delta table data in columnar format.

Ray

Ray Data can store dataset blocks as Arrow tables, which workers share through Ray's shared-memory object store.

Resources

GitHubOrganization

GitHubRepository

Documentation

GettingStarted

TermsOfService

PyArrow (Python)

Apache Arrow Java (Maven)

Arrow-rs (Rust, crates.io)

Apache Arrow JavaScript (npm)

Sources

aid: apache-arrow
name: Apache Arrow
description: >-
  Apache Arrow is a cross-language development platform for in-memory analytics developed by the Apache Software Foundation. It specifies a standardized, language-independent columnar memory format for flat and nested data, organized for efficient analytic operations on modern hardware including CPUs and GPUs. Arrow provides computational libraries in C++, Java, Python (PyArrow), R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift, along with zero-copy streaming messaging via IPC and a high-performance data transfer framework called Arrow Flight (built on gRPC).
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
  - Analytics
  - Apache
  - Columnar Format
  - Data
  - gRPC
  - In-Memory
  - IPC
  - Open Source
  - Python
created: '2026-03-16'
modified: '2026-04-19'
url: >-
  https://raw.githubusercontent.com/api-evangelist/apache-arrow/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
  - aid: apache-arrow:apache-arrow-flight-rpc
    name: Apache Arrow Flight RPC
    description: >-
      Arrow Flight is a high-performance RPC framework built on gRPC for transferring large datasets using the Arrow columnar format. It enables efficient bulk data transport between services with client libraries available in C++, Java, Python, R, Go, and Rust. Flight SQL extends Flight with a SQL-over-Arrow interface for database query execution.
    humanURL: https://arrow.apache.org/docs/format/Flight.html
    tags:
      - Data Transfer
      - gRPC
      - RPC
    properties:
      - type: Documentation
        url: https://arrow.apache.org/docs/format/Flight.html
      - type: APIReference
        url: https://arrow.apache.org/docs/format/FlightSql.html
  - aid: apache-arrow:apache-arrow-libraries
    name: Apache Arrow Libraries
    description: >-
      Arrow provides native libraries in C++, Java, Python (PyArrow), R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift for reading, writing, and processing columnar data in the Arrow in-memory format. Libraries enable zero-copy data sharing between processes and language runtimes.
    humanURL: https://arrow.apache.org/docs/
    tags:
      - Data Processing
      - Libraries
      - SDK
    properties:
      - type: Documentation
        url: https://arrow.apache.org/docs/
      - type: APIReference
        url: https://arrow.apache.org/docs/python/api.html
  - aid: apache-arrow:apache-arrow-format
    name: Apache Arrow Format Specification
    description: >-
      The Apache Arrow columnar format specification defines the binary layout for in-memory columnar data, including the IPC format for streaming and file-based data exchange. It covers flat arrays, nested structures, dictionaries, and extension types.
    humanURL: https://arrow.apache.org/docs/format/Columnar.html
    tags:
      - Format
      - IPC
      - Specification
    properties:
      - type: Documentation
        url: https://arrow.apache.org/docs/format/Columnar.html
      - type: Specification
        url: https://arrow.apache.org/docs/format/Versioning.html
common:
  - type: LinkedIn
    url: https://www.linkedin.com/company/apache-arrow
  - type: GitHubOrganization
    url: https://github.com/apache
  - type: GitHubRepository
    url: https://github.com/apache/arrow
  - type: GitHubRepository
    url: https://github.com/apache/arrow-rs
  - type: GitHubRepository
    url: https://github.com/apache/arrow-java
  - type: GitHubRepository
    url: https://github.com/apache/arrow-go
  - type: GitHubRepository
    url: https://github.com/apache/arrow-js
  - type: Documentation
    url: https://arrow.apache.org/
  - type: GettingStarted
    url: https://arrow.apache.org/docs/python/getstarted.html
  - type: Support
    url: https://arrow.apache.org/community/
  - type: TermsOfService
    url: https://www.apache.org/licenses/
  - type: ChangeLog
    url: https://arrow.apache.org/blog/
  - type: SDK
    url: https://pypi.org/project/pyarrow/
    title: PyArrow (Python)
  - type: SDK
    url: https://search.maven.org/artifact/org.apache.arrow/arrow-vector
    title: Apache Arrow Java (Maven)
  - type: SDK
    url: https://crates.io/crates/arrow
    title: Arrow-rs (Rust, crates.io)
  - type: SDK
    url: https://pkg.go.dev/github.com/apache/arrow/go/v15
    title: Arrow Go
  - type: SDK
    url: https://www.npmjs.com/package/apache-arrow
    title: Apache Arrow JavaScript (npm)
  - type: Features
    data:
      - name: Columnar In-Memory Format
        description: Standardized language-independent columnar memory layout for efficient analytic operations with zero-copy access.
      - name: Arrow Flight RPC
        description: High-performance gRPC-based framework for transferring large Arrow datasets between services with minimal serialization overhead.
      - name: Flight SQL
        description: Extension of Arrow Flight providing a SQL query execution interface over the Arrow Flight protocol.
      - name: Zero-Copy IPC
        description: Inter-process communication via shared memory and memory-mapped files, enabling zero-copy data sharing across process boundaries.
      - name: Multi-Language Support
        description: Native libraries for C++, Java, Python, R, Go, Rust, JavaScript, C#, Ruby, Julia, and Swift.
      - name: Vectorized Computation
        description: SIMD-optimized compute functions for analytical operations on Arrow arrays and tables.
      - name: Parquet Integration
        description: First-class support for reading and writing Apache Parquet files via the Arrow columnar format.
      - name: Dataset API
        description: Unified Dataset API for reading partitioned datasets from local filesystems, S3, GCS, and HDFS.
      - name: GPU Support
        description: CUDA integration for allocating Arrow buffers in GPU memory, copying data between host and device, and sharing device buffers between processes.
      - name: Extension Types
        description: Custom extension types for encoding domain-specific data using the Arrow format.
  - type: UseCases
    data:
      - name: Analytics Data Exchange
        description: Share large analytical datasets between Python, R, Java, and other runtimes without serialization overhead.
      - name: Database Query Results
        description: Return query results from databases in Arrow format so clients can analyze them without row-by-row deserialization into Python or Java objects.
      - name: Data Pipeline Acceleration
        description: Accelerate ETL and data processing pipelines using columnar computation and SIMD optimizations.
      - name: Machine Learning Feature Stores
        description: Store and serve ML features in Arrow format for efficient batch and real-time feature retrieval.
      - name: High-Throughput Data Services
        description: Build high-throughput data microservices using Arrow Flight for efficient bulk data transfer over gRPC.
      - name: Cross-Language Data Sharing
        description: Share in-memory data between Python (pandas, Polars), Java, and Rust applications with zero-copy semantics.
  - type: Integrations
    data:
      - name: Apache Parquet
        description: Native read/write support for the Parquet columnar file format, a widely used storage format for big data.
      - name: Apache Spark
        description: Spark uses Arrow for Python UDF execution and pandas-on-Spark operations via PyArrow.
      - name: pandas
        description: Deep integration with pandas DataFrames via PyArrow's Table.from_pandas() and Table.to_pandas() conversions.
      - name: DuckDB
        description: DuckDB keeps its own internal vector format but can scan Arrow data and export query results as Arrow, with zero-copy exchange where possible.
      - name: Polars
        description: The Polars DataFrame library uses the Arrow columnar memory format and supports mostly zero-copy interop with Arrow arrays.
      - name: ADBC (Arrow Database Connectivity)
        description: Arrow Database Connectivity provides an Arrow-native database driver interface analogous to ODBC/JDBC.
      - name: Delta Lake
        description: Delta Lake integrates with Arrow for reading and writing Delta table data in columnar format.
      - name: Ray
        description: Ray Data can store dataset blocks as Arrow tables, which workers share through Ray's shared-memory object store.
maintainers:
  - FN: Kin Lane
    email: [email protected]