Apache Giraph

Apache Giraph is an iterative graph processing system built for high scalability on Apache Hadoop. It is modeled after Google's Pregel and provides a simple yet flexible Java API for running graph algorithms at massive scale using the Bulk Synchronous Parallel (BSP) model. Note: Apache Giraph has been retired as of 2024.

2 APIs · 2 Capabilities · 8 Features · 45.8 / 100 (developing)
Tags: Apache, Big Data, BSP, Graph Processing, Hadoop, Open Source, Retired

API Rating

45.8 / 100 (developing)
Scored 2026-05-20 · rubric v0.3
Discoverability: 80.0
Contract Quality: 65.8
Governance: 39.5
Operational Transparency: 36.8
Developer Ergonomics: 19.6
Commercial Clarity: 39.5

APIs

Apache Giraph Job Monitoring API

Monitoring API for Apache Giraph graph processing jobs via the YARN ResourceManager REST API, providing job status, progress tracking, and cluster capacity metrics.
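
As an illustration of job status and progress tracking, here is a minimal sketch of listing applications through the ResourceManager's `/ws/v1/cluster/apps` endpoint. The base URL is the `baseURL` from this entry and will differ on a real cluster; the helper names and the sample payload are hypothetical, and only the fields used here are shown.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: base URL taken from this entry; replace it for a real cluster.
BASE_URL = "http://localhost:8088"


def apps_url(base_url=BASE_URL, state="RUNNING"):
    """Build the Cluster Applications URL, filtered by application state."""
    return f"{base_url}/ws/v1/cluster/apps?{urlencode({'states': state})}"


def summarize_apps(payload):
    """Reduce an apps response body to (id, name, state, progress) tuples."""
    # "apps" can be null when nothing matches the filter.
    apps = (payload.get("apps") or {}).get("app", [])
    return [(a["id"], a["name"], a["state"], a["progress"]) for a in apps]


def fetch_apps(base_url=BASE_URL, state="RUNNING"):
    """Call the ResourceManager and summarize the result (needs a live cluster)."""
    request = Request(apps_url(base_url, state), headers={"Accept": "application/json"})
    with urlopen(request) as response:
        return summarize_apps(json.load(response))


# Hypothetical response body, trimmed to the fields used above.
SAMPLE_RESPONSE = {
    "apps": {
        "app": [
            {
                "id": "application_1700000000000_0001",
                "name": "Giraph: example job",
                "state": "RUNNING",
                "progress": 42.0,
            }
        ]
    }
}

for app in summarize_apps(SAMPLE_RESPONSE):
    print(app)
```

Keeping the parsing separate from the HTTP call means the summary logic can be exercised against a saved response without a running cluster.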

Apache Giraph Java API

Java API based on the Bulk Synchronous Parallel (BSP) model for implementing graph algorithms, with Vertex, Edge, and Master compute APIs for distributed graph processing on Had...

Capabilities

Apache Giraph Job Monitoring API — Cluster

1 operation. Lead operation: Apache Giraph Get Cluster Metrics. Self-contained Naftiko capability covering one Apache Giraph busines...
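
A cluster-metrics call can be sketched in the same spirit. The example below assumes the ResourceManager's `/ws/v1/cluster/metrics` endpoint and reads only the `totalMB` and `allocatedMB` fields; the function names and the sample numbers are made up for illustration.

```python
def metrics_url(base_url="http://localhost:8088"):
    """Build the Cluster Metrics URL (base URL is this entry's baseURL)."""
    return f"{base_url}/ws/v1/cluster/metrics"


def memory_utilization(payload):
    """Return allocated memory as a fraction of total cluster memory."""
    metrics = payload["clusterMetrics"]
    total = metrics["totalMB"]
    return metrics["allocatedMB"] / total if total else 0.0


# Hypothetical response body, trimmed to the fields used above.
SAMPLE_METRICS = {"clusterMetrics": {"totalMB": 8192, "allocatedMB": 2048}}

print(metrics_url())
print(f"{memory_utilization(SAMPLE_METRICS):.0%} of cluster memory allocated")
```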

Apache Giraph Job Monitoring API — Job Management

2 operations. Lead operation: Apache Giraph List Graph Processing Jobs. Self-contained Naftiko capability covering one Apache ...

Features

Bulk Synchronous Parallel (BSP) Model

Google Pregel-inspired BSP computation model where vertices communicate through supersteps.
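
To make the superstep idea concrete, here is a single-process sketch of a BSP loop. It is not Giraph code: `run_bsp` and `propagate_max` are hypothetical names, and the "barrier" is simply the point where the outbox becomes the next superstep's inbox. Messages sent in one superstep are only visible in the next, and a halted vertex wakes up again when it receives a message.

```python
def run_bsp(graph, values, compute, max_supersteps=50):
    """Run compute() on every active vertex, superstep by superstep.

    graph:   {vertex: [neighbor, ...]}
    values:  {vertex: value}, updated in place
    compute: (superstep, value, messages, neighbors)
             -> (new_value, [(target, message), ...], vote_to_halt)
    """
    inbox = {v: [] for v in graph}
    active = set(graph)
    superstep = 0
    while superstep < max_supersteps:
        # A halted vertex is reactivated by incoming messages.
        runnable = [v for v in graph if v in active or inbox[v]]
        if not runnable:
            break  # every vertex has halted and no messages are in flight
        outbox = {v: [] for v in graph}
        for v in runnable:
            new_value, outgoing, halt = compute(superstep, values[v], inbox[v], graph[v])
            values[v] = new_value
            for target, message in outgoing:
                outbox[target].append(message)
            if halt:
                active.discard(v)
            else:
                active.add(v)
        inbox = outbox  # barrier: messages become visible in the next superstep
        superstep += 1
    return values, superstep


def propagate_max(superstep, value, messages, neighbors):
    """Classic Pregel example: every vertex ends up with the global maximum."""
    new_value = max([value, *messages])
    if superstep == 0 or new_value > value:
        return new_value, [(n, new_value) for n in neighbors], True
    return value, [], True


chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
final_values, supersteps = run_bsp(chain, {"a": 3, "b": 6, "c": 2}, propagate_max)
print(final_values, supersteps)
```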

Vertex-Centric Programming

Write graph algorithms by defining per-vertex compute functions that exchange messages with neighbors.

Master Compute API

Global coordination API for aggregating results and controlling algorithm termination across supersteps.

Aggregators

Sharded aggregators for collecting global statistics across all vertices during computation.
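
The interplay between aggregators and master compute can be sketched as follows. This is a simplified, single-process illustration rather than Giraph's implementation: each hypothetical worker partition accumulates a partial sum, the partials are merged at the barrier, and a master step reads the merged value to decide whether to stop. The algorithm itself (neighbor averaging) is chosen only because it converges quickly.

```python
def averaging_superstep(graph, values, partitions):
    """One superstep: each vertex averages itself with its neighbors."""
    new_values = {}
    partial_sums = []  # one partial aggregate per worker partition
    for partition in partitions:
        local_delta = 0.0
        for v in partition:
            neighbors = graph[v]
            average = (values[v] + sum(values[n] for n in neighbors)) / (1 + len(neighbors))
            local_delta += abs(average - values[v])  # vertex contributes to the aggregator
            new_values[v] = average
        partial_sums.append(local_delta)
    # Merging the partials stands in for the global aggregation at the barrier.
    return new_values, sum(partial_sums)


def run_until_converged(graph, values, partitions, tolerance=1e-6, max_supersteps=100):
    """Master step: halt once the aggregated change drops below the tolerance."""
    for superstep in range(max_supersteps):
        values, total_delta = averaging_superstep(graph, values, partitions)
        if total_delta < tolerance:
            return values, superstep + 1
    return values, max_supersteps


pair = {"a": ["b"], "b": ["a"]}
result, steps = run_until_converged(pair, {"a": 0.0, "b": 10.0}, [["a"], ["b"]])
print(result, steps)
```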

Edge-Oriented Input

Flexible input formats for loading graphs from HDFS, Hive, Gora, and Rexster sources.

Out-of-Core Computation

Spill graph data to disk for processing graphs larger than available memory.
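
The general idea of spilling can be illustrated with a small partition store that keeps a bounded number of partitions in memory and writes the least recently used ones to disk. This is a conceptual sketch only; the class name, the pickle file layout, and the LRU policy are assumptions and do not describe Giraph's out-of-core engine.

```python
import os
import pickle
import tempfile
from collections import OrderedDict


class SpillingPartitionStore:
    """Keep at most max_in_memory partitions in RAM; spill the rest to disk."""

    def __init__(self, max_in_memory, directory):
        self.max_in_memory = max_in_memory
        self.directory = directory
        self.memory = OrderedDict()  # partition id -> data, least recently used first
        self.on_disk = set()

    def _path(self, partition_id):
        return os.path.join(self.directory, f"partition-{partition_id}.pkl")

    def _evict(self):
        # Spill least recently used partitions until the memory budget is met.
        while len(self.memory) > self.max_in_memory:
            partition_id, data = self.memory.popitem(last=False)
            with open(self._path(partition_id), "wb") as handle:
                pickle.dump(data, handle)
            self.on_disk.add(partition_id)

    def put(self, partition_id, data):
        self.memory[partition_id] = data
        self.memory.move_to_end(partition_id)
        self.on_disk.discard(partition_id)
        self._evict()

    def get(self, partition_id):
        if partition_id not in self.memory:
            # Load a spilled partition back into memory.
            with open(self._path(partition_id), "rb") as handle:
                self.memory[partition_id] = pickle.load(handle)
            self.on_disk.discard(partition_id)
        self.memory.move_to_end(partition_id)
        data = self.memory[partition_id]
        self._evict()
        return data


with tempfile.TemporaryDirectory() as directory:
    store = SpillingPartitionStore(max_in_memory=2, directory=directory)
    for pid in range(3):
        store.put(pid, {"vertices": [pid * 10, pid * 10 + 1]})
    spilled_after_puts = set(store.on_disk)
    reloaded = store.get(0)
    spilled_after_get = set(store.on_disk)
    print(spilled_after_puts, reloaded, spilled_after_get)
```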

Hadoop Integration

Runs as a MapReduce job on Apache Hadoop YARN for resource management and fault tolerance.

Fault Tolerance

Checkpoint-based recovery for fault tolerance across superstep boundaries.
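
A minimal sketch of checkpointing at superstep boundaries, assuming a deterministic per-superstep function: state is saved every few supersteps, and a simulated failure rolls the computation back to the last saved superstep and replays from there. The function names and the `fail_at` switch are hypothetical and exist only to demonstrate the recovery path.

```python
import copy


def run_with_checkpoints(values, step, supersteps, checkpoint_every, fail_at=None):
    """Run `step` for a fixed number of supersteps with periodic checkpoints."""
    checkpoint = (0, copy.deepcopy(values))
    superstep = 0
    already_failed = False
    while superstep < supersteps:
        if superstep % checkpoint_every == 0:
            # Checkpoint at the superstep boundary, before any computation.
            checkpoint = (superstep, copy.deepcopy(values))
        if superstep == fail_at and not already_failed:
            # Simulated worker failure: in-memory state is lost, so restart
            # from the most recent checkpoint and replay.
            already_failed = True
            superstep, values = checkpoint[0], copy.deepcopy(checkpoint[1])
            continue
        values = step(values)
        superstep += 1
    return values


def increment_all(values):
    """Deterministic stand-in for one superstep of real computation."""
    return {vertex: value + 1 for vertex, value in values.items()}


clean = run_with_checkpoints({"a": 0, "b": 5}, increment_all, 8, checkpoint_every=2)
recovered = run_with_checkpoints({"a": 0, "b": 5}, increment_all, 8, checkpoint_every=2, fail_at=5)
print(clean, recovered)
```

Because the step function is deterministic, the run that fails and recovers ends in the same state as the run that never fails.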

Use Cases

Social Graph Analysis

Analyze social network connections, communities, and influence at billions-of-vertices scale (as used at Facebook).

PageRank Computation

Compute web page or entity rankings using iterative link analysis algorithms.
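
As a rough illustration of iterative link analysis in a vertex-centric style, the sketch below has each vertex send its rank, divided by its out-degree, along its outgoing edges, then recompute its rank from the summed incoming messages. The damping factor of 0.85, the fixed iteration count, and the tiny graph are assumptions for the example; vertices without outgoing edges are not handled specially.

```python
def pagerank(graph, supersteps=30, damping=0.85):
    """graph: {vertex: [target, ...]} with every target also present as a key."""
    n = len(graph)
    rank = {v: 1.0 / n for v in graph}
    for _ in range(supersteps):
        incoming = {v: 0.0 for v in graph}
        for v, out_edges in graph.items():
            for target in out_edges:
                # Message along each out-edge: an equal share of the sender's rank.
                incoming[target] += rank[v] / len(out_edges)
        # Each vertex combines its messages into a new rank.
        rank = {v: (1 - damping) / n + damping * incoming[v] for v in graph}
    return rank


web = {"a": ["b", "c"], "b": ["c"], "c": ["a"]}
ranks = pagerank(web)
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
```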

Shortest Path Computation

Find shortest paths between vertices for network routing and recommendation problems.
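
Single-source shortest paths fit the message-passing model naturally. In this sketch (hypothetical names, non-negative edge weights assumed) a vertex that receives a shorter distance than it currently holds adopts it and forwards the improved distance plus the edge weight to its neighbors; the computation stops when no messages remain.

```python
import math


def shortest_paths(graph, source):
    """graph: {vertex: {neighbor: weight}}; every neighbor must also be a key."""
    distance = {v: math.inf for v in graph}
    inbox = {source: [0.0]}
    while inbox:
        outbox = {}
        for v, messages in inbox.items():
            best = min(messages)
            if best < distance[v]:
                distance[v] = best
                for target, weight in graph[v].items():
                    outbox.setdefault(target, []).append(best + weight)
        inbox = outbox  # next superstep sees only the newly sent messages
    return distance


roads = {"a": {"b": 1.0, "c": 4.0}, "b": {"c": 2.0}, "c": {}}
distances = shortest_paths(roads, "a")
print(distances)
```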

Connected Components

Identify clusters and connected components in large graphs for community detection.
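
One common way to find connected components in this model is minimum-label propagation, sketched below for an undirected graph given as a symmetric adjacency list. Every vertex starts with its own ID as its label and repeatedly adopts the smallest label it hears from a neighbor; vertices that end with the same label are in the same component. The function name and sample graph are illustrative.

```python
def connected_components(graph):
    """graph: {vertex: [neighbor, ...]}, with each edge listed in both directions."""
    label = {v: v for v in graph}
    changed = set(graph)  # vertices whose label changed in the last superstep
    while changed:
        inbox = {}
        for v in changed:
            for neighbor in graph[v]:
                inbox.setdefault(neighbor, []).append(label[v])
        changed = set()
        for v, messages in inbox.items():
            smallest = min(messages)
            if smallest < label[v]:
                label[v] = smallest
                changed.add(v)
    return label


network = {1: [2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
components = connected_components(network)
print(components)
```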

Graph Machine Learning Features

Generate graph-structural features for machine learning models at scale.

Integrations

Apache Hadoop

Runs on Hadoop YARN as a MapReduce application for cluster resource management.

Apache Hive

Hive I/O module for loading graph data from Hive tables.

Apache Gora

Gora I/O module for loading graph data from various NoSQL data stores.

Rexster

Rexster graph server I/O module for loading data from TinkerPop graph databases.

Apache HBase

HBase integration for storing and loading vertex and edge data.

Semantic Vocabularies

Apache Giraph Job Context

5 classes · 14 properties

JSON-LD

API Governance Rules

Apache Giraph API Rules

7 rules · 4 errors · 3 warnings

SPECTRAL

Resources

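Documentation: https://giraph.apache.org/
GettingStarted: https://giraph.apache.org/quick_start.html
GitHubOrganization: https://github.com/apache
GitHubRepository: https://github.com/apache/giraph
SpectralRules: rules/apache-giraph-spectral-rules.yml
Vocabulary: vocabulary/apache-giraph-vocabulary.yaml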

Sources

aid: apache-giraph
name: Apache Giraph
description: Apache Giraph is an iterative graph processing system built for high scalability on Apache Hadoop. It is modeled
  after Google's Pregel and provides a simple yet flexible Java API for graph algorithms at massive scale using the Bulk Synchronous
  Parallel (BSP) model. Note - Apache Giraph has been retired as of 2024.
type: Index
position: Consumer
access: 3rd-Party
image: https://kinlane-productions2.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
- Apache
- Big Data
- BSP
- Graph Processing
- Hadoop
- Open Source
- Retired
created: '2026-03-16'
modified: '2026-05-19'
url: https://raw.githubusercontent.com/api-evangelist/apache-giraph/refs/heads/main/apis.yml
specificationVersion: '0.19'
apis:
- aid: apache-giraph:apache-giraph-job-api
  name: Apache Giraph Job Monitoring API
  description: Monitoring API for Apache Giraph graph processing jobs via the YARN ResourceManager REST API, providing job
    status, progress tracking, and cluster capacity metrics.
  humanURL: https://giraph.apache.org/quick_start.html
  baseURL: http://localhost:8088
  tags:
  - BSP
  - Graph
  - Hadoop
  - Job Management
  - YARN
  properties:
  - type: Documentation
    url: https://giraph.apache.org/quick_start.html
  - type: OpenAPI
    url: openapi/apache-giraph-job-openapi.yml
  - type: JSONSchema
    url: json-schema/giraph-job-application-info-schema.json
  - type: JSONLD
    url: json-ld/apache-giraph-job-context.jsonld
  - type: NaftikoCapability
    url: capabilities/job-cluster.yaml
  - type: NaftikoCapability
    url: capabilities/job-job-management.yaml
- aid: apache-giraph:apache-giraph-java-api
  name: Apache Giraph Java API
  description: Java API based on the Bulk Synchronous Parallel (BSP) model for implementing graph algorithms, with Vertex,
    Edge, and Master compute APIs for distributed graph processing on Hadoop.
  humanURL: https://giraph.apache.org/apidocs/
  tags:
  - BSP
  - Graph
  - Java
  - SDK
  properties:
  - type: Documentation
    url: https://giraph.apache.org/apidocs/
  - type: SDK
    url: https://search.maven.org/artifact/org.apache.giraph/giraph-core
    title: Java SDK (Maven Central)
common:
- type: Documentation
  url: https://giraph.apache.org/
- type: GettingStarted
  url: https://giraph.apache.org/quick_start.html
- type: GitHubOrganization
  url: https://github.com/apache
- type: GitHubRepository
  url: https://github.com/apache/giraph
- type: SpectralRules
  url: rules/apache-giraph-spectral-rules.yml
- type: Vocabulary
  url: vocabulary/apache-giraph-vocabulary.yaml
- type: Features
  data:
  - name: Bulk Synchronous Parallel (BSP) Model
    description: Google Pregel-inspired BSP computation model where vertices communicate through supersteps.
  - name: Vertex-Centric Programming
    description: Write graph algorithms by defining per-vertex compute functions that exchange messages with neighbors.
  - name: Master Compute API
    description: Global coordination API for aggregating results and controlling algorithm termination across supersteps.
  - name: Aggregators
    description: Sharded aggregators for collecting global statistics across all vertices during computation.
  - name: Edge-Oriented Input
    description: Flexible input formats for loading graphs from HDFS, Hive, Gora, and Rexster sources.
  - name: Out-of-Core Computation
    description: Spill graph data to disk for processing graphs larger than available memory.
  - name: Hadoop Integration
    description: Runs as a MapReduce job on Apache Hadoop YARN for resource management and fault tolerance.
  - name: Fault Tolerance
    description: Checkpoint-based recovery for fault tolerance across superstep boundaries.
- type: UseCases
  data:
  - name: Social Graph Analysis
    description: Analyze social network connections, communities, and influence at billions-of-vertices scale (as used at
      Facebook).
  - name: PageRank Computation
    description: Compute web page or entity rankings using iterative link analysis algorithms.
  - name: Shortest Path Computation
    description: Find shortest paths between vertices for network routing and recommendation problems.
  - name: Connected Components
    description: Identify clusters and connected components in large graphs for community detection.
  - name: Graph Machine Learning Features
    description: Generate graph-structural features for machine learning models at scale.
- type: Integrations
  data:
  - name: Apache Hadoop
    description: Runs on Hadoop YARN as a MapReduce application for cluster resource management.
  - name: Apache Hive
    description: Hive I/O module for loading graph data from Hive tables.
  - name: Apache Gora
    description: Gora I/O module for loading graph data from various NoSQL data stores.
  - name: Rexster
    description: Rexster graph server I/O module for loading data from TinkerPop graph databases.
  - name: Apache HBase
    description: HBase integration for storing and loading vertex and edge data.
maintainers:
- FN: Kin Lane
  email: [email protected]