Apache Giraph

Apache Giraph is an iterative graph processing system built for high scalability on Apache Hadoop. It is modeled after Google's Pregel and provides a simple yet flexible Java API for running graph algorithms at massive scale using the Bulk Synchronous Parallel (BSP) model. Note: Apache Giraph has been retired as of 2024.

2 APIs · 1 Capability · 8 Features
Apache · Big Data · BSP · Graph Processing · Hadoop · Open Source · Retired

APIs

Apache Giraph Job Monitoring API

Monitoring API for Apache Giraph graph processing jobs via the YARN ResourceManager REST API, providing job status, progress tracking, and cluster capacity metrics.
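
As a rough illustration of polling the YARN ResourceManager for job status, the sketch below builds the ResourceManager REST paths `/ws/v1/cluster/apps/{appid}` and `/ws/v1/cluster/metrics` and pulls the `state` and `progress` fields out of an application response. The class name `YarnJobMonitor`, the host in the usage note, and the regex-based field extraction are assumptions made for this example; a real client would more likely use a JSON parser.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Giraph or Hadoop.
final class YarnJobMonitor {
    // Simplistic field matchers; they pick the first occurrence in the body.
    private static final Pattern STATE =
            Pattern.compile("\"state\"\\s*:\\s*\"([A-Z_]+)\"");
    private static final Pattern PROGRESS =
            Pattern.compile("\"progress\"\\s*:\\s*([0-9.]+)");

    // URI of the ResourceManager resource describing one application.
    static URI appUri(String rmBase, String appId) {
        return URI.create(rmBase + "/ws/v1/cluster/apps/" + appId);
    }

    // URI of the ResourceManager resource with cluster-wide capacity metrics.
    static URI metricsUri(String rmBase) {
        return URI.create(rmBase + "/ws/v1/cluster/metrics");
    }

    // Performs a blocking GET and returns the raw JSON body.
    static String fetch(URI uri) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Extracts the application state (for example RUNNING or FINISHED), if present.
    static Optional<String> state(String json) {
        Matcher m = STATE.matcher(json);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    // Extracts the progress percentage reported by the application, if present.
    static Optional<Double> progress(String json) {
        Matcher m = PROGRESS.matcher(json);
        return m.find() ? Optional.of(Double.valueOf(m.group(1))) : Optional.empty();
    }
}
```

A caller might run `YarnJobMonitor.fetch(YarnJobMonitor.appUri("http://rm.example.com:8088", appId))` on a timer and stop once `state` reports a terminal value; the host and port here are placeholders.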

Apache Giraph Java API

Java API based on the Bulk Synchronous Parallel (BSP) model for implementing graph algorithms, with Vertex, Edge, and Master compute APIs for distributed graph processing on Had...

Capabilities

Apache Giraph Graph Processing

Capability for monitoring Apache Giraph graph processing jobs on Hadoop YARN — tracking job status, completion progress, and cluster capacity. Designed for data engineers runnin...

Features

Bulk Synchronous Parallel (BSP) Model

Google Pregel-inspired BSP computation model where vertices communicate through supersteps.
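
To make the superstep idea concrete, here is a minimal single-process simulation in plain Java, assuming the classic "propagate the maximum value" example. It does not use the Giraph API; the class `BspMaxValue` and its array-based graph representation are invented for this sketch. Messages sent in one superstep become visible only in the next, which stands in for the synchronization barrier.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative BSP simulation; not Giraph code.
final class BspMaxValue {
    // Runs supersteps until no vertex sends a message; returns the final values.
    static int[] run(int[] initial, int[][] neighbors) {
        int n = initial.length;
        int[] value = initial.clone();
        List<List<Integer>> inbox = emptyBoxes(n);
        boolean firstSuperstep = true;
        while (true) {
            List<List<Integer>> outbox = emptyBoxes(n);
            boolean anySent = false;
            for (int v = 0; v < n; v++) {
                // In superstep 0 every vertex announces its own value.
                boolean changed = firstSuperstep;
                for (int message : inbox.get(v)) {
                    if (message > value[v]) {
                        value[v] = message;
                        changed = true;
                    }
                }
                if (changed) {
                    for (int u : neighbors[v]) {
                        outbox.get(u).add(value[v]);
                        anySent = true;
                    }
                }
                // A vertex that did not change sends nothing (it "votes to halt").
            }
            if (!anySent) {
                break; // no messages in flight: the computation terminates
            }
            inbox = outbox; // barrier: messages are delivered in the next superstep
            firstSuperstep = false;
        }
        return value;
    }

    private static List<List<Integer>> emptyBoxes(int n) {
        List<List<Integer>> boxes = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            boxes.add(new ArrayList<>());
        }
        return boxes;
    }
}
```

On a three-vertex path with initial values 3, 6, 2, every vertex ends up holding 6 after a few supersteps.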

Vertex-Centric Programming

Write graph algorithms by defining per-vertex compute functions that exchange messages with neighbors.

Master Compute API

Global coordination API for aggregating results and controlling algorithm termination across supersteps.

Aggregators

Sharded aggregators for collecting global statistics across all vertices during computation.
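
The interplay between aggregators and master-side coordination can be sketched in plain Java as follows. This is a simulation under stated assumptions, not the Giraph aggregator or master compute classes: `LongAggregator` and `superstepsUntilSumBelow` are hypothetical names, each vertex simply halves its value per superstep, and the "master" step halts once the aggregated sum falls below a threshold.

```java
import java.util.function.LongBinaryOperator;

// Illustrative aggregator plus master-style halting check; not Giraph code.
final class AggregatorSketch {
    // A reduce-style aggregator: an identity value and an associative combine.
    static final class LongAggregator {
        private final long identity;
        private final LongBinaryOperator combine;
        private long current;

        LongAggregator(long identity, LongBinaryOperator combine) {
            this.identity = identity;
            this.combine = combine;
            this.current = identity;
        }

        // Called by vertices during a superstep.
        void aggregate(long x) {
            current = combine.applyAsLong(current, x);
        }

        // Called by the master between supersteps.
        long getAndReset() {
            long result = current;
            current = identity;
            return result;
        }
    }

    // Returns how many supersteps run before the global sum drops below threshold.
    static int superstepsUntilSumBelow(long[] values, long threshold) {
        if (threshold <= 0) {
            throw new IllegalArgumentException("threshold must be positive");
        }
        long[] v = values.clone();
        LongAggregator sum = new LongAggregator(0L, Long::sum);
        int superstep = 0;
        while (true) {
            // Vertex phase: each vertex updates itself and contributes to the sum.
            for (int i = 0; i < v.length; i++) {
                v[i] /= 2;
                sum.aggregate(v[i]);
            }
            superstep++;
            // Master phase: read the aggregated value and decide whether to halt.
            if (sum.getAndReset() < threshold) {
                return superstep;
            }
        }
    }
}
```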

Edge-Oriented Input

Flexible input formats for loading graphs from HDFS, Hive, Gora, and Rexster sources.

Out-of-Core Computation

Spill graph data to disk for processing graphs larger than available memory.

Hadoop Integration

Runs as a MapReduce job on Apache Hadoop YARN for resource management and fault tolerance.

Fault Tolerance

Checkpoint-based recovery for fault tolerance across superstep boundaries.
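
A toy model of checkpoint-based recovery, assuming state is saved at the start of every k-th superstep and a single injected failure wipes the in-memory state. Everything here (`CheckpointSketch`, the per-superstep update, the failure injection) is hypothetical and only illustrates why restarting from the last checkpoint reproduces the failure-free result when supersteps are deterministic.

```java
import java.util.Arrays;

// Illustrative checkpoint/restart loop; not Giraph code.
final class CheckpointSketch {
    // failAtSuperstep < 0 means no failure is injected.
    static long[] run(long[] initial, int totalSupersteps,
                      int checkpointEvery, int failAtSuperstep) {
        if (checkpointEvery <= 0) {
            throw new IllegalArgumentException("checkpointEvery must be positive");
        }
        long[] state = initial.clone();
        long[] savedState = initial.clone();
        int savedSuperstep = 0;
        boolean failureInjected = false;
        int s = 0;
        while (s < totalSupersteps) {
            // Checkpoint at the superstep boundary, before computing superstep s.
            if (s % checkpointEvery == 0) {
                savedState = state.clone();
                savedSuperstep = s;
            }
            if (s == failAtSuperstep && !failureInjected) {
                failureInjected = true;
                Arrays.fill(state, -1L);      // simulate losing in-memory state
                state = savedState.clone();   // restore the last checkpoint
                s = savedSuperstep;           // and replay from that superstep
                continue;
            }
            // Deterministic per-vertex update standing in for compute().
            for (int i = 0; i < state.length; i++) {
                state[i] += s + 1;
            }
            s++;
        }
        return state;
    }
}
```

Running five supersteps with a checkpoint every two and a failure at superstep 3 replays supersteps 2 and 3 and yields the same final state as a run without failure.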

Use Cases

Social Graph Analysis

Analyze social network connections, communities, and influence at billions-of-vertices scale (as used at Facebook).

PageRank Computation

Compute web page or entity rankings using iterative link analysis algorithms.
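
A minimal sketch of iterative PageRank written in a vertex-centric style, in plain Java rather than against the Giraph API. The class name, the fixed superstep count, and the choice to let dangling vertices (no out-edges) simply send nothing are assumptions of this example; with dangling vertices the ranks will not sum to 1.

```java
import java.util.Arrays;

// Illustrative PageRank iteration; not Giraph code.
final class PageRankSketch {
    // outEdges[v] lists the targets of v's outgoing edges.
    static double[] run(int[][] outEdges, int supersteps, double damping) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n);
        for (int s = 0; s < supersteps; s++) {
            // "Send" phase: each vertex splits its rank among its out-neighbors.
            double[] incoming = new double[n];
            for (int v = 0; v < n; v++) {
                if (outEdges[v].length == 0) {
                    continue; // dangling vertex: sends nothing in this sketch
                }
                double share = rank[v] / outEdges[v].length;
                for (int u : outEdges[v]) {
                    incoming[u] += share;
                }
            }
            // "Compute" phase: each vertex combines the messages it received.
            for (int v = 0; v < n; v++) {
                rank[v] = (1.0 - damping) / n + damping * incoming[v];
            }
        }
        return rank;
    }
}
```

On a directed 3-cycle every vertex keeps rank 1/3, which makes a convenient sanity check.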

Shortest Path Computation

Find shortest paths between vertices for network routing and recommendation problems.
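
Single-source shortest paths maps naturally onto message passing: a vertex that learns of a shorter distance adopts it and offers `distance + edge weight` to its out-neighbors. The sketch below simulates this in plain Java, assuming non-negative integer weights. `ShortestPathsSketch` and its `{target, weight}` edge encoding are made up for the example, and keeping only the smallest message per vertex plays the role of a min combiner.

```java
import java.util.Arrays;

// Illustrative message-passing SSSP; not Giraph code.
final class ShortestPathsSketch {
    static final long UNREACHABLE = Long.MAX_VALUE;

    // outEdges[v] is an array of {target, weight} pairs; weights must be >= 0.
    static long[] run(int[][][] outEdges, int source) {
        int n = outEdges.length;
        long[] dist = new long[n];
        Arrays.fill(dist, UNREACHABLE);
        // inbox[v] holds the smallest distance offered to v in the last superstep.
        long[] inbox = new long[n];
        Arrays.fill(inbox, UNREACHABLE);
        inbox[source] = 0L;
        boolean messagesSent = true;
        while (messagesSent) {
            messagesSent = false;
            long[] outbox = new long[n];
            Arrays.fill(outbox, UNREACHABLE);
            for (int v = 0; v < n; v++) {
                if (inbox[v] < dist[v]) {
                    dist[v] = inbox[v];
                    for (int[] edge : outEdges[v]) {
                        long candidate = dist[v] + edge[1];
                        if (candidate < outbox[edge[0]]) {
                            outbox[edge[0]] = candidate;
                        }
                        messagesSent = true;
                    }
                }
            }
            inbox = outbox; // barrier between supersteps
        }
        return dist;
    }
}
```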

Connected Components

Identify clusters and connected components in large graphs for community detection.
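
One common way to find connected components in a vertex-centric system is minimum-label propagation: each vertex starts with its own id as a label and repeatedly adopts the smallest label it hears from a neighbor. The plain-Java sketch below assumes an undirected graph given as symmetric adjacency lists; the class and method names are illustrative and not taken from Giraph.

```java
import java.util.Arrays;

// Illustrative min-label propagation; not Giraph code.
final class ConnectedComponentsSketch {
    // Returns, for each vertex, the smallest vertex id in its component.
    static int[] labels(int[][] neighbors) {
        int n = neighbors.length;
        int[] label = new int[n];
        for (int v = 0; v < n; v++) {
            label[v] = v;
        }
        // Superstep 0: every vertex sends its own id to all neighbors.
        int[] inbox = new int[n];
        Arrays.fill(inbox, Integer.MAX_VALUE);
        for (int v = 0; v < n; v++) {
            for (int u : neighbors[v]) {
                inbox[u] = Math.min(inbox[u], label[v]);
            }
        }
        boolean active = true;
        while (active) {
            active = false;
            int[] outbox = new int[n];
            Arrays.fill(outbox, Integer.MAX_VALUE);
            for (int v = 0; v < n; v++) {
                if (inbox[v] < label[v]) {
                    // Adopt the smaller label and forward it to neighbors.
                    label[v] = inbox[v];
                    for (int u : neighbors[v]) {
                        outbox[u] = Math.min(outbox[u], label[v]);
                    }
                    active = true;
                }
            }
            inbox = outbox;
        }
        return label;
    }

    // Number of distinct components implied by the labels.
    static long count(int[] labels) {
        return Arrays.stream(labels).distinct().count();
    }
}
```

Vertices that end with the same label belong to the same component, so grouping by label gives the clusters directly.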

Graph Machine Learning Features

Generate graph-structural features for machine learning models at scale.

Integrations

Apache Hadoop

Runs on Hadoop YARN as a MapReduce application for cluster resource management.

Apache Hive

Hive I/O module for loading graph data from Hive tables.

Apache Gora

Gora I/O module for loading graph data from various NoSQL data stores.

Rexster

Rexster graph server I/O module for loading data from TinkerPop graph databases.

Apache HBase

HBase integration for storing and loading vertex and edge data.

Semantic Vocabularies

Apache Giraph Job Context

5 classes · 14 properties

JSON-LD

API Governance Rules

Apache Giraph API Rules

7 rules · 4 errors · 3 warnings

SPECTRAL

Resources

Documentation
Getting Started
GitHub Organization
GitHub Repository
Spectral Rules
Vocabulary
Naftiko Capability