Apache Giraph
Apache Giraph is an iterative graph processing system built for high scalability on Apache Hadoop. It is modeled after Google's Pregel and provides a simple yet flexible Java API for expressing graph algorithms at massive scale using the Bulk Synchronous Parallel (BSP) model. Note: Apache Giraph was retired in 2024.
APIs
Apache Giraph Job Monitoring API
Monitoring API for Apache Giraph graph processing jobs via the YARN ResourceManager REST API, providing job status, progress tracking, and cluster capacity metrics.
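As a rough sketch of the monitoring side, the Java class below builds request URIs for three ResourceManager REST endpoints (/ws/v1/cluster/apps/{appid}, /ws/v1/cluster/apps?states=RUNNING, and /ws/v1/cluster/metrics) and fetches their JSON with java.net.http.HttpClient. The class name RmMonitor, the http://rm-host:8088 base address, and the utilization helper are illustrative assumptions rather than part of any Giraph API. JSON parsing is deliberately left out.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Sketch of a ResourceManager REST client for watching Giraph jobs on YARN.
// Class name, base address, and helper methods are illustrative assumptions.
final class RmMonitor {
    private final String base; // e.g. "http://rm-host:8088" (assumed RM web address)
    private final HttpClient client = HttpClient.newHttpClient();

    RmMonitor(String base) {
        this.base = base.endsWith("/") ? base.substring(0, base.length() - 1) : base;
    }

    // One application: its JSON carries state, finalStatus, and progress.
    URI appUri(String appId) {
        return URI.create(base + "/ws/v1/cluster/apps/" + appId);
    }

    // All applications currently in the RUNNING state.
    URI runningAppsUri() {
        return URI.create(base + "/ws/v1/cluster/apps?states=RUNNING");
    }

    // Cluster-wide capacity metrics (totalMB, allocatedMB, availableMB, ...).
    URI clusterMetricsUri() {
        return URI.create(base + "/ws/v1/cluster/metrics");
    }

    // Fetches the raw JSON body; parsing is left to the caller.
    String fetchJson(URI uri) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
        HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("HTTP " + response.statusCode() + " from " + uri);
        }
        return response.body();
    }

    // Share of cluster memory currently allocated, as a percentage.
    static double memoryUtilizationPercent(long allocatedMB, long totalMB) {
        return totalMB <= 0 ? 0.0 : 100.0 * allocatedMB / totalMB;
    }
}
```

Calling fetchJson(appUri(...)) returns the application record whose state, finalStatus, and progress fields correspond to the job status and progress tracking described above, while the allocatedMB and totalMB values from the cluster metrics feed the utilization helper.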
Apache Giraph Java API
Java API based on the Bulk Synchronous Parallel (BSP) model for implementing graph algorithms, with Vertex, Edge, and Master compute APIs for distributed graph processing on Had...
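Giraph's Java API centers on a per-vertex compute(vertex, messages) callback that runs once per superstep. The self-contained sketch below imitates that shape with a hypothetical in-memory runner, MiniPregel, so it can run without a Hadoop cluster. None of these classes are Giraph's own; real Giraph code uses Writable types and base classes such as BasicComputation. The ShortestPaths computation follows the classic Pregel single-source shortest-paths pattern: keep the smallest distance seen, forward any improvement along out-edges, then vote to halt.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-process stand-in for a Pregel/Giraph-style runtime.
// These are NOT Giraph classes: ids are ints, vertex values and messages are doubles.
final class MiniPregel {
    static final class Vertex {
        final int id;
        double value;
        final Map<Integer, Double> edges = new LinkedHashMap<>(); // target id -> edge weight
        boolean halted;
        Vertex(int id) { this.id = id; }
        void voteToHalt() { halted = true; }
    }

    // Mirrors the shape of a per-vertex compute(vertex, messages) callback.
    interface Computation {
        void compute(long superstep, Vertex vertex, List<Double> messages, MiniPregel ctx);
    }

    final Map<Integer, Vertex> vertices = new TreeMap<>();
    private Map<Integer, List<Double>> inbox = new HashMap<>();
    private Map<Integer, List<Double>> outbox = new HashMap<>();

    Vertex vertex(int id) { return vertices.computeIfAbsent(id, Vertex::new); }

    void addEdge(int src, int dst, double weight) {
        vertex(src).edges.put(dst, weight);
        vertex(dst); // make sure the target vertex exists
    }

    // Messages sent during superstep S are delivered at the start of S + 1.
    void sendMessage(int target, double message) {
        outbox.computeIfAbsent(target, k -> new ArrayList<>()).add(message);
    }

    // Runs supersteps until all vertices have halted and no messages are pending.
    long run(Computation computation, long maxSupersteps) {
        long superstep = 0;
        while (superstep < maxSupersteps) {
            for (Vertex v : vertices.values()) {
                List<Double> msgs = inbox.getOrDefault(v.id, List.of());
                if (!msgs.isEmpty()) v.halted = false; // an incoming message reactivates a vertex
                if (!v.halted) computation.compute(superstep, v, msgs, this);
            }
            inbox = outbox; // barrier: swap message buffers
            outbox = new HashMap<>();
            superstep++;
            boolean allHalted = vertices.values().stream().allMatch(vx -> vx.halted);
            if (allHalted && inbox.isEmpty()) break;
        }
        return superstep;
    }
}

// Single-source shortest paths in the classic Pregel style.
final class ShortestPaths implements MiniPregel.Computation {
    private final int sourceId;
    ShortestPaths(int sourceId) { this.sourceId = sourceId; }

    @Override
    public void compute(long superstep, MiniPregel.Vertex vertex, List<Double> messages, MiniPregel ctx) {
        if (superstep == 0) vertex.value = Double.POSITIVE_INFINITY;
        double minDist = (vertex.id == sourceId) ? 0.0 : Double.POSITIVE_INFINITY;
        for (double m : messages) minDist = Math.min(minDist, m);
        if (minDist < vertex.value) {
            vertex.value = minDist;
            for (Map.Entry<Integer, Double> e : vertex.edges.entrySet()) {
                ctx.sendMessage(e.getKey(), minDist + e.getValue());
            }
        }
        vertex.voteToHalt();
    }
}
```

Because halted vertices are skipped until a message arrives, only the frontier of improved distances does work in later supersteps, which is the main efficiency argument for the vote-to-halt design.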
Capabilities
Apache Giraph Graph Processing
Capability for monitoring Apache Giraph graph processing jobs on Hadoop YARN — tracking job status, completion progress, and cluster capacity. Designed for data engineers runnin...
Features
Google Pregel-inspired BSP computation model in which vertices exchange messages across a sequence of synchronized supersteps.
Write graph algorithms by defining per-vertex compute functions that exchange messages with neighbors.
Global coordination API for aggregating results and controlling algorithm termination across supersteps.
Sharded aggregators for collecting global statistics across all vertices during computation.
Flexible input formats for loading graphs from HDFS, Hive, Gora, and Rexster sources.
Spill graph data to disk for processing graphs larger than available memory.
Runs as a MapReduce job on Apache Hadoop YARN for resource management and fault tolerance.
Checkpoint-based recovery for fault tolerance across superstep boundaries.
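To illustrate how per-vertex compute, aggregators, and master-side coordination fit together, here is a hypothetical single-process loop (not Giraph's MasterCompute or aggregator classes) that finds connected components by propagating the smallest vertex id. A sum aggregator counts label changes in each superstep, and a master-style check stops the job once a superstep changes nothing. The class and method names are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory BSP loop; not Giraph's MasterCompute or aggregator classes.
final class ComponentsWithAggregator {
    // adjacency must list every vertex as a key and be symmetric (undirected).
    // Returns, for each vertex, the smallest vertex id in its connected component.
    static Map<Integer, Integer> run(Map<Integer, List<Integer>> adjacency, int maxSupersteps) {
        Map<Integer, Integer> label = new TreeMap<>();
        for (int v : adjacency.keySet()) label.put(v, v);
        Map<Integer, List<Integer>> inbox = new HashMap<>();

        for (int superstep = 0; superstep < maxSupersteps; superstep++) {
            Map<Integer, List<Integer>> outbox = new HashMap<>();
            long changed = 0; // "sum aggregator", reset at the start of every superstep

            // Vertex compute: adopt the smallest label seen, forward it only when it changed.
            for (Map.Entry<Integer, List<Integer>> entry : adjacency.entrySet()) {
                int v = entry.getKey();
                int best = label.get(v);
                for (int m : inbox.getOrDefault(v, List.of())) best = Math.min(best, m);
                boolean improved = best < label.get(v);
                if (improved) {
                    label.put(v, best);
                    changed++; // contribute 1 to the aggregator
                }
                if (superstep == 0 || improved) { // superstep 0: every vertex announces its id
                    for (int n : entry.getValue()) {
                        outbox.computeIfAbsent(n, k -> new ArrayList<>()).add(best);
                    }
                }
            }
            inbox = outbox; // barrier

            // "Master compute": read the aggregated value and halt once nothing changed.
            if (superstep > 0 && changed == 0) break;
        }
        return label;
    }
}
```

Because a vertex only sends messages when its label changes, a superstep with zero changes also leaves no messages in flight, so halting at that point cannot cut the computation short.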
Use Cases
Analyze social network connections, communities, and influence at billions-of-vertices scale (as used at Facebook).
Compute web page or entity rankings using iterative link analysis algorithms such as PageRank.
Find shortest paths between vertices for network routing and recommendation problems.
Identify clusters and connected components in large graphs for community detection.
Generate graph-structural features for machine learning models at scale.
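The ranking use case can be pictured with a vertex-centric PageRank sketch: in each superstep every vertex sends rank / out-degree along its out-edges, and after the barrier each vertex sets its rank to (1 - d) / N + d times the sum of its incoming messages. The class VertexPageRank, the fixed superstep count, and the absence of dangling-vertex handling are simplifying assumptions; this is not Giraph's bundled example code.

```java
import java.util.Arrays;

// Sketch: vertex-centric PageRank for a fixed number of supersteps (hypothetical helper).
// Assumes every vertex has at least one out-edge (no dangling-vertex handling).
final class VertexPageRank {
    // outEdges[v] lists the targets of v's out-edges; damping is d (commonly 0.85).
    static double[] run(int[][] outEdges, int supersteps, double damping) {
        int n = outEdges.length;
        double[] rank = new double[n];
        Arrays.fill(rank, 1.0 / n); // superstep 0: uniform starting rank

        for (int s = 1; s <= supersteps; s++) {
            // "Send" phase: each vertex splits its rank evenly across its out-edges.
            double[] incoming = new double[n]; // messages summed per target vertex
            for (int v = 0; v < n; v++) {
                for (int t : outEdges[v]) incoming[t] += rank[v] / outEdges[v].length;
            }
            // After the barrier: each vertex combines the messages it received.
            for (int v = 0; v < n; v++) {
                rank[v] = (1 - damping) / n + damping * incoming[v];
            }
        }
        return rank;
    }
}
```

With no dangling vertices, every vertex passes on all of its rank, so the ranks keep summing to 1 after each superstep, which makes a convenient sanity check.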
Integrations
Runs on Hadoop YARN as a MapReduce application for cluster resource management.
Hive I/O module for loading graph data from Hive tables.
Gora I/O module for loading graph data from various NoSQL data stores.
Rexster graph server I/O module for loading data from TinkerPop graph databases.
HBase integration for storing and loading vertex and edge data.