Apache Helix

Apache Helix is a generic cluster management framework for partitioned and replicated distributed resources. It automates partition management, replication, fault tolerance, and cluster expansion for distributed systems, providing a REST API for cluster administration and a Java API for participant, spectator, and controller roles.

Tags: Apache, Cluster Management, Distributed Systems, Open Source, Partitioning, Replication

APIs

Apache Helix REST API

REST API for managing Apache Helix clusters, instances, resources, and partition state assignments, including ideal state queries and external view inspection.
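
As a rough illustration of the ideal state and external view queries, the sketch below builds (but does not send) GET requests with the JDK HTTP client. The `HelixRestRequests` class, the base URL, and the cluster and resource names are hypothetical, and the `/admin/v2/clusters/.../resources/...` path layout is an assumption that should be checked against the Helix REST documentation for the deployed version.

```java
import java.net.URI;
import java.net.http.HttpRequest;

/** Illustrative helper, not part of Helix. Path layout is an assumption. */
final class HelixRestRequests {

    /** Builds a GET request for the ideal state of one resource. */
    static HttpRequest idealState(String baseUrl, String cluster, String resource) {
        return get(baseUrl, cluster, resource, "idealState");
    }

    /** Builds a GET request for the external view of one resource. */
    static HttpRequest externalView(String baseUrl, String cluster, String resource) {
        return get(baseUrl, cluster, resource, "externalView");
    }

    private static HttpRequest get(String baseUrl, String cluster, String resource, String view) {
        // Assumed layout: <base>/admin/v2/clusters/<cluster>/resources/<resource>/<view>
        URI uri = URI.create(baseUrl + "/admin/v2/clusters/" + cluster
                + "/resources/" + resource + "/" + view);
        return HttpRequest.newBuilder(uri)
                .header("Accept", "application/json")
                .GET()
                .build();
    }
}
```

The returned request can be passed to `java.net.http.HttpClient#send` once a Helix REST server is reachable at the chosen base URL.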

Apache Helix Java API

Java API for implementing Helix participant, spectator, and controller roles, with APIs for resource management, task execution, and state machine definitions.

Capabilities

Features

Automatic Partition Management

Automatically assign and balance partitions across cluster nodes using pluggable rebalancer algorithms.
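
To make the idea of balanced assignment concrete, here is a minimal round-robin sketch in plain Java. It is not one of Helix's rebalancers; the `RoundRobinAssigner` class and its parameters are hypothetical and only show how partitions and their replicas might be spread evenly over nodes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: not a Helix class or a Helix rebalancer. */
final class RoundRobinAssigner {

    /** Maps each partition index to `replicas` distinct nodes, rotating the start node. */
    static Map<Integer, List<String>> assign(List<String> nodes, int partitions, int replicas) {
        if (replicas > nodes.size()) {
            throw new IllegalArgumentException("replicas exceed node count");
        }
        Map<Integer, List<String>> assignment = new LinkedHashMap<>();
        for (int p = 0; p < partitions; p++) {
            List<String> owners = new ArrayList<>();
            for (int r = 0; r < replicas; r++) {
                // Consecutive offsets guarantee distinct nodes for one partition.
                owners.add(nodes.get((p + r) % nodes.size()));
            }
            assignment.put(p, owners);
        }
        return assignment;
    }
}
```

For example, `RoundRobinAssigner.assign(List.of("n1", "n2", "n3"), 4, 2)` places partition 2 on `n3` and `n1`.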

State Machine Framework

Define custom resource state machines (e.g., Master-Slave, Leader-Standby) for any distributed service.
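
A state machine of this kind boils down to a set of states plus a table of legal transitions. The sketch below models a simplified Master-Slave machine in plain Java; the `MasterSlaveModel` class is hypothetical, omits states such as dropped or error, and does not use Helix's own state model classes.

```java
import java.util.EnumMap;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;

/** Illustrative only: a simplified Master-Slave transition table, not a Helix class. */
final class MasterSlaveModel {

    enum State { OFFLINE, SLAVE, MASTER }

    private static final Map<State, Set<State>> LEGAL = new EnumMap<>(State.class);
    static {
        LEGAL.put(State.OFFLINE, EnumSet.of(State.SLAVE));
        LEGAL.put(State.SLAVE, EnumSet.of(State.MASTER, State.OFFLINE));
        LEGAL.put(State.MASTER, EnumSet.of(State.SLAVE));
    }

    /** Returns true if a replica may move directly from one state to the other. */
    static boolean isLegal(State from, State to) {
        return LEGAL.get(from).contains(to);
    }

    /** Applies a transition, rejecting moves the table does not allow. */
    static State transition(State from, State to) {
        if (!isLegal(from, to)) {
            throw new IllegalStateException("illegal transition " + from + " -> " + to);
        }
        return to;
    }
}
```

In this table a replica cannot jump from OFFLINE straight to MASTER; it has to pass through SLAVE first.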

Fault Tolerance

Detect node failures and automatically reassign partitions to maintain replication targets.
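
The reassignment step can be sketched as follows: when a node is reported as failed, every partition it hosted gets a replacement replica on the least-loaded live node that does not already hold that partition. The `FailoverSketch` class is hypothetical and only illustrates the idea; it is not how Helix's controller is implemented.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative only: restores the replica count after one node fails. */
final class FailoverSketch {

    static Map<String, List<String>> reassign(Map<String, List<String>> assignment,
                                              List<String> liveNodes, String failed) {
        // Count how many replicas each live node currently hosts.
        Map<String, Integer> load = new HashMap<>();
        for (String node : liveNodes) {
            load.put(node, 0);
        }
        for (List<String> owners : assignment.values()) {
            for (String owner : owners) {
                if (load.containsKey(owner)) {
                    load.merge(owner, 1, Integer::sum);
                }
            }
        }
        Map<String, List<String>> result = new TreeMap<>();
        // Sorted iteration keeps the outcome deterministic.
        for (Map.Entry<String, List<String>> e : new TreeMap<>(assignment).entrySet()) {
            List<String> owners = new ArrayList<>(e.getValue());
            if (owners.remove(failed)) {
                String best = null;
                for (String candidate : liveNodes) {
                    if (candidate.equals(failed) || owners.contains(candidate)) {
                        continue;
                    }
                    if (best == null || load.get(candidate) < load.get(best)) {
                        best = candidate;
                    }
                }
                if (best != null) {
                    owners.add(best);
                    load.merge(best, 1, Integer::sum);
                }
            }
            result.put(e.getKey(), owners);
        }
        return result;
    }
}
```

If no eligible live node exists, the partition is left under-replicated rather than placed twice on the same node.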

REST API

HTTP REST API for cluster administration, resource management, and state inspection.

Task Framework

Distributed task scheduling framework for batch jobs and recurring workflows with failure handling.
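
Failure handling for a single task often amounts to bounded retries. The sketch below shows that pattern in plain Java; the `RetryingTask` helper is hypothetical and does not use the Helix Task Framework API, which also covers scheduling and distribution across nodes.

```java
import java.util.function.Supplier;

/** Illustrative only: bounded retry for one task, not the Helix Task Framework. */
final class RetryingTask {

    /** Runs the task until it succeeds or maxAttempts is reached, then rethrows the last failure. */
    static <T> T runWithRetry(Supplier<T> task, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return task.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

A production version would typically add a delay between attempts and distinguish retryable from fatal errors.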

ZooKeeper Integration

Uses Apache ZooKeeper as the distributed coordination backend for cluster state storage.

Spectator API

Read-only API for external services to observe resource state and routing decisions.

Cloud-Aware Rebalancing

Rack and zone-aware partition placement for fault-domain isolation in cloud environments.
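
Fault-domain isolation means that no two replicas of a partition share a rack or zone. A minimal sketch of that constraint, assuming a hypothetical `ZoneAwarePlacement` class and a simple node-to-zone map (not Helix's topology-aware rebalancer):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Illustrative only: picks replicas for one partition so that each lands in a different zone. */
final class ZoneAwarePlacement {

    static List<String> place(Map<String, String> nodeToZone, int partition, int replicas) {
        // Sort node names so the result does not depend on map iteration order.
        List<String> nodes = new ArrayList<>(new TreeMap<>(nodeToZone).keySet());
        List<String> chosen = new ArrayList<>();
        Set<String> usedZones = new HashSet<>();
        for (int i = 0; i < nodes.size() && chosen.size() < replicas; i++) {
            // Start at a partition-dependent offset to spread load across nodes.
            String node = nodes.get(Math.floorMod(partition + i, nodes.size()));
            if (usedZones.add(nodeToZone.get(node))) {
                chosen.add(node);
            }
        }
        if (chosen.size() < replicas) {
            throw new IllegalStateException("not enough distinct zones for " + replicas + " replicas");
        }
        return chosen;
    }
}
```

With nodes `a1` and `a2` in zone A, `b1` in zone B, and `c1` in zone C, three replicas of partition 0 land on `a1`, `b1`, and `c1`, skipping `a2` because zone A is already used.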

Use Cases

Distributed Database Cluster Management

Manage shard assignment and replication for distributed databases like DistributedLog or Espresso.

Search Index Partition Management

Automatically balance and assign search index shards across a cluster of query servers.

Distributed Task Scheduling

Schedule and execute distributed batch tasks with automatic retry and failure recovery.

Microservices Load Balancing

Use the Helix spectator API to implement client-side load balancing based on partition state.
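
Client-side routing of this kind can be sketched as a lookup over a partition-to-replica-state map followed by a round-robin pick. The `PartitionRouter` class below is hypothetical: it uses plain maps as a stand-in for the view a spectator would observe and does not call Helix's spectator classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.atomic.AtomicInteger;

/** Illustrative only: routes requests using a snapshot of partition -> (instance -> state). */
final class PartitionRouter {

    private final Map<String, Map<String, String>> view;
    private final AtomicInteger counter = new AtomicInteger();

    PartitionRouter(Map<String, Map<String, String>> view) {
        this.view = view;
    }

    /** Lists instances holding the partition in the given state, sorted by name. */
    List<String> instancesInState(String partition, String state) {
        List<String> matches = new ArrayList<>();
        Map<String, String> replicas = view.getOrDefault(partition, Map.of());
        for (Map.Entry<String, String> e : new TreeMap<>(replicas).entrySet()) {
            if (e.getValue().equals(state)) {
                matches.add(e.getKey());
            }
        }
        return matches;
    }

    /** Picks one matching instance, rotating across calls to spread load. */
    String pick(String partition, String state) {
        List<String> candidates = instancesInState(partition, state);
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no " + state + " replica for " + partition);
        }
        return candidates.get(Math.floorMod(counter.getAndIncrement(), candidates.size()));
    }
}
```

A client would typically send writes to the instance in the leader or master state and rotate reads across the remaining replicas.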

Stateful Service Migration

Perform rolling upgrades and partition migrations without service downtime.

Integrations

Apache ZooKeeper

ZooKeeper is the required coordination backend for Helix cluster state management.

Apache Kafka

Apache Kafka itself does not depend on Helix; some Kafka ecosystem projects, such as Uber's uReplicator, use Helix internally for partition management.

Apache Pinot

Apache Pinot uses Helix for real-time OLAP cluster partition and segment management.

LinkedIn Venice

Venice, LinkedIn's derived-data storage platform, uses Helix to manage data store partition assignments.

Semantic Vocabularies

Apache Helix Rest Context

16 classes · 0 properties

JSON-LD

API Governance Rules

Apache Helix API Rules

8 rules · 2 errors 5 warnings 1 info

SPECTRAL

Resources

Documentation
Getting Started
GitHub Organization
GitHub Repository
Spectral Rules
Vocabulary
Naftiko Capability