Apache Oozie

Apache Oozie is a workflow scheduler system for managing Apache Hadoop jobs. It lets users define workflows as directed acyclic graphs (DAGs) of actions, including MapReduce, Pig, Hive, Sqoop, and custom Java or shell steps. Coordinator jobs trigger workflows based on time schedules or data availability, while bundle jobs group multiple coordinators. Oozie provides a REST API for job submission, lifecycle management, monitoring, and system administration. It is written in Java and governed by the Apache Software Foundation under the Apache License 2.0.

1 API · 1 Capability · 10 Features
Tags: Workflow, Hadoop, Orchestration, Scheduling, Big Data, Apache, Java, Open Source

APIs

Apache Oozie REST API

The Oozie Web Services API provides REST endpoints for submitting, managing, and monitoring workflow, coordinator, and bundle jobs on Apache Hadoop. Supports full job lifecycle ...

Capabilities

Apache Oozie Workflow Orchestration

Workflow capability for orchestrating Hadoop data processing pipelines using Apache Oozie. Covers workflow, coordinator, and bundle job lifecycle management for data engineers a...

Features

Directed Acyclic Graph Workflows

Define complex data processing pipelines as DAGs of actions executed on Apache Hadoop.
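A minimal sketch of what such a DAG can look like as a workflow definition, generated here from Python. The schema version, the node names (`prepare-output`, `fail`, `end`), and the choice of a filesystem action are assumptions made for illustration, not taken from this page; check the element names against the workflow schema of your Oozie release.

```python
import xml.etree.ElementTree as ET

# Assumed schema version; use the one your Oozie release supports.
WF_NS = "uri:oozie:workflow:0.5"


def build_workflow(name: str, output_dir: str) -> str:
    """Return a tiny workflow: start -> prepare-output -> end (or fail)."""
    app = ET.Element("workflow-app", {"name": name, "xmlns": WF_NS})
    ET.SubElement(app, "start", {"to": "prepare-output"})

    # One action node; its ok/error transitions are the edges of the DAG.
    action = ET.SubElement(app, "action", {"name": "prepare-output"})
    fs = ET.SubElement(action, "fs")
    ET.SubElement(fs, "mkdir", {"path": output_dir})
    ET.SubElement(action, "ok", {"to": "end"})
    ET.SubElement(action, "error", {"to": "fail"})

    # Terminal nodes: a kill node for failures and the normal end node.
    kill = ET.SubElement(app, "kill", {"name": "fail"})
    ET.SubElement(kill, "message").text = "Workflow failed"
    ET.SubElement(app, "end", {"name": "end"})
    return ET.tostring(app, encoding="unicode")
```

Each node names its successors explicitly, so the graph structure is visible in the `to` attributes. Longer pipelines add more action nodes between `start` and `end`.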

Coordinator Jobs

Schedule recurring workflows triggered by time intervals or data availability conditions in HDFS.
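To make the time-based case concrete, the sketch below lists the times at which a fixed-interval schedule would fire between a start and an end. It only illustrates the idea and is not Oozie's scheduler. The function name `nominal_times` is hypothetical, and treating the end time as exclusive is an assumption of this sketch.

```python
from datetime import datetime, timedelta
from typing import Iterator


def nominal_times(start: datetime, end: datetime, every_minutes: int) -> Iterator[datetime]:
    """Yield each scheduled time in [start, end) at a fixed interval."""
    if every_minutes <= 0:
        raise ValueError("every_minutes must be positive")
    current = start
    while current < end:
        yield current
        current += timedelta(minutes=every_minutes)
```

For example, a six-hour interval across one day produces four scheduled times, each of which would correspond to one workflow run.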

Bundle Jobs

Group multiple coordinator jobs into a single bundle for coordinated lifecycle management.

REST API Management

Full REST API for job submission, lifecycle control, monitoring, and system administration.
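As a rough sketch of how a client might prepare these requests, the helpers below build a job configuration document and a lifecycle URL without sending anything over the network. The `v2` path segment, the base URL, the job ID, and the helper names are assumptions for illustration; confirm the endpoint paths and supported actions in the Web Services API documentation for your Oozie version.

```python
import xml.etree.ElementTree as ET
from typing import Dict
from urllib.parse import urlencode

API_VERSION = "v2"  # assumed; older servers may only expose v1


def job_config_xml(properties: Dict[str, str]) -> str:
    """Serialize job properties as a Hadoop-style <configuration> document."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


def job_action_url(base_url: str, job_id: str, action: str) -> str:
    """Build the URL for a lifecycle request on an existing job."""
    allowed = {"start", "suspend", "resume", "kill"}
    if action not in allowed:
        raise ValueError(f"unsupported action: {action}")
    query = urlencode({"action": action})
    return f"{base_url.rstrip('/')}/{API_VERSION}/job/{job_id}?{query}"
```

Under these assumptions, a client would send the configuration XML in an HTTP POST to the `jobs` endpoint with an XML content type, and issue lifecycle changes as HTTP PUT requests to the URL returned by `job_action_url`.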

Native Hadoop Action Types

Built-in support for MapReduce, Pig, Hive, Sqoop, DistCp, and custom Java/shell actions.

SLA Management

Define and monitor service level agreements on workflow and coordinator actions with alert capabilities.

DAG Visualization

Generate PNG, SVG, or DOT graph visualizations of workflow DAGs for debugging and documentation.

Log Retrieval

Retrieve execution logs, error logs, and audit trails for jobs via REST API with filtering support.
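A hedged sketch of how a log request URL with filters might be assembled. The `show` values, the `logfilter` parameter with semicolon-separated `key=value` pairs, the filter keys used in the example, and the `v2` path are assumptions based on the Oozie Web Services API documentation as I understand it; verify them against your server's version before relying on this.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

LOG_KINDS = {"log", "errorlog", "auditlog"}  # assumed values of the show parameter


def job_log_url(base_url: str, job_id: str, kind: str = "log",
                filters: Optional[Dict[str, str]] = None) -> str:
    """Build a URL that requests one kind of log for a job, optionally filtered."""
    if kind not in LOG_KINDS:
        raise ValueError(f"unsupported log kind: {kind}")
    params = {"show": kind}
    if filters:
        # Filters are joined as key=value pairs separated by semicolons.
        params["logfilter"] = ";".join(f"{k}={v}" for k, v in filters.items())
    return f"{base_url.rstrip('/')}/v2/job/{job_id}?{urlencode(params)}"
```

`urlencode` percent-encodes the separators inside the filter value, so the server receives the filter string as a single query parameter.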

High Availability

Built-in HA support with multiple Oozie server instances and distributed state management.

Shared Library Support

Share Hadoop libraries across workflows to keep classpaths consistent.

Use Cases

ETL Pipeline Orchestration

Orchestrate multi-step ETL pipelines combining Hive queries, MapReduce jobs, and data transfers on Hadoop.

Scheduled Data Processing

Run recurring Hadoop batch jobs on time-based schedules using coordinator jobs.

Data-Triggered Workflows

Trigger workflows automatically when new data arrives in HDFS using coordinator data-in conditions.
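Data-triggered coordinators typically describe their input as a path template with date placeholders, which is resolved for each scheduled time before the path is checked for data. The sketch below imitates that substitution in plain Python. The placeholder names follow the `${YEAR}`/`${MONTH}`/`${DAY}` style used in Oozie dataset definitions, while the function name and the example path are hypothetical.

```python
from datetime import datetime


def resolve_uri_template(template: str, nominal: datetime) -> str:
    """Replace date placeholders in a dataset path with zero-padded values."""
    values = {
        "YEAR": f"{nominal:%Y}",
        "MONTH": f"{nominal:%m}",
        "DAY": f"{nominal:%d}",
        "HOUR": f"{nominal:%H}",
        "MINUTE": f"{nominal:%M}",
    }
    for key, value in values.items():
        template = template.replace("${" + key + "}", value)
    return template
```

With a daily template, each scheduled day maps to its own directory, and the workflow for that day would only start once data is present at the resolved path.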

Machine Learning Pipeline Automation

Automate ML model training and evaluation pipelines on Hadoop with dependency chaining.

Data Migration and Archival

Orchestrate large-scale data migration, compaction, and archival workflows across Hadoop clusters.

Multi-Cluster Coordination

Coordinate workflows that span multiple Hadoop clusters using DistCp and remote actions.

Integrations

Apache Hadoop

Core integration with HDFS for data storage and YARN for resource management.

Apache Hive

Native Hive action type for executing HiveQL queries as workflow steps.

Apache Pig

Native Pig action type for data transformation scripts in workflow pipelines.

Apache Sqoop

Native Sqoop action type for importing and exporting data between Hadoop and relational databases.

Apache Spark

Spark action type for running Spark jobs within Oozie workflows.

Apache MapReduce

Native MapReduce action type for running jobs on Hadoop's foundational computation framework.

Semantic Vocabularies

Apache Oozie Context

9 classes · 27 properties

JSON-LD

API Governance Rules

Apache Oozie API Rules

27 rules · 11 errors · 13 warnings · 3 info

SPECTRAL

Resources

Apache Oozie GitHub Repository (GitHubRepository)
Apache Software Foundation GitHub (GitHubOrganization)
Apache Oozie Documentation (Documentation)
Oozie Quick Start Guide (GettingStarted)
Oozie Examples (Tutorials)
Oozie Release Log (ReleaseNotes)
Apache License 2.0 (TermsOfService)
Mailing Lists (Support)
Oozie on Stack Overflow (StackOverflow)
Apache Oozie Spectral Rules (SpectralRules)
Apache Oozie Workflow Orchestration (NaftikoCapability)
Apache Oozie Vocabulary (Vocabulary)
Apache Oozie JSON-LD Context (JSONLD)