Apache DolphinScheduler
Apache DolphinScheduler is a modern, distributed, and extensible data orchestration platform governed by the Apache Software Foundation. It provides a DAG-based visual workflow designer, a multi-master/multi-worker architecture for horizontal scaling, and a comprehensive REST API for programmatic control. It supports dozens of task types (Shell, Spark, Flink, SQL, Python, HTTP, etc.), multi-cloud deployments, multi-tenancy, backfill, and a Python SDK (PyDolphinScheduler).
APIs
Apache DolphinScheduler REST API
The DolphinScheduler REST API enables programmatic management of projects, workflow definitions (DAGs), workflow instances, task types, schedules, resources, data sources, and alerts.
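As a minimal sketch, listing projects through the REST API might look like the following. The base path (`/dolphinscheduler`), default API port (12345), `token` authentication header, and `pageNo`/`pageSize` paging parameters follow the documented API conventions; the host and token value are placeholders you would replace with your own deployment's values.

```python
import requests

# Placeholder endpoint and access token -- substitute your deployment's values.
BASE_URL = "http://localhost:12345/dolphinscheduler"
TOKEN = "your-access-token"

def list_projects(page_no=1, page_size=10):
    """Build a GET request for the paged project-list endpoint.

    DolphinScheduler authenticates API calls with an access token
    passed in the `token` request header.
    """
    return requests.Request(
        "GET",
        f"{BASE_URL}/projects",
        headers={"token": TOKEN},
        params={"pageNo": page_no, "pageSize": page_size},
    ).prepare()

req = list_projects()
print(req.url)
```

The sketch only prepares the request; to execute it against a live server you would send it with `requests.Session().send(req)` and inspect the JSON body of the response.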
Features
Web-based drag-and-drop interface for building directed acyclic graph (DAG) workflows with real-time execution visualization.
Comprehensive REST API for all platform operations including workflow management, scheduling, resource management, and administration.
Decentralized architecture with horizontal scaling support, capable of processing tens of millions of tasks per day.
Built-in task types including Shell, Spark, Flink, SQL, Python, HTTP, DataX, Seatunnel, Jupyter, and custom task plugins.
Supports multiple tenants with isolated resource quotas, permissions, and workflow namespaces.
Version control for workflow definitions and instances, enabling rollback and auditing of workflow changes.
Unified data source management supporting MySQL, PostgreSQL, Hive, Trino, Spark, ClickHouse, and many other databases.
PyDolphinScheduler allows defining and managing workflows programmatically in Python with code-first workflow authoring.
Use Cases
Orchestrate complex ETL/ELT data pipelines with dependencies, retries, and monitoring across distributed systems.
Schedule and manage ML model training, evaluation, and deployment pipelines with task dependencies.
Orchestrate workflows spanning multiple cloud providers and data centers with unified scheduling.
Schedule recurring SQL queries, reports, and analytics jobs against multiple data sources.
Automate deployment workflows, data quality checks, and operational tasks with DolphinScheduler DAGs.
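Automation like the use cases above usually comes down to triggering a workflow run from an external system. A sketch of starting a workflow instance through the executor endpoint is shown below; the project code, workflow definition code, host, and token are placeholders, and the `failureStrategy`/`warningType` values shown are the documented form parameters for this endpoint.

```python
import requests

# All values below are placeholders for illustration.
BASE_URL = "http://localhost:12345/dolphinscheduler"
TOKEN = "your-access-token"
PROJECT_CODE = 1234567890
WORKFLOW_CODE = 9876543210

def start_workflow(project_code, workflow_code):
    """Build a POST request that triggers one run of a workflow definition."""
    return requests.Request(
        "POST",
        f"{BASE_URL}/projects/{project_code}/executors/start-process-instance",
        headers={"token": TOKEN},
        data={
            "processDefinitionCode": workflow_code,
            "failureStrategy": "CONTINUE",  # keep independent branches running on failure
            "warningType": "NONE",          # send no alert on completion
        },
    ).prepare()

req = start_workflow(PROJECT_CODE, WORKFLOW_CODE)
print(req.url)
```

As with the earlier sketch, the request is only prepared; sending it with `requests.Session().send(req)` against a live deployment returns the new workflow instance's details.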
Integrations
Native Spark task type for submitting Spark batch and streaming jobs from DolphinScheduler workflows.
Native Flink task type for submitting Flink stream processing jobs.
Hive data source and task type for SQL-on-Hadoop workloads.
Kubernetes deployment mode and K8s task type for container-native workflow execution.
Official Docker images and Docker Compose configuration for rapid deployment.
Native task types for DataX and SeaTunnel data integration frameworks.
An Airflow provider package allows triggering DolphinScheduler workflows from Airflow DAGs.