Apache DolphinScheduler

Apache DolphinScheduler is a distributed, extensible data orchestration platform governed by the Apache Software Foundation. It provides a DAG-based visual workflow designer, a multi-master/multi-worker architecture for horizontal scaling, and a comprehensive REST API for programmatic control. It supports dozens of task types (Shell, Spark, Flink, SQL, Python, HTTP, etc.), multi-cloud deployments, multi-tenancy, and backfill, and it ships a Python SDK (PyDolphinScheduler).


Tags: Apache, DAG, Data Pipeline, Open Source, Orchestration, Python, Scheduling, Workflow

APIs

Apache DolphinScheduler REST API

The DolphinScheduler REST API enables programmatic management of projects, workflow definitions (DAGs), workflow instances, task types, schedules, resources, data sources, alert...

Features

DAG Visual Workflow Designer

Web-based drag-and-drop interface for building directed acyclic graph (DAG) workflows with real-time execution visualization.
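To make the DAG idea concrete, here is a minimal sketch of how task dependencies determine execution order. It uses only Python's standard-library graphlib module (Python 3.9+), not DolphinScheduler itself; the task names and the execution_order helper are hypothetical and chosen purely for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical workflow: each key is a task, each value is the set of
# tasks that must finish before it can start.
workflow = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"load"},
}


def execution_order(dag):
    """Return one valid run order; raises CycleError if the graph has a cycle."""
    return list(TopologicalSorter(dag).static_order())


print(execution_order(workflow))
```

A graph with a cycle (for example, two tasks that depend on each other) raises CycleError, which is why a workflow must be acyclic before it can be scheduled.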

REST Open API

Comprehensive REST API for all platform operations including workflow management, scheduling, resource management, and administration.
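As a rough illustration of programmatic access, the sketch below builds (but does not send) a request that lists projects. The /projects path, the pageNo and pageSize parameters, the token header, and the localhost base URL are assumptions based on typical deployments; check them against the API documentation of your installed version. The function name build_list_projects_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_list_projects_request(base_url, token, page_no=1, page_size=10):
    """Build a GET request for a paged project listing (assumed endpoint)."""
    # Assumption: projects are listed at <base_url>/projects with paging parameters.
    query = urlencode({"pageNo": page_no, "pageSize": page_size})
    url = f"{base_url.rstrip('/')}/projects?{query}"
    # Assumption: the access token is passed in a header named "token".
    return Request(
        url,
        headers={"token": token, "Accept": "application/json"},
        method="GET",
    )


# Assumed base URL for a local deployment; replace with your own host and token.
req = build_list_projects_request("http://localhost:12345/dolphinscheduler", "my-token")
print(req.get_method(), req.full_url)
```

Passing the request to urllib.request.urlopen would perform the call against a running server; building the request separately keeps the sketch runnable without one.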

Multi-Master/Worker Architecture

Decentralized architecture with horizontal scaling support, capable of processing tens of millions of tasks per day.

Rich Task Types

Built-in task types including Shell, Spark, Flink, SQL, Python, HTTP, DataX, SeaTunnel, and Jupyter, plus support for custom task plugins.

Multi-Tenancy

Supports multiple tenants with isolated resource quotas, permissions, and workflow namespaces.

Workflow Versioning

Version control for workflow definitions and instances, enabling rollback and auditing of workflow changes.

Data Source Management

Unified data source management supporting MySQL, PostgreSQL, Hive, Trino, Spark, ClickHouse, and many other databases.

Python SDK

PyDolphinScheduler allows defining and managing workflows programmatically in Python with code-first workflow authoring.

Use Cases

Data Pipeline Orchestration

Orchestrate complex ETL/ELT data pipelines with dependencies, retries, and monitoring across distributed systems.

Machine Learning Workflows

Schedule and manage ML model training, evaluation, and deployment pipelines with task dependencies.

Multi-Cloud Data Workflows

Orchestrate workflows spanning multiple cloud providers and data centers with unified scheduling.

SQL and Analytics Scheduling

Schedule recurring SQL queries, reports, and analytics jobs against multiple data sources.

DevOps and CI/CD Pipelines

Automate deployment workflows, data quality checks, and operational tasks with DolphinScheduler DAGs.

Integrations

Apache Spark

Native Spark task type for submitting Spark batch and streaming jobs from DolphinScheduler workflows.

Apache Flink

Native Flink task type for submitting Flink stream processing jobs.

Apache Hive

Hive data source and task type for SQL-on-Hadoop workloads.

Kubernetes

Kubernetes deployment mode and K8s task type for container-native workflow execution.

Docker

Official Docker images and Docker Compose configuration for rapid deployment.

DataX / SeaTunnel

Native task types for DataX and SeaTunnel data integration frameworks.

Apache Airflow

An Airflow provider package allows triggering DolphinScheduler workflows from Airflow DAGs.
