Apache Airflow

Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows, developed by the Apache Software Foundation. It allows you to define workflows as Directed Acyclic Graphs (DAGs) in Python code, making them maintainable, versionable, testable, and collaborative. Airflow provides a stable REST API for managing DAGs, DAG runs, tasks, connections, variables, pools, and users, along with a web-based UI for monitoring and managing pipeline execution.

2 APIs · 1 Capability · 10 Features
Tags: Apache, DAG, Data Pipeline, ETL, Open Source, Orchestration, Python, Scheduling, Workflow

APIs

Apache Airflow REST API

The stable public REST API for interacting with Apache Airflow programmatically, allowing management of DAGs, DAG runs, task instances, connections, variables, pools, roles, use...
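
As a rough illustration of programmatic DAG run management, the sketch below builds (but does not send) a request that would trigger a DAG run. It assumes an Airflow 2.x deployment that serves the stable API under `/api/v1` with the basic-auth backend enabled; the base URL, the DAG id `example_etl`, the credentials, and the helper name `build_trigger_request` are all hypothetical.

```python
import base64
import json
import urllib.request


def build_trigger_request(base_url, dag_id, username, password, conf=None):
    """Build (but do not send) a POST request that would trigger a DAG run.

    Assumes the Airflow 2.x stable API path /api/v1 and basic authentication.
    """
    url = f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"
    # The request body carries optional run-level configuration under "conf".
    body = json.dumps({"conf": conf or {}}).encode("utf-8")
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical host, DAG id, and credentials, for illustration only.
request = build_trigger_request(
    "http://localhost:8080",
    "example_etl",
    "admin",
    "admin",
    conf={"run_date": "2024-01-01"},
)
print(request.get_method(), request.full_url)

# Sending the request requires a reachable Airflow webserver:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

Keeping request construction separate from sending makes the call easy to inspect and test without a live Airflow instance. Deployments that use a different auth backend or API version would need a different header or path.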

Apache Airflow Experimental API (Deprecated)

The experimental API that preceded the stable REST API. This is deprecated and should not be used for new implementations.

Capabilities

Apache Airflow Workflow Orchestration

Unified capability for managing and monitoring Apache Airflow DAGs, runs, tasks, connections, and variables. Used by data engineers and platform operators to orchestrate data pi...

Features

DAG-as-Code

Define workflows as Python code (Directed Acyclic Graphs) for version control, testing, and collaboration.
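
To make the "directed acyclic graph" idea concrete, here is a minimal standard-library sketch of how declared task dependencies determine a valid run order. It uses Python's `graphlib` rather than Airflow's own DAG classes, so it illustrates the concept only and is not Airflow's API; the task names and the helper `run_order` are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical tasks: each key maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}


def run_order(deps):
    """Return one valid execution order in which every task follows its dependencies."""
    return list(TopologicalSorter(deps).static_order())


order = run_order(dependencies)
print(order)
```

Because the graph lives in ordinary Python data, it can be reviewed, versioned, and unit-tested like any other code. `graphlib` raises `CycleError` if the dependencies contain a cycle, which is the property the "acyclic" in DAG rules out.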

Stable REST API

Full-featured REST API for programmatic management of DAGs, runs, tasks, connections, variables, pools, and users.

Dynamic Pipeline Generation

Generate DAGs dynamically using Python, supporting complex conditional and parametric pipelines.

Extensible Providers

Rich ecosystem of provider packages for integrating with AWS, GCP, Azure, databases, and hundreds of external services.

Rich Web UI

Browser-based dashboard for monitoring DAG runs, task statuses, logs, and Gantt charts.

Resource Pools

Control concurrency and resource allocation across tasks using configurable pools.

Cross-DAG Dependencies

Define dependencies between DAGs using external task sensors and dataset-driven scheduling.

Pluggable Executors

Supports Sequential, Local, Celery, Kubernetes, and Dask executors for flexible deployment.

SLA Monitoring

Define and track Service Level Agreements on task and DAG completion times.

Variable and Connection Management

Centrally manage environment-specific configuration via Airflow variables and connections.
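
One common way to supply environment-specific configuration is through environment variables: Airflow reads connections from `AIRFLOW_CONN_<CONN_ID>` (in URI form) and variables from `AIRFLOW_VAR_<KEY>`. The sketch below builds such a URI with the standard library only. The connection id `warehouse_db`, the host, the credentials, and the helper name `connection_uri` are hypothetical.

```python
import os
from urllib.parse import quote


def connection_uri(conn_type, login, password, host, port, schema):
    """Build a connection URI, percent-encoding parts that may contain special characters."""
    return (
        f"{conn_type}://{quote(login, safe='')}:{quote(password, safe='')}"
        f"@{host}:{port}/{quote(schema, safe='')}"
    )


# Hypothetical values, for illustration only.
uri = connection_uri(
    "postgres", "etl_user", "p@ss/word", "db.example.com", 5432, "warehouse"
)

# Airflow looks up the connection id "warehouse_db" under this variable name.
os.environ["AIRFLOW_CONN_WAREHOUSE_DB"] = uri
# Airflow looks up the variable "target_env" under this variable name.
os.environ["AIRFLOW_VAR_TARGET_ENV"] = "staging"

print(uri)
```

Percent-encoding the login and password matters because characters such as `@` and `/` would otherwise be misread as URI delimiters. In practice these variables would be set by the deployment environment or a secrets backend rather than in code.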

Use Cases

ETL Pipeline Orchestration

Schedule and manage extract, transform, load pipelines with dependency management and retry logic.

Machine Learning Workflows

Orchestrate ML training, validation, and deployment pipelines with data dependency tracking.

Data Warehouse Loading

Coordinate data ingestion from multiple sources into data warehouses like BigQuery, Redshift, and Snowflake.

Batch Report Generation

Schedule periodic batch reporting jobs with email notification on completion or failure.

Multi-Cloud Data Movement

Move data between AWS, GCP, and Azure using provider integrations with dependency control.

CI/CD Pipeline Orchestration

Trigger and monitor software deployment pipelines with upstream/downstream task dependencies.

Integrations

Apache Spark

Spark provider with spark-submit and Livy operators for distributed data processing.

Google Cloud

Comprehensive GCP provider for BigQuery, Cloud Storage, Dataflow, Dataproc, and more.

Amazon Web Services

AWS provider for S3, Redshift, EMR, Glue, Lambda, and other services.

Microsoft Azure

Azure provider for Blob Storage, Data Factory, HDInsight, and Databricks.

dbt

dbt operator for running dbt transformations within Airflow pipelines.

Kubernetes

KubernetesPodOperator for running tasks in isolated Kubernetes pods.

Docker

DockerOperator for running tasks in Docker containers with isolated environments.

Apache Kafka

Kafka producers and consumers as Airflow tasks via the Kafka provider.

Semantic Vocabularies

Apache Airflow Context

88 classes · 197 properties

JSON-LD

API Governance Rules

Apache Airflow API Rules

19 rules · 8 errors · 9 warnings · 2 info

SPECTRAL

Resources

GitHub Organization
GitHub Repository
Documentation
Getting Started
Tutorials
Python Package (PyPI) · SDK
Docker Image · SDK
Security
Blog
Support
Changelog
Spectral Rules
Vocabulary
Naftiko Capability