Apache Airflow
Apache Airflow is an open-source platform to programmatically author, schedule, and monitor workflows, maintained by the Apache Software Foundation. It allows you to define workflows as Directed Acyclic Graphs (DAGs) in Python code, making them maintainable, versionable, testable, and collaborative. Airflow provides a stable REST API for managing DAGs, DAG runs, tasks, connections, variables, pools, and users, along with a web-based UI for monitoring and managing pipeline execution.
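A minimal sketch of defining a workflow as Python code: a hypothetical daily DAG with a single task. The DAG name and schedule are illustrative, and the Airflow imports are guarded so the snippet also loads where Airflow is not installed.

```python
from datetime import datetime

def say_hello():
    # Plain Python callable; Airflow runs this as the task body.
    return "hello"

try:
    # Airflow 2.x imports, guarded so this file also loads without Airflow.
    from airflow import DAG
    from airflow.operators.python import PythonOperator

    with DAG(
        dag_id="hello_daily",            # hypothetical DAG name
        start_date=datetime(2024, 1, 1),
        schedule="@daily",               # Airflow 2.4+ parameter name
        catchup=False,
    ) as dag:
        PythonOperator(task_id="say_hello", python_callable=say_hello)
except ImportError:
    pass
```

Because the DAG is ordinary Python, the file can live in version control and the callable can be unit-tested outside Airflow.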
APIs
Apache Airflow REST API
The stable public REST API for interacting with Apache Airflow programmatically, allowing management of DAGs, DAG runs, task instances, connections, variables, pools, roles, and users.
Apache Airflow Experimental API (Deprecated)
The experimental API that preceded the stable REST API. This is deprecated and should not be used for new implementations.
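The stable REST API is plain HTTP with JSON bodies, so it can be called from the standard library alone. A sketch of triggering a DAG run, assuming basic-auth credentials and a hypothetical base URL; the endpoint path follows the stable v1 API.

```python
import base64
import json
import urllib.request

def dag_runs_url(base_url: str, dag_id: str) -> str:
    # Stable v1 REST API path for listing or triggering DAG runs.
    return f"{base_url.rstrip('/')}/api/v1/dags/{dag_id}/dagRuns"

def trigger_dag_run(base_url, dag_id, user, password, conf=None):
    # Builds an authenticated POST; sending it requires a running
    # Airflow webserver with basic auth enabled.
    body = json.dumps({"conf": conf or {}}).encode()
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        dag_runs_url(base_url, dag_id),
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
        method="POST",
    )
    return urllib.request.urlopen(req)  # HTTP response with the new run
```

A GET against the same URL lists existing runs for the DAG.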
Capabilities
Apache Airflow Workflow Orchestration
Unified capability for managing and monitoring Apache Airflow DAGs, runs, tasks, connections, and variables. Used by data engineers and platform operators to orchestrate data pipelines.
Features
Define workflows as Python code (Directed Acyclic Graphs) for version control, testing, and collaboration.
Full-featured REST API for programmatic management of DAGs, runs, tasks, connections, variables, pools, and users.
Generate DAGs dynamically using Python, supporting complex conditional and parametric pipelines.
Rich ecosystem of provider packages for integrating with AWS, GCP, Azure, databases, and hundreds of external services.
Browser-based dashboard for monitoring DAG runs, task statuses, logs, and Gantt charts.
Control concurrency and resource allocation across tasks using configurable pools.
Define dependencies between DAGs using sensors, dataset-driven scheduling, and external task sensors.
Supports Sequential, Local, Celery, Kubernetes, and Dask executors for flexible deployment.
Define and track Service Level Agreements (SLAs) on task and DAG completion times.
Centrally manage environment-specific configuration via Airflow variables and connections.
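Dynamic DAG generation from the features above can be sketched as a loop over a config list, producing one DAG per item. The table names and DAG naming scheme are hypothetical, and the Airflow imports are guarded so the file also loads without Airflow.

```python
TABLES = ["orders", "customers", "invoices"]  # hypothetical source tables

def dag_id_for(table: str) -> str:
    # One generated DAG per table, with a predictable name.
    return f"sync_{table}"

try:
    from datetime import datetime
    from airflow import DAG
    from airflow.operators.empty import EmptyOperator

    for table in TABLES:
        with DAG(
            dag_id=dag_id_for(table),
            start_date=datetime(2024, 1, 1),
            schedule="@hourly",
            catchup=False,
        ) as dag:
            EmptyOperator(task_id="sync")
        # Placing each DAG in module globals() lets the scheduler discover it.
        globals()[dag.dag_id] = dag
except ImportError:
    pass
```

The same pattern supports conditional and parametric pipelines: the loop body can vary operators, schedules, or dependencies per item.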
Use Cases
Schedule and manage extract, transform, and load (ETL) pipelines with dependency management and retry logic.
Orchestrate ML training, validation, and deployment pipelines with data dependency tracking.
Coordinate data ingestion from multiple sources into data warehouses like BigQuery, Redshift, and Snowflake.
Schedule periodic batch reporting jobs with email notification on completion or failure.
Move data between AWS, GCP, and Azure using provider integrations with dependency control.
Trigger and monitor software deployment pipelines with upstream/downstream task dependencies.
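Retry logic and failure notifications from the use cases above are typically configured once per DAG via `default_args`, which every task inherits. The specific values and email address here are hypothetical.

```python
from datetime import timedelta

# Hypothetical per-DAG defaults: three retries with exponential backoff,
# plus an email on task failure (e.g., for periodic batch reporting jobs).
default_args = {
    "owner": "data-eng",
    "retries": 3,
    "retry_delay": timedelta(minutes=5),
    "retry_exponential_backoff": True,
    "email": ["oncall@example.com"],
    "email_on_failure": True,
}
```

Passing this dict as `default_args=default_args` when constructing a DAG applies the settings to all of its tasks; individual tasks can still override any key.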
Integrations
Native Spark submit and Livy operator integration for distributed data processing.
Comprehensive GCP provider for BigQuery, Cloud Storage, Dataflow, Dataproc, and more.
AWS provider for S3, Redshift, EMR, Glue, Lambda, and other services.
Azure provider for Blob Storage, Data Factory, HDInsight, and Databricks.
dbt operator for running dbt transformations within Airflow pipelines.
KubernetesPodOperator for running tasks in isolated Kubernetes pods.
DockerOperator for running tasks in Docker containers with isolated environments.
Kafka producers and consumers as Airflow tasks via the Kafka provider.
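A sketch of the KubernetesPodOperator integration: running a task in an isolated pod. The namespace and image are hypothetical, the operator comes from the `apache-airflow-providers-cncf-kubernetes` package (its module path varies slightly across provider versions), and the import is guarded so the snippet degrades gracefully where the provider is not installed.

```python
try:
    # Provider package: apache-airflow-providers-cncf-kubernetes
    # (module path differs between provider versions).
    from airflow.providers.cncf.kubernetes.operators.kubernetes_pod import (
        KubernetesPodOperator,
    )

    run_job = KubernetesPodOperator(
        task_id="run_job",
        name="run-job",
        namespace="data-jobs",          # hypothetical namespace
        image="python:3.11-slim",       # hypothetical image
        cmds=["python", "-c", "print('ok')"],
        get_logs=True,                  # stream pod logs into task logs
    )
except ImportError:
    run_job = None  # provider not installed in this environment
```

Each run launches a fresh pod, so task dependencies are isolated from the Airflow workers; the DockerOperator offers the same isolation for non-Kubernetes deployments.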