
Amazon Data Pipeline

AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, you can regularly access your data where it is stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR. It supports data-driven workflows with retry, failure handling, and scheduling capabilities.

1 API · 7 Features
Data Processing · ETL · Workflows · Data Pipeline · Automation

APIs

AWS Data Pipeline API

The AWS Data Pipeline API provides a web service for processing and moving data between different AWS compute and storage services as well as on-premises data sources at specified intervals.
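
The Arazzo workflows listed under Sources (for example "Provision and Activate" and "Validate Then Put Definition") chain a few API actions together. A minimal sketch of that sequence follows, assuming a boto3-style client (in practice `boto3.client("datapipeline")`, whose `create_pipeline`, `validate_pipeline_definition`, `put_pipeline_definition`, `activate_pipeline`, and `describe_pipelines` methods correspond to the API actions). The function name and error handling are illustrative, not part of any AWS library, and any object exposing those methods, such as a test stub, can be passed in.

```python
def provision_and_activate(client, name: str, unique_id: str, objects: list[dict]) -> str:
    """Create a pipeline, validate and store its definition, activate it, return its state.

    Hypothetical helper; `client` is assumed to behave like boto3's "datapipeline" client.
    """
    # uniqueId makes CreatePipeline idempotent if the call is retried.
    pipeline_id = client.create_pipeline(name=name, uniqueId=unique_id)["pipelineId"]

    # Validate first so an invalid definition is never written to the pipeline.
    check = client.validate_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
    if check.get("errored"):
        raise ValueError(f"definition rejected: {check.get('validationErrors')}")

    put = client.put_pipeline_definition(pipelineId=pipeline_id, pipelineObjects=objects)
    if put.get("errored"):
        raise ValueError(f"put failed: {put.get('validationErrors')}")

    client.activate_pipeline(pipelineId=pipeline_id)

    # The current state is reported as the "@pipelineState" field of the description.
    description = client.describe_pipelines(pipelineIds=[pipeline_id])["pipelineDescriptionList"][0]
    fields = {f["key"]: f.get("stringValue") for f in description["fields"]}
    return fields.get("@pipelineState", "UNKNOWN")
```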

Features

Data-Driven Workflows

Define complex data processing workflows with activities, data nodes, schedules, and preconditions using a declarative pipeline definition.
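
Through the API, a definition is sent as a list of objects, each with an `id`, a `name`, and a list of `fields` holding either a `stringValue` or a `refValue` (the id of another object). The sketch below assumes a hypothetical compact input format, a dict of objects in which references are written as `{"ref": "otherId"}`, and converts it into that shape. `Schedule`, `S3KeyExists`, `S3DataNode`, `Ec2Resource`, and `ShellCommandActivity` are Data Pipeline object types; the bucket, paths, and command are placeholders.

```python
def to_pipeline_objects(spec: dict[str, dict]) -> list[dict]:
    """Convert {object_id: {field: value}} into the API's pipelineObjects list.

    A value written as {"ref": "otherId"} becomes a refValue; anything else becomes a
    stringValue. A list value emits one field per item (repeated keys are allowed).
    The optional "name" entry sets the object's name and defaults to its id.
    """
    objects = []
    for object_id, attrs in spec.items():
        fields = []
        for key, value in attrs.items():
            if key == "name":
                continue
            for item in value if isinstance(value, list) else [value]:
                if isinstance(item, dict) and "ref" in item:
                    fields.append({"key": key, "refValue": item["ref"]})
                else:
                    fields.append({"key": key, "stringValue": str(item)})
        objects.append({"id": object_id, "name": attrs.get("name", object_id), "fields": fields})
    return objects


# Hypothetical daily pipeline: a shell command that waits for an input marker in S3.
SPEC = {
    "Default": {
        "scheduleType": "cron",
        "schedule": {"ref": "DailySchedule"},
        "role": "DataPipelineDefaultRole",
        "resourceRole": "DataPipelineDefaultResourceRole",
    },
    "DailySchedule": {"type": "Schedule", "period": "1 day", "startAt": "FIRST_ACTIVATION_DATE_TIME"},
    # Precondition: the activity waits until this key exists.
    "InputReady": {"type": "S3KeyExists", "s3Key": "s3://example-bucket/input/_SUCCESS"},
    "InputData": {"type": "S3DataNode", "directoryPath": "s3://example-bucket/input/",
                  "precondition": {"ref": "InputReady"}},
    "Worker": {"type": "Ec2Resource", "terminateAfter": "1 hour"},
    "Transform": {"type": "ShellCommandActivity", "command": "echo processing",
                  "input": {"ref": "InputData"}, "runsOn": {"ref": "Worker"}},
}
```

The resulting list is what a call such as `put_pipeline_definition(pipelineId=..., pipelineObjects=to_pipeline_objects(SPEC))` would expect.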

Multi-Service Integration

Move and transform data between Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon Redshift, and Amazon EMR in a single pipeline.

Flexible Scheduling

Schedule pipeline runs at fixed intervals (hourly, daily, weekly), and make runs wait on preconditions such as the availability of input data.
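
Fixed-interval schedules are expressed on a `Schedule` object as a `period` string such as `"1 day"` or `"15 minutes"`. The helpers below are a hypothetical local sketch, not part of any AWS SDK. They parse such a string and list the run start times a fixed schedule would produce within a window. They assume the `N unit` period format and the 15-minute minimum interval described in the AWS Data Pipeline documentation, and they reject month-based periods because those have no fixed length.

```python
from datetime import datetime, timedelta

# Singular unit -> timedelta keyword. Months are deliberately unsupported.
_UNITS = {"minute": "minutes", "hour": "hours", "day": "days", "week": "weeks"}
# Assumed service minimum for a schedule period.
_MIN_PERIOD = timedelta(minutes=15)


def parse_period(period: str) -> timedelta:
    """Parse a period like "1 day" or "15 minutes" into a timedelta."""
    count_text, unit = period.strip().split()
    unit = unit.lower().rstrip("s")  # "Hours" -> "hour"
    if unit not in _UNITS:
        raise ValueError(f"unsupported period unit: {unit!r}")
    return timedelta(**{_UNITS[unit]: int(count_text)})


def run_starts(start: datetime, period: str, end: datetime) -> list[datetime]:
    """Return every scheduled start time in [start, end) for a fixed period."""
    step = parse_period(period)
    if step < _MIN_PERIOD:
        raise ValueError("period is shorter than the 15-minute minimum")
    times = []
    current = start
    while current < end:
        times.append(current)
        current += step
    return times
```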

Automated Retry and Failure Handling

Configure automatic retries for failed activities with configurable retry intervals, timeout settings, and failure notifications.
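
Retry and failure behavior is configured through fields on the activity object. The sketch below assumes the documented activity fields `maximumRetries`, `retryDelay`, `attemptTimeout`, and `onFail`, plus an `SnsAlarm` object with `topicArn`, `subject`, `message`, and `role`. It works on objects in the API's `pipelineObjects` shape; the helper names, the topic ARN, and the message text are placeholders.

```python
def add_retry_policy(activity: dict, max_retries: int, retry_delay: str,
                     attempt_timeout: str, alarm_id: str) -> dict:
    """Return a copy of an API-style activity object with retry/failure fields set.

    Hypothetical helper; retry_delay and attempt_timeout use period strings
    such as "10 minutes" or "2 hours".
    """
    if max_retries < 0:
        raise ValueError("max_retries must be non-negative")
    managed = {"maximumRetries", "retryDelay", "attemptTimeout", "onFail"}
    # Drop existing values for these keys so each is set exactly once.
    fields = [f for f in activity["fields"] if f["key"] not in managed]
    fields += [
        {"key": "maximumRetries", "stringValue": str(max_retries)},
        {"key": "retryDelay", "stringValue": retry_delay},
        {"key": "attemptTimeout", "stringValue": attempt_timeout},
        {"key": "onFail", "refValue": alarm_id},  # notify when retries are exhausted
    ]
    return {**activity, "fields": fields}


def sns_alarm(alarm_id: str, topic_arn: str, role: str) -> dict:
    """Build an SnsAlarm object that publishes a notification when triggered."""
    return {
        "id": alarm_id,
        "name": alarm_id,
        "fields": [
            {"key": "type", "stringValue": "SnsAlarm"},
            {"key": "topicArn", "stringValue": topic_arn},
            {"key": "subject", "stringValue": "Pipeline activity failed"},
            {"key": "message", "stringValue": "An activity failed after exhausting its retries."},
            {"key": "role", "stringValue": role},
        ],
    }
```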

On-Premises Data Support

Process data from on-premises databases and file systems using the Data Pipeline Task Runner agent installed locally.
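
Task Runner works by polling the service for tasks assigned to a worker group and reporting each result back. The sketch below shows one poll-and-report cycle of a hypothetical custom worker. It assumes a boto3-style client whose `poll_for_task` and `set_task_status` methods map to the `PollForTask` and `SetTaskStatus` API actions. It is not the AWS-supplied Task Runner, and `handler` stands in for whatever local work the task requires, such as querying an on-premises database.

```python
from typing import Callable, Optional


def process_one_task(client, worker_group: str,
                     handler: Callable[[dict], None]) -> Optional[str]:
    """Poll once for work in `worker_group`, run it, and report the outcome.

    Returns the status reported to the service, or None if no task was available.
    """
    response = client.poll_for_task(workerGroup=worker_group)
    task = response.get("taskObject")
    if not task:
        return None  # nothing is assigned to this worker group right now

    task_id = task["taskId"]
    try:
        handler(task)  # hypothetical local work for this task
    except Exception as exc:  # report any handler failure back to the service
        client.set_task_status(
            taskId=task_id,
            taskStatus="FAILED",
            errorId="LocalHandlerError",  # placeholder error code
            errorMessage=str(exc),
        )
        return "FAILED"

    client.set_task_status(taskId=task_id, taskStatus="FINISHED")
    return "FINISHED"
```

A long-running worker would call this in a loop; the real Task Runner also reports progress while a task runs, which this sketch omits.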

EMR Integration

Launch and manage Amazon EMR clusters as pipeline resources to run Hive, Pig, and MapReduce jobs as part of data workflows.
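
In a definition, the cluster is an `EmrCluster` resource, and the job is an activity whose `runsOn` field references that cluster. A minimal sketch follows, assuming an `EmrActivity` with a single `step` written as a JAR location followed by comma-separated arguments. Both helper functions are hypothetical, and the instance types, core node count, and timeout are placeholders to adjust for a real workload.

```python
def emr_step(jar: str, *args: str) -> str:
    """Join a JAR location and its arguments into the comma-separated `step` format."""
    for part in (jar, *args):
        if "," in part:
            raise ValueError(f"commas inside arguments are not handled here: {part!r}")
    return ",".join((jar, *args))


def emr_objects(cluster_id: str, activity_id: str, release_label: str, step: str) -> list[dict]:
    """Return an EmrCluster resource and an EmrActivity that runs one step on it."""
    def s(key: str, value: str) -> dict:
        return {"key": key, "stringValue": value}

    cluster = {"id": cluster_id, "name": cluster_id, "fields": [
        s("type", "EmrCluster"),
        s("releaseLabel", release_label),
        s("masterInstanceType", "m5.xlarge"),  # placeholder
        s("coreInstanceType", "m5.xlarge"),    # placeholder
        s("coreInstanceCount", "2"),
        s("terminateAfter", "2 hours"),  # tear the cluster down even if the step hangs
    ]}
    activity = {"id": activity_id, "name": activity_id, "fields": [
        s("type", "EmrActivity"),
        {"key": "runsOn", "refValue": cluster_id},
        s("step", step),
    ]}
    return [cluster, activity]
```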

Pipeline Versioning

Manage active and latest pipeline definition versions, enabling updates to running pipelines without disrupting current execution.
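
`GetPipelineDefinition` accepts a `version` of `latest` (the last definition saved) or `active` (the last definition activated), so it can be useful to see what changed between the two before re-activating. The following hypothetical helper compares two `pipelineObjects` lists by object id and ignores field order. It is a local sketch, not an AWS API.

```python
def _normalize(obj: dict) -> tuple:
    """Reduce an object to a comparable form: its name plus its sorted fields."""
    fields = sorted(
        (f["key"], "ref" if "refValue" in f else "string",
         f.get("refValue", f.get("stringValue", "")))
        for f in obj["fields"]
    )
    return obj.get("name"), fields


def diff_definitions(active: list[dict], latest: list[dict]) -> dict[str, list[str]]:
    """Report object ids added, removed, or changed between two definitions."""
    old = {o["id"]: _normalize(o) for o in active}
    new = {o["id"]: _normalize(o) for o in latest}
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(i for i in old.keys() & new.keys() if old[i] != new[i]),
    }
```

With boto3, the two inputs would come from `client.get_pipeline_definition(pipelineId=..., version="active")["pipelineObjects"]` and the same call with `version="latest"`.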

Use Cases

Daily ETL Workflows

Schedule daily extraction, transformation, and loading of data from relational databases into S3 or Redshift for analytics processing.
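
A daily RDS-to-S3 extract can be written as a pipeline definition file and uploaded with `aws datapipeline put-pipeline-definition --pipeline-definition file://daily-etl.json` (the file name is hypothetical). The sketch below generates such a file. It assumes the documented object types `RdsDatabase`, `SqlDataNode`, `S3DataNode`, `CopyActivity`, and `Ec2Resource`, and an output path partitioned by the `@scheduledStartTime` expression. The instance id, user, table, bucket, and the `*myRdsPassword` parameter name are placeholders. The table name is inserted into the query without escaping, so pass only trusted values.

```python
import json


def rds_to_s3_definition(rds_instance_id: str, username: str, table: str, output_dir: str) -> str:
    """Return a pipeline definition file (JSON text) for a daily RDS-to-S3 copy.

    Definition files write references as {"ref": "<object id>"}; the password comes
    from a secure parameter instead of being embedded in the file.
    """
    objects = [
        {"id": "Default", "name": "Default", "scheduleType": "cron",
         "schedule": {"ref": "Daily"}, "failureAndRerunMode": "CASCADE",
         "role": "DataPipelineDefaultRole",
         "resourceRole": "DataPipelineDefaultResourceRole"},
        {"id": "Daily", "name": "Daily", "type": "Schedule",
         "period": "1 day", "startAt": "FIRST_ACTIVATION_DATE_TIME"},
        {"id": "SourceDb", "name": "SourceDb", "type": "RdsDatabase",
         "rdsInstanceId": rds_instance_id, "username": username,
         "*password": "#{*myRdsPassword}"},
        # Unescaped table name: only pass trusted values.
        {"id": "SourceTable", "name": "SourceTable", "type": "SqlDataNode",
         "database": {"ref": "SourceDb"}, "table": table,
         "selectQuery": f"select * from {table}"},
        # One output directory per scheduled run, named by its start date.
        {"id": "DailyOutput", "name": "DailyOutput", "type": "S3DataNode",
         "directoryPath": output_dir.rstrip("/")
         + "/#{format(@scheduledStartTime, 'YYYY-MM-dd')}"},
        {"id": "CopyWorker", "name": "CopyWorker", "type": "Ec2Resource",
         "terminateAfter": "2 hours"},
        {"id": "CopyToS3", "name": "CopyToS3", "type": "CopyActivity",
         "input": {"ref": "SourceTable"}, "output": {"ref": "DailyOutput"},
         "runsOn": {"ref": "CopyWorker"}},
    ]
    parameters = [
        {"id": "*myRdsPassword", "type": "String", "description": "RDS password"},
    ]
    return json.dumps({"objects": objects, "parameters": parameters}, indent=2)
```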

Log Processing Pipelines

Process application and server log files from S3 using EMR activities to generate aggregated reports and analytics datasets.

Database Migration

Migrate data between on-premises databases and AWS managed database services using scheduled pipeline activities.

Data Lake Ingestion

Automate the ingestion and transformation of raw data into structured formats in S3 data lakes for downstream analytics.

Cross-Region Data Replication

Replicate DynamoDB tables or S3 data across AWS regions using scheduled pipeline copy activities for disaster recovery.

Semantic Vocabularies

Amazon Data Pipeline Context

0 classes · 30 properties

JSON-LD

API Governance Rules

Amazon Data Pipeline API Rules

22 rules · 13 errors · 5 warnings · 4 info

SPECTRAL

Resources

Postman Workspace
Arazzo Workflows
Portal
Developer Portal
Documentation
Terms of Service
Privacy Policy
Support
GitHub Organization
Console
Sign Up
Login
Status Page
Contact
Spectral Rules
Vocabulary

Sources

aid: amazon-data-pipeline
name: Amazon Data Pipeline
description: >-
  AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and
  storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, you can
  regularly access your data where it is stored, transform and process it at scale, and efficiently transfer the results
  to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR. It supports data-driven workflows with
  retry, failure handling, and scheduling capabilities.
type: Index
image: https://kinlane-productions.s3.amazonaws.com/apis-json/apis-json-logo.jpg
tags:
  - AWS
  - Data Processing
  - ETL
  - Workflows
  - Data Pipeline
  - Automation
url: https://raw.githubusercontent.com/api-evangelist/amazon-data-pipeline/refs/heads/main/apis.yml
created: '2024-01-15'
modified: '2026-05-19'
specificationVersion: '0.19'
apis:
  - aid: amazon-data-pipeline:aws-data-pipeline-api
    name: AWS Data Pipeline API
    description: >-
      The AWS Data Pipeline API provides a web service for processing and moving data between different AWS compute and
      storage services as well as on-premises data sources at specified intervals. The API allows you to create pipeline
      definitions, schedule data transformations, configure retry and failure handling logic, and monitor pipeline
      execution across Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR.
    humanURL: https://aws.amazon.com/datapipeline/
    baseURL: https://datapipeline.amazonaws.com
    tags:
      - AWS
      - Data Processing
      - ETL
      - Workflows
    properties:
      - type: Documentation
        url: https://docs.aws.amazon.com/datapipeline/
      - type: OpenAPI
        url: openapi/amazon-data-pipeline-openapi.yml
      - type: Pricing
        url: https://aws.amazon.com/datapipeline/pricing/
      - type: GettingStarted
        url: https://aws.amazon.com/datapipeline/getting-started/
      - type: FAQ
        url: https://aws.amazon.com/datapipeline/faqs/
      - type: APIReference
        url: https://docs.aws.amazon.com/datapipeline/latest/APIReference/
      - type: JSONSchema
        url: json-schema/pipeline-object-schema.json
      - type: JSONSchema
        url: json-schema/pipeline-description-schema.json
      - type: JSONLD
        url: json-ld/amazon-data-pipeline-context.jsonld
common:
  - type: PostmanWorkspace
    url: https://www.postman.com/kinlaneapi/amazon-data-pipeline/overview
  - type: ArazzoWorkflows
    url: arazzo/
    workflows:
      - url: arazzo/amazon-data-pipeline-clone-pipeline-workflow.yml
        name: Amazon Data Pipeline Clone Pipeline
        summary: Copy an existing pipeline's definition into a brand-new pipeline and activate it.
      - url: arazzo/amazon-data-pipeline-deactivate-and-delete-workflow.yml
        name: Amazon Data Pipeline Deactivate and Delete
        summary: Stop a running pipeline and then permanently remove it and its run history.
      - url: arazzo/amazon-data-pipeline-export-definition-workflow.yml
        name: Amazon Data Pipeline Export Definition
        summary: Confirm a pipeline exists and then export its active definition objects.
      - url: arazzo/amazon-data-pipeline-inspect-running-tasks-workflow.yml
        name: Amazon Data Pipeline Inspect Running Tasks
        summary: Find running task instances in a pipeline and pull their full object definitions.
      - url: arazzo/amazon-data-pipeline-list-and-describe-workflow.yml
        name: Amazon Data Pipeline List and Describe
        summary: List all accessible pipelines and pull full metadata for the first page of them.
      - url: arazzo/amazon-data-pipeline-provision-and-activate-workflow.yml
        name: Amazon Data Pipeline Provision and Activate
        summary: Create an empty pipeline, populate its definition, activate it, and confirm its state.
      - url: arazzo/amazon-data-pipeline-redeploy-definition-workflow.yml
        name: Amazon Data Pipeline Redeploy Definition
        summary: Deactivate a pipeline, write a new definition, then reactivate it with the new objects.
      - url: arazzo/amazon-data-pipeline-tag-and-confirm-workflow.yml
        name: Amazon Data Pipeline Tag and Confirm
        summary: Add governance tags to a pipeline and confirm they are attached.
      - url: arazzo/amazon-data-pipeline-validate-then-put-definition-workflow.yml
        name: Amazon Data Pipeline Validate Then Put Definition
        summary: Validate a candidate pipeline definition and only commit it when it is error free.
  - type: Portal
    url: https://aws.amazon.com/datapipeline/
  - type: DeveloperPortal
    url: https://aws.amazon.com/datapipeline/
  - type: Documentation
    url: https://docs.aws.amazon.com/datapipeline/
  - type: TermsOfService
    url: https://aws.amazon.com/service-terms/
  - type: PrivacyPolicy
    url: https://aws.amazon.com/privacy/
  - type: Support
    url: https://aws.amazon.com/premiumsupport/
  - type: GitHubOrganization
    url: https://github.com/aws
  - type: Console
    url: https://console.aws.amazon.com/datapipeline/
  - type: SignUp
    url: https://portal.aws.amazon.com/billing/signup
  - type: Login
    url: https://signin.aws.amazon.com/
  - type: StatusPage
    url: https://health.aws.amazon.com/health/status
  - type: Contact
    url: https://aws.amazon.com/contact-us/
  - type: SpectralRules
    url: rules/amazon-data-pipeline-spectral-rules.yml
  - type: Vocabulary
    url: vocabulary/amazon-data-pipeline-vocabulary.yaml
  - type: Features
    data:
      - name: Data-Driven Workflows
        description: >-
          Define complex data processing workflows with activities, data nodes, schedules, and preconditions using a
          declarative pipeline definition.
      - name: Multi-Service Integration
        description: >-
          Move and transform data between Amazon S3, Amazon RDS, Amazon DynamoDB, Amazon Redshift, and Amazon EMR in a
          single pipeline.
      - name: Flexible Scheduling
        description: >-
          Schedule pipeline runs at fixed intervals (hourly, daily, weekly) or trigger them based on data availability
          preconditions.
      - name: Automated Retry and Failure Handling
        description: >-
          Configure automatic retries for failed activities with configurable retry intervals, timeout settings, and
          failure notifications.
      - name: On-Premises Data Support
        description: >-
          Process data from on-premises databases and file systems using the Data Pipeline Task Runner agent installed
          locally.
      - name: EMR Integration
        description: >-
          Launch and manage Amazon EMR clusters as pipeline resources to run Hive, Pig, and MapReduce jobs as part of
          data workflows.
      - name: Pipeline Versioning
        description: >-
          Manage active and latest pipeline definition versions, enabling updates to running pipelines without
          disrupting current execution.
  - type: UseCases
    data:
      - name: Daily ETL Workflows
        description: >-
          Schedule daily extraction, transformation, and loading of data from relational databases into S3 or Redshift
          for analytics processing.
      - name: Log Processing Pipelines
        description: >-
          Process application and server log files from S3 using EMR activities to generate aggregated reports and
          analytics datasets.
      - name: Database Migration
        description: >-
          Migrate data between on-premises databases and AWS managed database services using scheduled pipeline
          activities.
      - name: Data Lake Ingestion
        description: >-
          Automate the ingestion and transformation of raw data into structured formats in S3 data lakes for downstream
          analytics.
      - name: Cross-Region Data Replication
        description: >-
          Replicate DynamoDB tables or S3 data across AWS regions using scheduled pipeline copy activities for disaster
          recovery.
  - type: Integrations
    data:
      - name: Amazon S3
        description: >-
          Primary data node type for reading input data and writing output data in pipeline ETL activities using
          S3DataNode.
      - name: Amazon EMR
        description: >-
          Managed Hadoop/Spark cluster resource for running large-scale data processing activities including Hive, Pig,
          and MapReduce jobs.
      - name: Amazon RDS
        description: >-
          Relational database data node for SQL-based data extraction and loading between RDS instances and S3 or
          Redshift.
      - name: Amazon DynamoDB
        description: >-
          NoSQL data node for importing and exporting DynamoDB table data in pipeline activities for batch processing
          workflows.
      - name: Amazon Redshift
        description: >-
          Data warehouse target for loading processed pipeline output data for business intelligence and analytics
          queries.
      - name: AWS Glue
        description: >-
          Modern alternative managed ETL service that can complement or replace Data Pipeline for serverless data
          transformation workflows.
      - name: Amazon CloudWatch
        description: Monitor pipeline execution status, set up alarms for pipeline failures, and track activity completion metrics.
  - type: Integrations
    url: https://aws.amazon.com/marketplace
maintainers:
  - FN: Kin Lane
    email: [email protected]
    url: https://apievangelist.com