
Amazon Glue

Amazon Glue is a serverless data integration service that makes it simple to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning, and application development. It provides both visual and code-based interfaces for ETL operations and includes a Data Catalog for unified metadata management.

1 API · 1 Capability · 8 Features
Analytics · Data Catalog · Data Integration · Data Pipeline · ETL · Serverless

APIs

Amazon Glue API

The Amazon Glue API enables programmatic access to create and manage ETL jobs, crawlers, data catalogs, connections, and development endpoints. You can discover data sources, transform data, and orchestrate data integration workflows across multiple data stores.
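As a rough illustration of what that programmatic access looks like on the wire, the sketch below builds a Signature Version 4 signed request for the Glue `GetDatabases` action using only the Python standard library. It assumes the JSON 1.1 protocol with an `X-Amz-Target: AWSGlue.<Action>` header and the regional endpoint form `glue.<region>.amazonaws.com`; `sign_glue_request` and the credentials are hypothetical placeholders, and nothing is sent over the network. In real code an AWS SDK such as boto3 performs this signing for you.

```python
import datetime
import hashlib
import hmac
import json


def _hmac(key: bytes, msg: str) -> bytes:
    """One HMAC-SHA256 step of the SigV4 key-derivation chain."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def sign_glue_request(action, payload, region, access_key, secret_key, now=None):
    """Return (url, headers, body) for a SigV4-signed Glue JSON 1.1 POST (no I/O)."""
    now = now or datetime.datetime.now(datetime.timezone.utc)
    amz_date = now.strftime("%Y%m%dT%H%M%SZ")
    date_stamp = now.strftime("%Y%m%d")
    host = f"glue.{region}.amazonaws.com"  # assumed regional endpoint form
    body = json.dumps(payload, separators=(",", ":"))
    payload_hash = hashlib.sha256(body.encode("utf-8")).hexdigest()

    # Lower-case header names; every one of them is included in the signature.
    headers = {
        "content-type": "application/x-amz-json-1.1",
        "host": host,
        "x-amz-date": amz_date,
        "x-amz-target": f"AWSGlue.{action}",
    }
    signed_headers = ";".join(sorted(headers))
    canonical_headers = "".join(f"{k}:{headers[k]}\n" for k in sorted(headers))

    # Canonical request: method, URI, query string, headers, signed headers, body hash.
    canonical_request = "\n".join(
        ["POST", "/", "", canonical_headers, signed_headers, payload_hash]
    )
    scope = f"{date_stamp}/{region}/glue/aws4_request"
    string_to_sign = "\n".join([
        "AWS4-HMAC-SHA256",
        amz_date,
        scope,
        hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
    ])

    # Derive the signing key: secret -> date -> region -> service -> "aws4_request".
    key = _hmac(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    key = _hmac(key, region)
    key = _hmac(key, "glue")
    key = _hmac(key, "aws4_request")
    signature = hmac.new(key, string_to_sign.encode("utf-8"), hashlib.sha256).hexdigest()

    headers["authorization"] = (
        f"AWS4-HMAC-SHA256 Credential={access_key}/{scope}, "
        f"SignedHeaders={signed_headers}, Signature={signature}"
    )
    return f"https://{host}/", headers, body


if __name__ == "__main__":
    url, headers, body = sign_glue_request(
        "GetDatabases", {"MaxResults": 10}, "us-east-1",
        access_key="AKIDEXAMPLE", secret_key="EXAMPLESECRET",  # placeholders
    )
    print(url)
    print(headers["authorization"])
```

Passing a fixed `now` makes the output reproducible, which is handy when comparing against another signer.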

Capabilities

Amazon Glue Data Integration

Workflow capability for data engineers building ETL pipelines with Amazon Glue. Covers job management, crawler configuration, data catalog operations, workflow orchestration, an...


Features

Serverless ETL

Run ETL jobs without managing infrastructure, with automatic scaling and pay-per-use pricing.
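To make "pay-per-use" concrete: Glue Spark jobs are billed in DPU-hours (data processing units multiplied by run time). The sketch below estimates the charge for one run. It assumes per-second billing with a one-minute minimum and an illustrative rate of $0.44 per DPU-hour; real rates and minimums vary by Region and Glue version, so treat every number here as an assumption to check against the pricing page listed in the index. `glue_job_cost` is a hypothetical helper.

```python
def glue_job_cost(num_workers, dpu_per_worker, runtime_seconds,
                  rate_per_dpu_hour=0.44, min_billed_seconds=60):
    """Estimate one job run's charge in USD.

    Assumptions (verify against the Glue pricing page): per-second billing,
    a minimum billed duration, and a flat rate per DPU-hour.
    """
    # Short runs are rounded up to the minimum billed duration.
    billed_seconds = max(runtime_seconds, min_billed_seconds)
    dpu_hours = num_workers * dpu_per_worker * billed_seconds / 3600
    return dpu_hours * rate_per_dpu_hour


if __name__ == "__main__":
    # 10 workers at 1 DPU each, running for 15 minutes.
    print(f"${glue_job_cost(10, 1, 15 * 60):.2f}")
```

For that example, 10 DPUs × 0.25 hours = 2.5 DPU-hours, or about $1.10 at the assumed rate; a 20-second run on 2 such workers is billed as 60 seconds.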

Visual ETL Editor

Build ETL pipelines visually using a drag-and-drop interface without writing code.

Data Catalog

Unified metadata repository for all data assets across S3, databases, and data warehouses.

Automated Schema Discovery

Crawlers automatically discover data schemas and populate the Data Catalog.
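A crawler is configured with a data store target, an IAM role, and the Data Catalog database it writes table definitions into. The sketch below assembles a request body for the `CreateCrawler` action with one or more S3 targets. The crawler, bucket, role ARN, and database names are hypothetical, `crawler_request` is an illustrative helper, and the chosen `SchemaChangePolicy` (update changed tables in place, only log removed ones) is one reasonable default rather than a recommendation.

```python
import json


def crawler_request(name, role_arn, database, s3_paths, schedule=None, table_prefix=None):
    """Return a CreateCrawler request body that crawls the given S3 prefixes."""
    if not s3_paths:
        raise ValueError("at least one S3 path is required")
    bad = [p for p in s3_paths if not p.startswith("s3://")]
    if bad:
        raise ValueError(f"not S3 URIs: {bad}")
    body = {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": p} for p in s3_paths]},
        # Update changed tables in place; only log (do not delete) vanished tables.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }
    if schedule:  # e.g. "cron(0 2 * * ? *)" runs daily at 02:00 UTC
        body["Schedule"] = schedule
    if table_prefix:
        body["TablePrefix"] = table_prefix
    return body


if __name__ == "__main__":
    body = crawler_request(
        name="sales-crawler",                                       # hypothetical
        role_arn="arn:aws:iam::123456789012:role/GlueCrawlerRole",  # hypothetical
        database="sales_db",                                        # hypothetical
        s3_paths=["s3://example-bucket/sales/"],
        schedule="cron(0 2 * * ? *)",
    )
    print(json.dumps(body, indent=2))
```

With boto3 the same dictionary can be passed as keyword arguments, for example `boto3.client("glue").create_crawler(**body)`.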

Workflow Orchestration

Orchestrate multi-job ETL pipelines with triggers, conditional flows, and scheduling.
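Conditional flows are expressed as triggers whose predicate watches the states of upstream jobs. The sketch below builds a `CreateTrigger` request body for a `CONDITIONAL` trigger inside a workflow, then adds a small local function that mimics how an `AND` or `ANY` predicate would be evaluated. Job and workflow names are hypothetical, and `trigger_fires` is an illustration of the semantics, not part of the Glue API.

```python
def conditional_trigger(name, workflow, upstream_jobs, downstream_job, logical="AND"):
    """Build a CreateTrigger body that starts downstream_job when upstream jobs succeed."""
    if logical not in ("AND", "ANY"):
        raise ValueError("logical must be 'AND' or 'ANY'")
    return {
        "Name": name,
        "WorkflowName": workflow,
        "Type": "CONDITIONAL",
        "StartOnCreation": True,
        "Predicate": {
            "Logical": logical,
            "Conditions": [
                {"LogicalOperator": "EQUALS", "JobName": job, "State": "SUCCEEDED"}
                for job in upstream_jobs
            ],
        },
        "Actions": [{"JobName": downstream_job}],
    }


def trigger_fires(trigger, job_states):
    """Local illustration only: would this predicate be satisfied by job_states?"""
    predicate = trigger["Predicate"]
    matches = [
        job_states.get(cond["JobName"]) == cond["State"]
        for cond in predicate["Conditions"]
    ]
    return all(matches) if predicate["Logical"] == "AND" else any(matches)


if __name__ == "__main__":
    trigger = conditional_trigger(
        "load-after-extract", "nightly-etl",                # hypothetical names
        ["extract-orders", "extract-customers"], "load-warehouse",
    )
    # With AND, the load job waits until both extract jobs have succeeded.
    print(trigger_fires(trigger, {"extract-orders": "SUCCEEDED",
                                  "extract-customers": "RUNNING"}))
```

Switching `logical` to `"ANY"` would instead start the downstream job as soon as either upstream job succeeds.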

ML Transforms

Use machine learning to automate complex data transformation tasks, such as entity deduplication.

Schema Registry

Centrally manage and enforce data schema evolution with versioning and compatibility checks.
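Compatibility checks decide whether a new schema version may be registered. The sketch below approximates Avro-style `BACKWARD`, `FORWARD`, and `FULL` checks for flat record schemas: a reader schema can consume a writer's data if every reader field either exists in the writer with the same type or declares a default. This is a deliberate simplification (it ignores type promotion, nested records, and the `*_ALL` modes that compare against every earlier version) and is not how the Schema Registry itself is implemented; the helper names are hypothetical.

```python
def _fields(schema):
    """Map field name -> field definition for a flat Avro record schema."""
    return {f["name"]: f for f in schema["fields"]}


def can_read(reader, writer):
    """Simplified rule: each reader field is in the writer (same type) or has a default."""
    writer_fields = _fields(writer)
    for name, field in _fields(reader).items():
        if name in writer_fields:
            if writer_fields[name]["type"] != field["type"]:
                return False
        elif "default" not in field:
            return False
    return True


def is_compatible(new, old, mode="BACKWARD"):
    """Check a new schema version against the previous one under a compatibility mode."""
    if mode == "NONE":
        return True
    if mode == "BACKWARD":  # consumers on the new schema can read old data
        return can_read(new, old)
    if mode == "FORWARD":   # consumers on the old schema can read new data
        return can_read(old, new)
    if mode == "FULL":
        return can_read(new, old) and can_read(old, new)
    raise ValueError(f"unsupported mode in this sketch: {mode}")


if __name__ == "__main__":
    v1 = {"type": "record", "name": "Order", "fields": [
        {"name": "id", "type": "long"},
        {"name": "amount", "type": "double"},
    ]}
    # Adding an optional field with a default keeps BACKWARD compatibility.
    v2 = {"type": "record", "name": "Order", "fields": v1["fields"] + [
        {"name": "note", "type": ["null", "string"], "default": None},
    ]}
    print(is_compatible(v2, v1, "BACKWARD"), is_compatible(v2, v1, "FULL"))
```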

Data Quality

Define and evaluate data quality rules to validate data during ETL processing.
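Glue Data Quality rules are written in the Data Quality Definition Language (DQDL). The ruleset string below uses four DQDL rule types, and `evaluate` is a hypothetical in-memory stand-in that applies the same checks to a list of Python dictionaries so the semantics are easy to see. It is not Glue's rule engine, and the column names are made up.

```python
# A DQDL ruleset as it might be registered with Glue Data Quality (columns are hypothetical).
RULESET = """Rules = [
    RowCount > 0,
    IsComplete "order_id",
    IsUnique "order_id",
    ColumnValues "quantity" > 0
]"""


def evaluate(rows):
    """Apply the four RULESET checks to in-memory rows (a local stand-in, not Glue's engine)."""
    ids = [row.get("order_id") for row in rows]
    quantities = [row.get("quantity") for row in rows]
    return {
        "RowCount > 0": len(rows) > 0,
        'IsComplete "order_id"': all(i is not None for i in ids),
        'IsUnique "order_id"': len(set(ids)) == len(ids),
        # Missing quantities count as failures, like values that are not > 0.
        'ColumnValues "quantity" > 0': all(q is not None and q > 0 for q in quantities),
    }


if __name__ == "__main__":
    rows = [
        {"order_id": 1, "quantity": 2},
        {"order_id": 1, "quantity": 0},  # duplicate id and a non-positive quantity
    ]
    for rule, passed in evaluate(rows).items():
        print(f"{'PASS' if passed else 'FAIL'}  {rule}")
```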

Use Cases

Data Lake ETL

Build ETL pipelines to ingest, transform, and load data into Amazon S3 data lakes.
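Data lakes in S3 are commonly laid out with Hive-style `key=value` prefixes, which Glue crawlers can register as partition columns and which let engines such as Athena skip irrelevant partitions. The sketch below builds such object keys. The bucket, table, and file names are hypothetical, and rejecting values that contain `/` or `=` is a simple safeguard chosen for this example, not an AWS rule.

```python
from datetime import date


def daily_partitions(day):
    """Zero-padded year/month/day partition values for one calendar date."""
    return [("year", f"{day.year:04d}"), ("month", f"{day.month:02d}"), ("day", f"{day.day:02d}")]


def partitioned_key(prefix, table, partitions, filename):
    """Join prefix/table/k=v/.../filename using Hive-style partition folders."""
    segments = []
    for key, value in partitions:
        value = str(value)
        if "/" in value or "=" in value:
            raise ValueError(f"unsafe partition value for {key}: {value!r}")
        segments.append(f"{key}={value}")
    return "/".join([prefix.rstrip("/"), table, *segments, filename])


if __name__ == "__main__":
    print(partitioned_key("s3://example-lake/raw/", "orders",
                          daily_partitions(date(2024, 1, 5)), "part-0000.parquet"))
```

Zero-padding the month and day keeps lexical and chronological ordering of the prefixes the same.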

Data Warehouse Loading

Extract and transform data from multiple sources and load it into Amazon Redshift.

Data Catalog Management

Maintain a unified data catalog for data discovery across all data assets.

Real-Time Streaming ETL

Process streaming data from Kinesis and Kafka with Glue Streaming jobs.

Machine Learning Data Prep

Prepare and transform training datasets for machine learning using Glue Studio.

Integrations

Amazon S3

Primary data lake storage for Glue ETL input and output.

Amazon Redshift

Load transformed data into Amazon Redshift data warehouses.

Amazon Athena

Query Data Catalog tables directly with Athena serverless SQL.

Amazon Kinesis

Process streaming data from Kinesis Data Streams with Glue streaming.

Apache Kafka

Ingest and process Kafka streaming data in Glue jobs.

AWS Lake Formation

Fine-grained access control for Glue Data Catalog resources.

Amazon RDS

Connect to relational databases as ETL data sources.

Semantic Vocabularies

Amazon Glue Context

418 classes · 330 properties

JSON-LD

API Governance Rules

Amazon Glue API Rules

8 rules · 5 errors · 2 warnings · 1 info

SPECTRAL

Resources

Portal
Documentation
Terms of Service
Privacy Policy
Support
Blog
GitHub Organization
Console
Sign Up
Status Page
Contact
Spectral Rules
Vocabulary
Naftiko Capability

Sources

aid: amazon-glue
name: Amazon Glue
description: >-
  Amazon Glue is a serverless data integration service that makes it simple
  to discover, prepare, move, and integrate data from multiple sources for
  analytics, machine learning, and application development. It provides both
  visual and code-based interfaces for ETL operations and includes a Data Catalog
  for unified metadata management.
type: Index
image: https://a0.awsstatic.com/libra-css/images/logos/aws_logo_smile_1200x630.png
url: https://raw.githubusercontent.com/api-evangelist/amazon-glue/refs/heads/main/apis.yml
created: '2024-01-15'
modified: '2026-04-19'
specificationVersion: '0.19'
tags:
  - Analytics
  - AWS
  - Data Catalog
  - Data Integration
  - Data Pipeline
  - ETL
  - Serverless
apis:
  - aid: amazon-glue:amazon-glue-api
    name: Amazon Glue API
    description: >-
      The Amazon Glue API enables programmatic access to create and manage ETL
      jobs, crawlers, data catalogs, connections, and development endpoints.
      You can discover data sources, transform data, and orchestrate data
      integration workflows across multiple data stores.
    humanURL: https://aws.amazon.com/glue/
    baseURL: https://glue.amazonaws.com
    tags:
      - Analytics
      - Data Catalog
      - Data Integration
      - ETL
    properties:
      - type: Documentation
        url: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html
      - type: OpenAPI
        url: openapi/amazon-glue-openapi.yml
      - type: GettingStarted
        url: https://aws.amazon.com/glue/getting-started/
      - type: Pricing
        url: https://aws.amazon.com/glue/pricing/
      - type: FAQ
        url: https://aws.amazon.com/glue/faqs/
      - type: APIReference
        url: https://docs.aws.amazon.com/glue/latest/dg/aws-glue-api.html
      - type: Authentication
        url: https://docs.aws.amazon.com/general/latest/gr/signature-version-4.html
      - type: JSONSchema
        url: json-schema/glue-job-schema.json
      - type: JSONLD
        url: json-ld/amazon-glue-context.jsonld
common:
  - type: Portal
    url: https://aws.amazon.com/glue/
  - type: Documentation
    url: https://docs.aws.amazon.com/glue/
  - type: TermsOfService
    url: https://aws.amazon.com/service-terms/
  - type: PrivacyPolicy
    url: https://aws.amazon.com/privacy/
  - type: Support
    url: https://aws.amazon.com/premiumsupport/
  - type: Blog
    url: https://aws.amazon.com/blogs/big-data/tag/aws-glue/
  - type: GitHubOrganization
    url: https://github.com/aws
  - type: Console
    url: https://console.aws.amazon.com/glue/
  - type: SignUp
    url: https://portal.aws.amazon.com/billing/signup
  - type: StatusPage
    url: https://health.aws.amazon.com/health/status
  - type: Contact
    url: https://aws.amazon.com/contact-us/
  - type: SpectralRules
    url: rules/amazon-glue-spectral-rules.yml
  - type: Vocabulary
    url: vocabulary/amazon-glue-vocabulary.yaml
  - type: NaftikoCapability
    url: capabilities/amazon-glue-data-integration.yaml
  - type: Features
    data:
      - name: Serverless ETL
        description: Run ETL jobs without managing infrastructure, with automatic scaling and pay-per-use pricing.
      - name: Visual ETL Editor
        description: Build ETL pipelines visually using a drag-and-drop interface without writing code.
      - name: Data Catalog
        description: Unified metadata repository for all data assets across S3, databases, and data warehouses.
      - name: Automated Schema Discovery
        description: Crawlers automatically discover data schemas and populate the Data Catalog.
      - name: Workflow Orchestration
        description: Orchestrate multi-job ETL pipelines with triggers, conditional flows, and scheduling.
      - name: ML Transforms
        description: Use machine learning to automate complex data transformation tasks, such as entity deduplication.
      - name: Schema Registry
        description: Centrally manage and enforce data schema evolution with versioning and compatibility checks.
      - name: Data Quality
        description: Define and evaluate data quality rules to validate data during ETL processing.
  - type: UseCases
    data:
      - name: Data Lake ETL
        description: Build ETL pipelines to ingest, transform, and load data into Amazon S3 data lakes.
      - name: Data Warehouse Loading
        description: Extract and transform data from multiple sources and load it into Amazon Redshift.
      - name: Data Catalog Management
        description: Maintain a unified data catalog for data discovery across all data assets.
      - name: Real-Time Streaming ETL
        description: Process streaming data from Kinesis and Kafka with Glue Streaming jobs.
      - name: Machine Learning Data Prep
        description: Prepare and transform training datasets for machine learning using Glue Studio.
  - type: Integrations
    data:
      - name: Amazon S3
        description: Primary data lake storage for Glue ETL input and output.
      - name: Amazon Redshift
        description: Load transformed data into Amazon Redshift data warehouses.
      - name: Amazon Athena
        description: Query Data Catalog tables directly with Athena serverless SQL.
      - name: Amazon Kinesis
        description: Process streaming data from Kinesis Data Streams with Glue streaming.
      - name: Apache Kafka
        description: Ingest and process Kafka streaming data in Glue jobs.
      - name: AWS Lake Formation
        description: Fine-grained access control for Glue Data Catalog resources.
      - name: Amazon RDS
        description: Connect to relational databases as ETL data sources.
maintainers:
  - FN: Kin Lane
    email: [email protected]
    url: https://apievangelist.com