Apache Atlas

Apache Atlas is a scalable, extensible set of core data governance services developed by the Apache Software Foundation. It helps enterprises meet their compliance requirements within Hadoop and integrates with the wider enterprise data ecosystem. Atlas provides metadata management, data classification, lineage tracking, a business glossary, and a REST API for programmatic governance operations. It supports discovery, auditing, and policy management for enterprise data assets.

1 API · 1 Capability · 10 Features
Apache · Big Data · Compliance · Data Governance · Data Lineage · Hadoop · Metadata · Open Source

APIs

Apache Atlas REST API

The Atlas REST API provides endpoints for managing types, entities, lineage, discovery, and glossary resources, enabling programmatic metadata management and data governance ope...

Capabilities

Apache Atlas Data Governance

Unified capability for Apache Atlas data governance workflows including metadata discovery, lineage tracking, entity management, and glossary management. Used by data governance...

Features

Metadata Management

Centrally manage metadata for enterprise data assets including Hive tables, HDFS files, Kafka topics, HBase tables, and Spark jobs.

Data Classification

Apply classification tags to data assets for sensitivity classification (PII, PHI, confidential) and policy enforcement.
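
As a rough illustration of tagging an asset programmatically, the sketch below builds (but does not send) a request that attaches a classification to an entity. It assumes the Atlas v2 REST path `/api/atlas/v2/entity/guid/{guid}/classifications`, a local server on port 21000, and basic authentication; the GUID, credentials, and the helper name `build_classification_request` are hypothetical.

```python
import base64
import json
import urllib.request

# Assumption: a local Atlas instance listening on its usual default port.
ATLAS_URL = "http://localhost:21000"


def build_classification_request(guid, tag_names, user="admin", password="admin"):
    """Build (but do not send) a POST that attaches classifications to one entity."""
    # Assumed v2 endpoint for adding classifications to an entity by GUID.
    url = f"{ATLAS_URL}/api/atlas/v2/entity/guid/{guid}/classifications"
    # The body is a JSON list with one object per classification type name.
    body = json.dumps([{"typeName": name} for name in tag_names]).encode("utf-8")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Basic {token}",
        },
    )


# Hypothetical GUID; a real one would come from a search or entity lookup.
request = build_classification_request("hypothetical-guid-1234", ["PII"])
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

The classification type (here `PII`) would need to exist in the type system before it can be attached.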

Data Lineage Tracking

Automatically capture and visualize data lineage across data pipeline stages for impact analysis and compliance.
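
To make the impact-analysis idea concrete, here is a minimal sketch that builds a lineage URL and walks a lineage graph downstream from one asset. It assumes the v2 lineage endpoint `/api/atlas/v2/lineage/{guid}` with `direction` and `depth` query parameters, and a response containing a `relations` list of `fromEntityId`/`toEntityId` pairs; the sample data and helper names are made up for illustration.

```python
from collections import deque
from urllib.parse import urlencode


def lineage_url(base_url, guid, direction="BOTH", depth=3):
    """Build the (assumed) lineage URL for one entity GUID."""
    query = urlencode({"direction": direction, "depth": depth})
    return f"{base_url}/api/atlas/v2/lineage/{guid}?{query}"


def downstream_guids(lineage, start_guid):
    """Breadth-first walk over the 'relations' edges, starting at start_guid."""
    edges = {}
    for rel in lineage.get("relations", []):
        edges.setdefault(rel["fromEntityId"], []).append(rel["toEntityId"])
    seen, order, queue = {start_guid}, [], deque([start_guid])
    while queue:
        current = queue.popleft()
        for nxt in edges.get(current, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Hypothetical response fragment; real GUIDs are opaque identifiers.
sample = {
    "baseEntityGuid": "raw_orders",
    "relations": [
        {"fromEntityId": "raw_orders", "toEntityId": "etl_job"},
        {"fromEntityId": "etl_job", "toEntityId": "clean_orders"},
        {"fromEntityId": "clean_orders", "toEntityId": "report_job"},
    ],
}

print(lineage_url("http://localhost:21000", "raw_orders", direction="OUTPUT"))
print(downstream_guids(sample, "raw_orders"))
# -> ['etl_job', 'clean_orders', 'report_job']
```

Everything returned by `downstream_guids` is a candidate for review when the starting asset changes.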

Business Glossary

Manage a centralized business glossary of terms and categories to standardize data definitions across the organization.

REST API

Comprehensive REST API for programmatic metadata management, discovery, lineage retrieval, and type definition management.
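
As a hedged example of programmatic metadata management, the snippet below assembles the JSON body one might POST to register an HDFS path as an entity. It assumes the v2 entity endpoint `/api/atlas/v2/entity`, the built-in `hdfs_path` type, and the common `path@cluster` convention for `qualifiedName`; the helper name, path, and cluster name are hypothetical.

```python
import json

# Assumed v2 endpoint for creating or updating a single entity.
ENTITY_ENDPOINT = "/api/atlas/v2/entity"


def hdfs_path_entity(path, cluster):
    """Return an entity payload for an HDFS path (illustrative attribute set)."""
    qualified_name = f"{path}@{cluster}"  # assumed naming convention
    short_name = path.rstrip("/").split("/")[-1]
    return {
        "entity": {
            "typeName": "hdfs_path",
            "attributes": {
                "qualifiedName": qualified_name,
                "name": short_name,
                "path": path,
            },
        }
    }


payload = hdfs_path_entity("hdfs://nn:8020/data/orders", "prod")
print("POST", ENTITY_ENDPOINT)
print(json.dumps(payload, indent=2))
```

The payload is only constructed here; sending it requires an authenticated HTTP client pointed at a running Atlas server.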

Search and Discovery

Find data assets using basic, full-text, DSL, and attribute-based search across all registered metadata.
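
The search styles listed above map onto different endpoints. The sketch below builds URLs for two of them, assuming the v2 paths `/api/atlas/v2/search/basic` and `/api/atlas/v2/search/dsl` and the query parameters shown; the DSL string is illustrative, and the helper names are hypothetical.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Assumption: a local Atlas instance on its usual default port.
BASE_URL = "http://localhost:21000/api/atlas/v2/search"


def basic_search_url(type_name=None, classification=None, query=None, limit=25):
    """Build a basic-search URL, skipping any filter that was not supplied."""
    params = {"typeName": type_name, "classification": classification,
              "query": query, "limit": limit}
    return f"{BASE_URL}/basic?" + urlencode(
        {key: value for key, value in params.items() if value is not None}
    )


def dsl_search_url(dsl_query):
    """Build a DSL-search URL; urlencode handles spaces and quotes."""
    return f"{BASE_URL}/dsl?" + urlencode({"query": dsl_query})


basic = basic_search_url(type_name="hive_table", classification="PII")
dsl = dsl_search_url("hive_table where name = 'orders'")

print(basic)
# Decode the DSL URL again to confirm the query survives encoding intact.
print(parse_qs(urlsplit(dsl).query)["query"][0])
```

Basic search suits simple type and classification filters, while DSL allows SQL-like conditions over attributes.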

Policy-Based Data Access

Integrate with Apache Ranger for attribute-based access control policies driven by Atlas classification tags.

Auditing

Comprehensive audit trail of all metadata changes and entity operations for compliance and governance.

Hook-Based Metadata Collection

Hooks for Hive, HBase, Sqoop, Storm, and other Hadoop ecosystem tools for automatic metadata harvesting.

Type System

Extensible type system for defining custom entity types, classification types, and relationship types.
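
For a sense of what a custom type definition looks like, here is a minimal sketch of a typedefs payload with one classification and one entity type. It assumes the v2 endpoint `/api/atlas/v2/types/typedefs` and the usual top-level keys (`classificationDefs`, `entityDefs`, and so on); the type name `ml_feature_set` and its `owner_team` attribute are invented for the example.

```python
import json

# Assumed v2 endpoint for bulk-creating type definitions.
TYPEDEFS_ENDPOINT = "/api/atlas/v2/types/typedefs"


def string_attribute(name, optional=True):
    """Describe a single-valued string attribute (illustrative field set)."""
    return {
        "name": name,
        "typeName": "string",
        "isOptional": optional,
        "cardinality": "SINGLE",
        "isUnique": False,
        "isIndexable": False,
    }


typedefs = {
    "enumDefs": [],
    "structDefs": [],
    "classificationDefs": [
        {"name": "PII", "description": "Personally identifiable information",
         "superTypes": [], "attributeDefs": []}
    ],
    "entityDefs": [
        {
            "name": "ml_feature_set",   # hypothetical custom entity type
            "superTypes": ["DataSet"],  # inherit the generic dataset attributes
            "attributeDefs": [string_attribute("owner_team")],
        }
    ],
    "relationshipDefs": [],
}

print("POST", TYPEDEFS_ENDPOINT)
print(json.dumps(typedefs, indent=2))
```

Extending `DataSet` means entities of the custom type can take part in lineage alongside the built-in types.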

Use Cases

Data Governance and Compliance

Track data assets, apply classifications, and enforce policies for GDPR, HIPAA, and CCPA compliance.

Data Lineage Analysis

Trace data from source to consumption to understand pipeline impact and debug data quality issues.

Metadata-Driven Data Discovery

Enable data consumers to find relevant datasets using classification-based and attribute-based search.

Data Catalog Integration

Serve as the metadata backbone for enterprise data catalogs and data mesh architectures.

Sensitive Data Identification

Classify PII and sensitive data assets and integrate with Ranger for attribute-based access control.

Business Glossary Management

Maintain standard business definitions and link them to technical metadata for consistent data interpretation.

Integrations

Apache Hive

Native Hive hook for automatic metadata harvesting of Hive databases, tables, and query lineage.

Apache Ranger

Integration with Ranger for policy-based data access control driven by Atlas classification tags.

Apache Kafka

Kafka hook for tracking Kafka topics and message schema metadata.

Apache HBase

HBase hook for capturing table and namespace metadata.

Apache Spark

Spark integration for capturing dataset and job-level lineage from Spark applications.

Apache Sqoop

Sqoop hook for importing relational database metadata and lineage into Atlas.

Cloudera Data Platform

Native integration with Cloudera Data Platform (CDP) as the metadata management backbone.

Semantic Vocabularies

Apache Atlas Context

11 classes · 34 properties

JSON-LD

API Governance Rules

Apache Atlas API Rules

12 rules · 7 errors · 5 warnings

SPECTRAL

Resources

GitHub Organization
GitHub Repository
Documentation
Getting Started
Support
Terms of Service
Change Log
Spectral Rules
Vocabulary
Naftiko Capability