
Apache Tez

Apache Tez is an application framework built on Apache Hadoop YARN for processing data as complex directed acyclic graphs (DAGs) of tasks. It was designed as a successor to MapReduce for executing Hive and Pig queries, and provides a flexible API for building DAG execution pipelines, in-memory data passing between tasks, and session reuse to reduce startup latency. Apache Tez is a top-level project of the Apache Software Foundation.

2 APIs, 5 Features
Tags: Big Data, DAG, Execution Engine, Hadoop, YARN, Open Source

APIs

Apache Tez DAG API

The Tez DAG API provides a Java programming model for defining and submitting directed-acyclic-graph (DAG) computation jobs to Apache YARN. It allows building DAGs composed of V...

Apache Tez UI REST API

The Tez UI and YARN Application History Server expose REST endpoints for monitoring Tez application history, DAG details, vertex and task statistics. The Tez Timeline Server int...
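As a rough illustration of how such monitoring queries are addressed, the sketch below only builds request URLs and performs no network calls. It assumes history is published to YARN Timeline Server v1, whose REST root is /ws/v1/timeline, and that Tez uses the entity types TEZ_DAG_ID and TEZ_VERTEX_ID. The class name, the host name, port 8188, and the sample DAG id are hypothetical placeholders, so check them against your own cluster configuration.

```java
import java.net.URI;

/** Illustrative URL builder; the class and method names are not part of Tez. */
final class TezTimelineUrls {
    // Entity type names assumed to be used by Tez in Timeline Server v1.
    static final String DAG_ENTITY = "TEZ_DAG_ID";
    static final String VERTEX_ENTITY = "TEZ_VERTEX_ID";

    // Strip a trailing slash so that paths can be appended uniformly.
    private static String root(String baseUrl) {
        String trimmed = baseUrl.endsWith("/")
                ? baseUrl.substring(0, baseUrl.length() - 1)
                : baseUrl;
        return trimmed + "/ws/v1/timeline/";
    }

    /** URL listing up to `limit` entities of the given type. */
    static URI entityList(String baseUrl, String entityType, int limit) {
        return URI.create(root(baseUrl) + entityType + "?limit=" + limit);
    }

    /** URL for a single entity, for example one DAG. */
    static URI entity(String baseUrl, String entityType, String entityId) {
        return URI.create(root(baseUrl) + entityType + "/" + entityId);
    }

    /** URL listing the vertices that belong to one DAG, via a primary filter. */
    static URI verticesOfDag(String baseUrl, String dagId) {
        return URI.create(root(baseUrl) + VERTEX_ENTITY
                + "?primaryFilter=" + DAG_ENTITY + ":" + dagId);
    }

    public static void main(String[] args) {
        String base = "http://timeline.example.com:8188"; // hypothetical host and port
        String dagId = "dag_1700000000000_0001_1";        // hypothetical DAG id
        System.out.println(entityList(base, DAG_ENTITY, 10));
        System.out.println(entity(base, DAG_ENTITY, dagId));
        System.out.println(verticesOfDag(base, dagId));
    }
}
```

The resulting URLs can be passed to any HTTP client. Responses are JSON documents whose exact fields depend on the Tez and Hadoop versions in use.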

Features

DAG-Based Execution

Flexible DAG computation model replacing MapReduce for complex multi-stage pipelines.
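To make the DAG model concrete, here is a minimal plain-Java sketch of a multi-stage pipeline expressed as vertices and edges, with an ordering in which every stage runs only after all of its inputs. It deliberately does not use the Tez API: the class DagSketch, its methods, and the vertex names are hypothetical, and real Tez schedules tasks across a cluster rather than producing a single linear order.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Illustrative only: a tiny in-memory DAG model, not the Tez API. */
final class DagSketch {
    // Vertex name -> names of its downstream vertices, in insertion order.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    DagSketch addVertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    DagSketch addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        edges.get(from).add(to);
        return this;
    }

    /** Kahn's algorithm: each vertex appears after all of its upstream vertices. */
    List<String> executionOrder() {
        // Count the unfinished inputs of every vertex.
        Map<String, Integer> pending = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            pending.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                pending.merge(t, 1, Integer::sum);
            }
        }

        // Vertices with no inputs can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pending.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            // Finishing v releases any downstream vertex whose inputs are all done.
            for (String t : edges.get(v)) {
                if (pending.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        DagSketch dag = new DagSketch()
                .addEdge("scan_orders", "join")
                .addEdge("scan_customers", "join")
                .addEdge("join", "aggregate");
        System.out.println(dag.executionOrder());
    }
}
```

Unlike a chain of MapReduce jobs, a graph like this lets two scans feed one join directly, which is the kind of multi-stage shape the DAG model is meant to express.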

In-Memory Data Passing

Direct in-memory data transfer between connected vertices, avoiding HDFS writes for intermediate results.

Session Reuse

Tez sessions reuse container allocations across DAG submissions for reduced latency.

Dynamic Optimization

Runtime modification of the DAG, such as adjusting task parallelism, based on data statistics gathered during execution.

YARN Integration

Native YARN resource management with fine-grained resource requests per vertex.

Use Cases

Apache Hive Query Execution

Tez is the default execution engine for Apache Hive queries, replacing MapReduce.

Apache Pig Script Execution

Execute Apache Pig Latin scripts as optimized Tez DAGs.

Complex ETL Pipelines

Multi-stage data transformation pipelines with in-memory data passing.

Integrations

Apache Hadoop YARN

Native YARN resource manager integration for cluster resource allocation.

Apache Hive

Default execution engine for Hive queries in the HDP and CDP distributions.

Apache Pig

Tez execution backend for Apache Pig script compilation and execution.

Apache HDFS

Input/output storage for Tez job data via Hadoop Distributed File System.

Resources

GitHub Repository
Documentation
Portal
Release Notes
Terms of Service