Apache Tez
Apache Tez is an application framework, built on Apache Hadoop YARN, for processing data as complex directed acyclic graphs (DAGs) of tasks. It was designed as a successor to MapReduce for executing Hive and Pig queries. It provides a flexible API for building DAG execution pipelines, passes intermediate data directly between tasks rather than through HDFS, and reuses sessions to reduce startup latency. Apache Tez is a top-level project of the Apache Software Foundation.
APIs
Apache Tez DAG API
The Tez DAG API provides a Java programming model for defining directed-acyclic-graph (DAG) computation jobs and submitting them to Apache Hadoop YARN. It allows building DAGs composed of V...
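The DAG model described above can be illustrated with a minimal, stdlib-only Java sketch. It is not the Tez API: the class `DagSketch`, its methods, and the vertex names are hypothetical and exist only for illustration. It shows the core idea that a job is a set of named vertices joined by directed edges, and that a vertex can run only after all of its inputs have finished.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical stdlib-only model of a DAG of named vertices; NOT the Tez API. */
final class DagSketch {
    // Vertex name -> names of its downstream vertices, kept in insertion order.
    private final Map<String, List<String>> edges = new LinkedHashMap<>();

    DagSketch addVertex(String name) {
        edges.putIfAbsent(name, new ArrayList<>());
        return this;
    }

    DagSketch addEdge(String from, String to) {
        addVertex(from);
        addVertex(to);
        edges.get(from).add(to);
        return this;
    }

    /** Kahn's algorithm: returns an order in which every vertex follows its inputs. */
    List<String> executionOrder() {
        // Count how many upstream vertices each vertex is still waiting for.
        Map<String, Integer> pendingInputs = new LinkedHashMap<>();
        for (String v : edges.keySet()) {
            pendingInputs.put(v, 0);
        }
        for (List<String> targets : edges.values()) {
            for (String t : targets) {
                pendingInputs.merge(t, 1, Integer::sum);
            }
        }

        // Vertices with no inputs can start immediately.
        Deque<String> ready = new ArrayDeque<>();
        for (Map.Entry<String, Integer> e : pendingInputs.entrySet()) {
            if (e.getValue() == 0) {
                ready.add(e.getKey());
            }
        }

        List<String> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            String v = ready.remove();
            order.add(v);
            for (String t : edges.get(v)) {
                if (pendingInputs.merge(t, -1, Integer::sum) == 0) {
                    ready.add(t);
                }
            }
        }
        if (order.size() != edges.size()) {
            throw new IllegalStateException("graph contains a cycle");
        }
        return order;
    }

    public static void main(String[] args) {
        DagSketch dag = new DagSketch()
            .addEdge("tokenizer", "summation")
            .addEdge("summation", "sorter");
        System.out.println(dag.executionOrder());
    }
}
```

In a real Tez program the vertices would additionally carry a processor and a parallelism setting, and the edges would describe how data moves between tasks. The sketch omits those details.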
Apache Tez UI REST API
The Tez UI and YARN Application History Server expose REST endpoints for monitoring Tez application history, DAG details, and vertex and task statistics. The Tez Timeline Server int...
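As a rough illustration of how a client might address such endpoints, the Java sketch below builds a request URL. It assumes the YARN Timeline Server (v1) path layout `/ws/v1/timeline/{entityType}/{entityId}` and the entity type name `TEZ_DAG_ID`. The host name, port, DAG id, and the helper `entityUri` are hypothetical, so verify the actual paths against your deployment before relying on them.

```java
import java.net.URI;

/** Hypothetical helper for building timeline REST URLs; paths are assumptions. */
final class TimelineUrlSketch {
    /**
     * Builds {base}/ws/v1/timeline/{entityType}[/{entityId}].
     * Pass a null or empty entityId to address the whole entity collection.
     */
    static URI entityUri(String baseUrl, String entityType, String entityId) {
        // Drop one trailing slash so the joined path has no "//".
        String base = baseUrl.endsWith("/")
            ? baseUrl.substring(0, baseUrl.length() - 1)
            : baseUrl;
        String url = base + "/ws/v1/timeline/" + entityType;
        if (entityId != null && !entityId.isEmpty()) {
            url += "/" + entityId;
        }
        return URI.create(url);
    }

    public static void main(String[] args) {
        // Placeholder host and DAG id, for illustration only.
        System.out.println(entityUri(
            "http://timeline.example.com:8188", "TEZ_DAG_ID", "dag_1_0001_1"));
    }
}
```

The resulting URI can be passed to any HTTP client, for example `java.net.http.HttpClient`, to fetch the JSON description of the entity.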
Features
Flexible DAG computation model that replaces MapReduce for complex multi-stage pipelines.
Direct data transfer between connected vertices, avoiding intermediate writes to HDFS.
Tez sessions reuse container allocations across DAG submissions for reduced latency.
Runtime DAG modification based on actual data statistics during execution.
Native YARN resource management with fine-grained resource requests per vertex.
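One common form of runtime DAG modification is lowering a vertex's task count when the upstream vertices produce less data than planned. The Java sketch below is a simplified illustration of that arithmetic only. It is not Tez's actual vertex manager logic, and the class, method, and parameter names are hypothetical.

```java
/** Hypothetical, simplified model of runtime parallelism adjustment; NOT Tez code. */
final class ParallelismSketch {
    /**
     * Returns the number of tasks needed so that each task reads roughly
     * desiredBytesPerTask, never exceeding the planned count and never below 1.
     */
    static int adjustedParallelism(long observedInputBytes,
                                   long desiredBytesPerTask,
                                   int plannedTasks) {
        if (desiredBytesPerTask <= 0 || plannedTasks <= 0) {
            throw new IllegalArgumentException("sizes and task counts must be positive");
        }
        long bytes = Math.max(0L, observedInputBytes);
        // Integer ceiling division: tasks required to cover all observed bytes.
        long needed = (bytes + desiredBytesPerTask - 1) / desiredBytesPerTask;
        // Only ever reduce parallelism, and keep at least one task.
        return (int) Math.max(1L, Math.min(needed, (long) plannedTasks));
    }

    public static void main(String[] args) {
        // 1,000 bytes observed, 100 bytes per task, 50 tasks planned.
        System.out.println(adjustedParallelism(1_000L, 100L, 50));
    }
}
```

The clamp to the planned task count reflects a design choice in this sketch: statistics gathered at runtime are used only to shrink a vertex, since growing it would require resources that were not requested up front.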
Use Cases
Execute Apache Hive queries with Tez as the default execution engine in place of MapReduce.
Execute Apache Pig Latin scripts as optimized Tez DAGs.
Run multi-stage data transformation pipelines that pass data directly between stages.
Integrations
Native YARN resource manager integration for cluster resource allocation.
Default execution engine for Hive queries in Hadoop distributions such as HDP and CDP.
Tez execution backend for Apache Pig script compilation and execution.
Input and output storage for Tez job data via the Hadoop Distributed File System (HDFS).