
Apache Pig

Apache Pig is a platform for analyzing large data sets that provides a high-level language (Pig Latin) for expressing data analysis programs. It compiles Pig Latin programs into MapReduce/Tez jobs and runs them on Hadoop clusters.
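To make the compilation step concrete, here is a plain-Python simulation (not Pig code) of how a simple GROUP ... COUNT script roughly corresponds to one MapReduce job. The map phase emits the grouping key with each record, the shuffle collects records by key, and the reduce phase counts each bag. The Pig Latin in the comment and the function names are hypothetical, and the sketch omits the combiner that Pig typically adds for algebraic functions such as COUNT.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Illustrative Pig Latin this sketch mimics (hypothetical relation and field names):
#   logs    = LOAD 'logs' AS (user:chararray, url:chararray);
#   by_user = GROUP logs BY user;
#   counts  = FOREACH by_user GENERATE group, COUNT(logs);

Record = Tuple[str, str]


def map_phase(records: Iterable[Record]) -> Iterator[Tuple[str, Record]]:
    # Map: emit the GROUP BY key together with the full record.
    for user, url in records:
        yield user, (user, url)


def shuffle(pairs: Iterable[Tuple[str, Record]]) -> Dict[str, List[Record]]:
    # Shuffle/sort: all values for one key end up together, ordered by key.
    groups: Dict[str, List[Record]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups: Dict[str, List[Record]]) -> List[Tuple[str, int]]:
    # Reduce: the equivalent of FOREACH ... GENERATE group, COUNT(bag).
    return [(key, len(bag)) for key, bag in groups.items()]


records = [("alice", "/a"), ("bob", "/b"), ("alice", "/c")]
print(reduce_phase(shuffle(map_phase(records))))  # [('alice', 2), ('bob', 1)]
```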

1 API · 1 Capability · 6 Features
Big Data · Data Analysis · ETL · Hadoop · Scripting · Apache · Open Source

APIs

Apache Pig

Pig provides the Pig Latin scripting language for data analysis, an embedded Pig API for programmatic execution from Java, and a UDF (User Defined Function) API for custom data ...

Capabilities

Apache Pig Data Processing Workflow

Workflow for submitting, monitoring, and managing Pig Latin data analysis jobs on Hadoop clusters.
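This page does not list the workflow's individual operations, so the following is only a hedged sketch of the underlying submission step: invoking the standard pig launcher with an execution mode (-x), parameter substitutions (-param) and a script file (-f). It assumes pig is installed and on PATH; the helper names build_pig_command and submit, and the script and parameter names in the tests, are hypothetical.

```python
import shutil
import subprocess
from typing import Dict, List, Optional


def build_pig_command(script: str,
                      exec_type: str = "mapreduce",
                      params: Optional[Dict[str, str]] = None) -> List[str]:
    """Build an argv list for the pig launcher (hypothetical helper)."""
    cmd = ["pig", "-x", exec_type]           # execution mode, e.g. local, mapreduce, tez
    for key, value in sorted((params or {}).items()):
        cmd += ["-param", f"{key}={value}"]  # substituted for $key inside the script
    cmd += ["-f", script]                    # the Pig Latin script to run
    return cmd


def submit(script: str,
           exec_type: str = "mapreduce",
           params: Optional[Dict[str, str]] = None) -> int:
    """Run the script and return the launcher's exit code (0 normally means success)."""
    if shutil.which("pig") is None:
        raise FileNotFoundError("the 'pig' launcher is not on PATH")
    completed = subprocess.run(build_pig_command(script, exec_type, params))
    return completed.returncode
```

Running with exec_type="local" executes against the local file system without a cluster, which is a convenient way to test a script before submitting it in mapreduce or tez mode.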


Features

Pig Latin Language

High-level dataflow language for expressing data transformations

MapReduce/Tez Backend

Compiles Pig Latin to MapReduce or Apache Tez execution plans

UDF Support

User-defined functions in Java, Python (Jython or streaming Python), JavaScript, Ruby, and Groovy
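As an example of a Python UDF, the sketch below extracts the host name from a URL field. It follows the pattern of Pig's Jython UDFs, where the @outputSchema decorator declares the Pig type of the return value. It assumes, as in the Pig documentation's examples, that Pig's Jython engine provides outputSchema without an import; the fallback definition is only a local stand-in so the file can be tested with ordinary Python. The function name extract_domain is hypothetical, and the code avoids newer Python syntax so it remains Jython-compatible.

```python
# Pig's Jython engine supplies outputSchema; this fallback is a local stand-in
# (an assumption for testing outside Pig) that just records the schema string.
try:
    outputSchema
except NameError:
    def outputSchema(schema):
        def decorator(func):
            func.outputSchema = schema
            return func
        return decorator


@outputSchema("domain:chararray")
def extract_domain(url):
    # Pig passes null fields to Python UDFs as None; propagate them as null.
    if url is None:
        return None
    rest = url.split("://", 1)[-1]   # drop the scheme, if any
    host = rest.split("/", 1)[0]     # drop the path
    host = host.split(":", 1)[0]     # drop a port number
    return host.lower() or None      # empty host becomes null
```

In a script, a file like this would be registered and called along the lines of REGISTER 'udfs.py' USING jython AS myudfs; followed by FOREACH logs GENERATE myudfs.extract_domain(url); (file, alias, and field names are hypothetical).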

Streaming

Process data through external programs using the STREAM operator
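A program used with STREAM reads tuples on standard input and writes tuples to standard output. The sketch below assumes Pig's default streaming serialization of one tuple per line with tab-separated fields. It keeps rows whose second field is non-empty and normalizes that field to lower case. The two-field (id, name) layout and the script name clean.py are hypothetical.

```python
import sys


def transform(line):
    """Clean one tab-delimited tuple; return None to drop the row."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2 or not fields[1].strip():
        return None  # drop rows with a missing or empty name field
    return "\t".join([fields[0], fields[1].strip().lower()])


def main(stdin=sys.stdin, stdout=sys.stdout):
    # Pig writes input tuples to our stdin and parses our stdout as output tuples.
    for line in stdin:
        out = transform(line)
        if out is not None:
            stdout.write(out + "\n")


if __name__ == "__main__":
    main()
```

On the Pig side, such a script would be wired in with something like DEFINE clean `python clean.py` SHIP('clean.py'); and cleaned = STREAM raw THROUGH clean AS (id:chararray, name:chararray);, where SHIP copies the script to the cluster nodes.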

Schema Evolution

Optional, flexible schemas for semi-structured data: fields can be declared with types or left undeclared and referenced by position

Optimization

Automatic logical and physical plan optimization

Use Cases

ETL Pipelines

Build data transformation pipelines from raw logs to structured data

Ad-hoc Data Analysis

Analyze large datasets with ad-hoc Pig Latin queries

Data Preparation

Clean and prepare data for machine learning workflows

Log Processing

Process and aggregate web server and application logs
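A typical log-processing job parses each raw line into fields and then aggregates. The plain-Python sketch below parses Common Log Format lines with a regular expression and counts requests per HTTP status code. In Pig, the analogous parsing step would usually use the built-in REGEX_EXTRACT_ALL, which must match the whole line (hence fullmatch here) and yields null for non-matching lines, followed by a FILTER, a GROUP, and a COUNT. The regex and function names are illustrative assumptions, not taken from this page.

```python
import re
from collections import Counter

# Common Log Format: host ident user [time] "method path protocol" status bytes
CLF = re.compile(r'(\S+) \S+ (\S+) \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) (\S+)')


def parse_line(line):
    """Return (host, user, time, method, path, status, bytes) or None."""
    match = CLF.fullmatch(line.strip())  # the whole line must match
    return match.groups() if match else None


def count_by_status(lines):
    counts = Counter()
    for line in lines:
        record = parse_line(line)
        if record is not None:  # like FILTER parsed BY status IS NOT NULL
            counts[record[5]] += 1
    return dict(counts)
```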

Integrations

Apache Hadoop

Native MapReduce execution on YARN/HDFS

Apache Tez

Support for Apache Tez as an execution engine, selected with -x tez

Apache HBase

HBaseStorage load/store function for reading and writing HBase tables

Apache Hive

HCatalog integration for Hive metastore access

Amazon S3

S3 input/output for cloud-based data processing

Semantic Vocabularies

Apache Pig Context

7 classes · 19 properties

JSON-LD

API Governance Rules

Apache Pig API Rules

5 rules · 4 errors · 1 info

SPECTRAL

Resources

GitHubOrganization
Documentation
SpectralRules
Vocabulary
NaftikoCapability
JSON-LD