Apache Pig
Apache Pig is a platform for analyzing large data sets that provides a high-level language (Pig Latin) for expressing data analysis programs. It compiles Pig Latin programs into MapReduce/Tez jobs and runs them on Hadoop clusters.
APIs
Apache Pig
Pig provides the Pig Latin scripting language for data analysis, an embedded Pig API for programmatic execution from Java, and a UDF (User Defined Function) API for custom data ...
Capabilities
Apache Pig API — Jobs
Apache Pig API — Jobs. 5 operations. Lead operation: Apache Pig List Jobs. Self-contained Naftiko capability covering one Apache Pig business surface.
Apache Pig API — Scripts
Apache Pig API — Scripts. 1 operation. Lead operation: Apache Pig Validate Script. Self-contained Naftiko capability covering one Apache Pig business surface.
Features
High-level dataflow language for expressing data transformations
Compiles Pig Latin to MapReduce or Apache Tez execution plans
User-defined functions in Java, Python, JavaScript, and Ruby
Process data through external programs using the STREAM operator
Flexible schema handling for semi-structured data
Automatic logical and physical plan optimization
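To illustrate the STREAM feature listed above, here is a minimal sketch of an external program that Pig could pipe records through. It assumes Pig's default streaming format (one record per line, fields separated by tabs) and a hypothetical three-field layout of user, url, and status; the script name normalize.py and the field names are illustrative, not taken from the Pig documentation.

```python
import sys
from typing import Iterable, Iterator

# Hypothetical Pig Latin invocation (script name and relation names are assumptions):
#   DEFINE normalize `python normalize.py` SHIP('normalize.py');
#   cleaned = STREAM raw THROUGH normalize AS (user:chararray, url:chararray, status:int);


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Normalize tab-delimited (user, url, status) records, dropping malformed ones."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 3:
            continue  # wrong number of fields: skip the record
        user, url, status = fields
        if not status.isdigit():
            continue  # status must be a non-negative integer
        yield "\t".join([user.strip().lower(), url.strip(), status])


def main() -> None:
    # Read records from stdin and write the cleaned records to stdout,
    # which is how a streamed program exchanges data with its caller.
    for record in transform(sys.stdin):
        sys.stdout.write(record + "\n")


if __name__ == "__main__":
    main()
```

Keeping the per-record logic in transform, separate from the stdin/stdout plumbing, lets the same function be checked locally on a few sample lines before the script is shipped to a cluster.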
Use Cases
Build data transformation pipelines from raw logs to structured data
Analyze large datasets with ad-hoc Pig Latin queries
Clean and prepare data for machine learning workflows
Process and aggregate web server and application logs
Integrations
Native MapReduce execution on YARN/HDFS
High-performance Tez execution engine support
HBase storage handler for reading/writing HBase tables
HCatalog integration for Hive metastore access
S3 input/output for cloud-based data processing