Apache Hive

Apache Hive is data warehouse software that facilitates reading, writing, and managing large datasets residing in distributed storage using SQL. It provides a SQL-like interface called HiveQL for querying data stored in Hadoop, along with a WebHCat REST API for job submission and metastore access.

2 APIs · 1 Capability · 8 Features
Apache · Big Data · Data Warehouse · ETL · Hadoop · Open Source · SQL

APIs

Apache Hive WebHCat REST API

WebHCat (Templeton) REST API for Apache Hive providing DDL operations, HiveQL job submission, and Hive Metastore metadata access over HTTP.
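
To make the HTTP surface concrete, here is a minimal Python sketch that only builds WebHCat request URLs and form bodies; it does not contact a server. It assumes the commonly documented defaults (port 50111, base path /templeton/v1, a user.name query parameter). The host name and the helper names are hypothetical.

```python
from urllib.parse import urlencode, quote

# Assumption: WebHCat's usual base path; adjust for your deployment.
BASE_PATH = "/templeton/v1"


def webhcat_url(host, resource, user, port=50111, **params):
    """Build a WebHCat URL for a resource such as 'status' or 'ddl/database'."""
    query = urlencode({"user.name": user, **params})
    # Escape each path segment separately so names cannot inject extra slashes.
    path = "/".join(quote(part, safe="") for part in resource.strip("/").split("/"))
    return f"http://{host}:{port}{BASE_PATH}/{path}?{query}"


def list_tables_url(host, user, database="default"):
    """DDL metadata access: list the tables in one database (HTTP GET)."""
    return webhcat_url(host, f"ddl/database/{database}/table", user)


def submit_hive_job_body(query, statusdir):
    """Form-encoded body for a HiveQL job submission (HTTP POST to .../hive)."""
    return urlencode({"execute": query, "statusdir": statusdir})
```

A client would send a GET to the URL from list_tables_url, or POST the body from submit_hive_job_body to webhcat_url(host, "hive", user), using any HTTP library.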

Apache Hive JDBC API

JDBC interface to HiveServer2 for standard SQL client connectivity, supporting parameterized queries, result sets, and connection pooling from Java and ODBC-bridge applications.
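
As an illustration of how a client addresses HiveServer2, the following Python sketch composes a jdbc:hive2:// connection URL. It assumes the usual default port (10000) and the ;key=value form for session variables. The function name and host are hypothetical, and the resulting string would be handed to a JDBC driver rather than used from Python directly.

```python
def hive_jdbc_url(host, port=10000, database="default", session_vars=None):
    """Compose a jdbc:hive2:// URL; session variables are appended as ;key=value."""
    url = f"jdbc:hive2://{host}:{port}/{database}"
    for key, value in (session_vars or {}).items():
        url += f";{key}={value}"
    return url


# Example: HTTP transport mode, assuming the server exposes the 'cliservice' path.
http_url = hive_jdbc_url(
    "hs2.example.internal",
    session_vars={"transportMode": "http", "httpPath": "cliservice"},
)
```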

Capabilities

Features

HiveQL SQL Interface

SQL-like query language for reading, writing, and aggregating data stored in distributed storage.

WebHCat REST API

HTTP REST API (Templeton) for DDL operations, job submission, and metastore metadata access.

HiveServer2 JDBC/ODBC

Thrift-based server with JDBC and ODBC drivers for standard SQL client connectivity.

Hive Metastore

Central repository for table schema, partition metadata, and storage location information.

Partitioning

Partition tables by column values for efficient query pruning and data organization.
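
The pruning idea can be sketched in a few lines of Python. This is a simplified model, not Hive's implementation: it assumes the conventional col=value directory layout for partitions and shows how a filter on a partition column narrows the set of directories to scan. All function names and paths are hypothetical.

```python
def partition_path(table_root, **partition_values):
    """Hive-style layout: one directory level per partition column, named col=value."""
    parts = [f"{col}={val}" for col, val in partition_values.items()]
    return "/".join([table_root.rstrip("/"), *parts])


def parse_partition(path, table_root):
    """Recover the partition column values encoded in a partition directory path."""
    relative = path[len(table_root.rstrip("/")) + 1:]
    return dict(segment.split("=", 1) for segment in relative.split("/"))


def prune(paths, table_root, predicate):
    """Keep only the partitions whose column values satisfy the predicate."""
    return [p for p in paths if predicate(parse_partition(p, table_root))]


root = "/warehouse/sales"
all_paths = [
    partition_path(root, dt=day, country=country)
    for day in ("2024-01-01", "2024-01-02")
    for country in ("US", "DE")
]
# Analogous to WHERE dt = '2024-01-02': half of the directories are never read.
kept = prune(all_paths, root, lambda p: p["dt"] == "2024-01-02")
```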

ORC and Parquet Storage

Optimized columnar storage formats with predicate pushdown and compression support.
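
Predicate pushdown relies on per-chunk statistics stored alongside the data. The Python sketch below is a toy model of that mechanism, assuming each chunk (called a stripe here, after ORC's terminology) records the minimum and maximum of one integer column. It is not the ORC or Parquet reader API, and every name in it is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Stripe:
    min_value: int
    max_value: int
    rows: list


def build_stripes(values, stripe_size):
    """Split a column into stripes and record min/max statistics for each."""
    stripes = []
    for start in range(0, len(values), stripe_size):
        chunk = values[start:start + stripe_size]
        stripes.append(Stripe(min(chunk), max(chunk), chunk))
    return stripes


def scan_greater_than(stripes, threshold):
    """Return the matching values and the number of stripes actually read."""
    matches, stripes_read = [], 0
    for stripe in stripes:
        if stripe.max_value <= threshold:
            continue  # statistics prove no row can qualify, so skip the stripe
        stripes_read += 1
        matches.extend(v for v in stripe.rows if v > threshold)
    return matches, stripes_read


stripes = build_stripes(list(range(1, 13)), stripe_size=4)
matches, stripes_read = scan_greater_than(stripes, threshold=8)
```

In this run only the last of the three stripes is read, because the first two have maxima of 4 and 8.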

ACID Transactions

Full ACID transaction support for inserts, updates, and deletes on managed ORC tables.

Vectorized Query Execution

Processes rows in batches (1,024 rows at a time by default) instead of one row at a time, improving query throughput.
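
The batch-at-a-time idea can be sketched in Python as follows. This is a conceptual model only: it assumes a batch size of 1,024 rows and plain Python lists standing in for column vectors, and the function names are hypothetical. Hive's actual implementation operates on typed column vectors inside the JVM.

```python
BATCH_SIZE = 1024  # assumption: the batch size used for this illustration


def batches(column, size=BATCH_SIZE):
    """Yield consecutive fixed-size slices of a column."""
    for start in range(0, len(column), size):
        yield column[start:start + size]


def vectorized_filtered_sum(amounts, flags, size=BATCH_SIZE):
    """Sum the amounts whose flag is true, processing one batch per iteration."""
    total = 0
    for amount_batch, flag_batch in zip(batches(amounts, size), batches(flags, size)):
        # One tight loop over column arrays per batch, rather than one
        # operator call per row.
        total += sum(a for a, keep in zip(amount_batch, flag_batch) if keep)
    return total


amounts = list(range(3000))
flags = [i % 2 == 0 for i in range(3000)]
result = vectorized_filtered_sum(amounts, flags)
```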

Use Cases

Data Warehouse Analytics

Run SQL analytics on petabyte-scale datasets stored in HDFS or object storage.

ETL Pipeline Orchestration

Use HiveQL scripts to transform and load data between raw and curated data lake zones.

Ad-Hoc Data Exploration

Query structured data interactively using Beeline or JDBC-connected BI tools.

Log Analysis

Parse and aggregate application logs stored as text or JSON in HDFS using Hive SerDes.

Data Catalog Integration

Use the Hive Metastore as a shared schema registry for Spark, Flink, and Presto.

Integrations

Apache Hadoop HDFS

Hive reads and writes data in HDFS, its primary storage layer.

Apache Spark

Spark uses the Hive Metastore for table discovery and supports Hive UDFs.

Apache HBase

The Hive HBase storage handler enables HiveQL queries against HBase tables.

Apache Tez

The Apache Tez DAG execution engine replaces MapReduce for faster Hive query processing.

Presto / Trino

Presto and Trino use the Hive Metastore for table metadata in federated SQL queries.

Semantic Vocabularies

Apache Hive WebHCat Context

21 classes · 0 properties

JSON-LD

API Governance Rules

Apache Hive API Rules

8 rules · 2 errors · 5 warnings · 1 info

SPECTRAL

Resources

Documentation
Getting Started
GitHub Organization
GitHub Repository
Spectral Rules
Vocabulary
Naftiko Capability