Apache HBase
Apache HBase is an open-source, distributed, versioned, non-relational database modeled after Google's Bigtable. It provides random, real-time read/write access to very large datasets and runs on top of the Hadoop Distributed File System (HDFS). Clients can use a REST API (Stargate), a Thrift API, or the Java client API for table- and cell-level operations.
APIs
Apache HBase REST API
REST API (Stargate) for the Apache HBase distributed NoSQL database, providing table management, row and cell operations, and table scanning over HTTP with JSON or XML encoding.
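When the REST gateway speaks JSON, it base64-encodes row keys, column names (family:qualifier), and cell values inside a CellSet document. The sketch below builds and parses such a body using only the Python standard library. It assumes the Row/key/Cell/column/$ field layout of the gateway's JSON CellSet format and omits the optional per-cell timestamp and the HTTP request itself; the helper names are hypothetical.

```python
import base64
import json


def b64(data: bytes) -> str:
    """Base64-encode raw bytes into an ASCII string for the JSON body."""
    return base64.b64encode(data).decode("ascii")


def build_cellset(row: bytes, cells: dict[bytes, bytes]) -> str:
    """Build a CellSet JSON body for one row.

    `cells` maps b"family:qualifier" column names to raw cell values.
    """
    return json.dumps({
        "Row": [{
            "key": b64(row),
            "Cell": [{"column": b64(col), "$": b64(val)} for col, val in cells.items()],
        }]
    })


def parse_cellset(body: str) -> dict[bytes, dict[bytes, bytes]]:
    """Decode a CellSet JSON document into {row: {column: value}} as raw bytes."""
    result: dict[bytes, dict[bytes, bytes]] = {}
    for row in json.loads(body)["Row"]:
        key = base64.b64decode(row["key"])
        result[key] = {
            base64.b64decode(c["column"]): base64.b64decode(c["$"]) for c in row["Cell"]
        }
    return result


body = build_cellset(b"row1", {b"cf:a": b"value1"})
print(body)
# {"Row": [{"key": "cm93MQ==", "Cell": [{"column": "Y2Y6YQ==", "$": "dmFsdWUx"}]}]}
```

Because everything is base64-encoded, arbitrary binary row keys and values survive the trip through JSON unchanged.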
Apache HBase Java Client API
Java client API covering table administration and all data operations, including filters, coprocessor calls, batch operations, and an asynchronous client for high-throughput workloads.
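Batching amortizes per-RPC overhead. The sketch below is a simplified, in-memory model of the client-side write-buffering idea behind the Java client's BufferedMutator: mutations accumulate locally and are sent as one batch once an approximate size threshold is reached. The Put and WriteBuffer classes, the size estimate, and the send_batch callback are hypothetical stand-ins, not the HBase API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Put:
    """One cell write: row key, b"family:qualifier" column, and value."""
    row: bytes
    column: bytes
    value: bytes

    def approx_size(self) -> int:
        # Crude size estimate; a real client also counts per-object overhead.
        return len(self.row) + len(self.column) + len(self.value)


@dataclass
class WriteBuffer:
    """Hypothetical client-side buffer that ships mutations in batches."""
    max_bytes: int
    send_batch: Callable[[list[Put]], None]  # stands in for one RPC per batch
    _pending: list[Put] = field(default_factory=list, init=False)
    _size: int = field(default=0, init=False)

    def mutate(self, put: Put) -> None:
        """Queue a put; flush once the buffered size reaches max_bytes."""
        self._pending.append(put)
        self._size += put.approx_size()
        if self._size >= self.max_bytes:
            self.flush()

    def flush(self) -> None:
        """Send everything buffered so far as one batch."""
        if self._pending:
            self.send_batch(self._pending)
            self._pending = []
            self._size = 0


# Usage: five 18-byte puts with a 32-byte threshold produce batches of 2, 2, 1.
batches: list[list[Put]] = []
buf = WriteBuffer(max_bytes=32, send_batch=batches.append)
for i in range(5):
    buf.mutate(Put(f"row{i}".encode(), b"cf:a", b"0123456789"))
buf.flush()  # send the remainder; puts still buffered here are never sent otherwise
print([len(b) for b in batches])  # [2, 2, 1]
```

The tradeoff: a larger buffer means fewer round trips but more unsent data at risk if the client fails before flushing.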
Capabilities
Features
Store sparse, semi-structured data in a distributed wide-column table model inspired by Google Bigtable.
HTTP REST gateway for language-agnostic table and row operations using JSON or XML.
High-performance Thrift interface for cross-language HBase access with compact binary encoding.
Strong consistency guarantees for single-row get, put, and delete operations.
Server-side coprocessor framework for custom observers (analogous to triggers) and endpoints (analogous to stored procedures).
JRuby-based interactive shell for administrative and data manipulation operations.
Flexible server-side scan API with filters, time ranges, and column family projections.
Asynchronous multi-cluster replication for disaster recovery and geographic distribution.
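HBase's logical data model can be pictured as a sorted map from row key to column (family:qualifier) to timestamped versions, where absent columns are simply not stored. The sketch below models that map with plain Python dicts and applies the scan options listed above: a start row (inclusive), a stop row (exclusive), column-family projection, a [min, max) time range, and a row-key predicate standing in for a server-side filter. It is an in-memory illustration only; scan, row_filter, and the Table alias are hypothetical, and the real client also supports returning multiple versions and many more filter types.

```python
from typing import Callable, Iterator, Optional

# Logical model: row key -> b"family:qualifier" -> {timestamp: value}
Table = dict[bytes, dict[bytes, dict[int, bytes]]]


def scan(
    table: Table,
    start_row: bytes = b"",
    stop_row: Optional[bytes] = None,
    families: Optional[set[bytes]] = None,
    time_range: tuple[int, int] = (0, 2**63 - 1),
    row_filter: Optional[Callable[[bytes], bool]] = None,
) -> Iterator[tuple[bytes, dict[bytes, bytes]]]:
    """Yield (row, {column: newest value in time range}) in sorted row-key order."""
    min_ts, max_ts = time_range
    for row in sorted(table):  # unsigned byte-wise order, like HBase row keys
        if row < start_row:
            continue
        if stop_row is not None and row >= stop_row:  # stop row is exclusive
            break
        if row_filter is not None and not row_filter(row):
            continue
        result = {}
        for column, versions in table[row].items():
            family = column.split(b":", 1)[0]
            if families is not None and family not in families:
                continue  # column-family projection
            in_range = [ts for ts in versions if min_ts <= ts < max_ts]
            if in_range:
                result[column] = versions[max(in_range)]  # newest version wins
        if result:  # rows with no matching cells are omitted
            yield row, result


table: Table = {
    b"user#001": {b"info:name": {1: b"Ada", 5: b"Ada L."}, b"stats:logins": {3: b"7"}},
    b"user#002": {b"info:name": {2: b"Grace"}},
    b"user#010": {b"info:email": {4: b"x@example.com"}},
    b"admin#01": {b"info:name": {1: b"root"}},
}
print(list(scan(table, start_row=b"user#", stop_row=b"user#01",
                families={b"info"}, time_range=(0, 5))))
# [(b'user#001', {b'info:name': b'Ada'}), (b'user#002', {b'info:name': b'Grace'})]
```

Note that b"user#010" falls outside the scan because it sorts after the exclusive stop row b"user#01", and the version written at timestamp 5 is excluded by the [0, 5) time range.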
Use Cases
Store high-velocity time-series sensor or log data with row keys designed for time range scans.
Persist event streams from web applications or IoT devices for analytics and audit.
Store sparse user profile attributes at scale with efficient random access by user ID.
Use HBase as the storage backend for graph databases such as JanusGraph, which implements the Apache TinkerPop graph computing framework.
Store and serve pre-computed ML features at low latency for online prediction.
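For the time-series use case, one common row-key pattern combines a small hash-based salt prefix (to spread writes from many sensors across regions), the sensor ID, and a reversed timestamp so the newest readings sort first. The sketch below is one possible layout under stated assumptions, not an HBase-defined format: the bucket count, the 0x00 separator (which assumes sensor IDs never contain that byte), and the function names are all hypothetical.

```python
import struct
import zlib

NUM_BUCKETS = 16   # hypothetical number of salt buckets
MAX_TS = 2**63 - 1  # Java Long.MAX_VALUE, used to reverse timestamps


def salt(sensor_id: str) -> int:
    # Deterministic hash; Python's built-in hash() is randomized per process.
    return zlib.crc32(sensor_id.encode()) % NUM_BUCKETS


def row_key(sensor_id: str, ts_millis: int) -> bytes:
    """salt (1 byte) | sensor id | 0x00 | reversed timestamp (8 bytes, big-endian)."""
    return (
        bytes([salt(sensor_id)])
        + sensor_id.encode()
        + b"\x00"
        + struct.pack(">q", MAX_TS - ts_millis)
    )


def scan_bounds(sensor_id: str, start_millis: int, end_millis: int) -> tuple[bytes, bytes]:
    """Start (inclusive) and stop (exclusive) row keys covering [start_millis, end_millis).

    Timestamps are reversed, so the latest instant maps to the smallest key.
    Requires start_millis >= 1 so the stop key does not overflow.
    """
    return row_key(sensor_id, end_millis - 1), row_key(sensor_id, start_millis - 1)


start, stop = scan_bounds("sensor-42", 1_700_000_000_000, 1_700_000_060_000)
```

Deriving the salt from the sensor ID rather than from the timestamp keeps each sensor's readings contiguous, so a time-range query stays a single scan per sensor; the cost is that one very hot sensor still writes to a single region.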
Integrations
HBase uses HDFS as its underlying distributed file system for write-ahead log (WAL) and HFile storage.
Apache Phoenix provides a SQL skin over HBase with JDBC access, secondary indexes, and query optimization.
The Spark-HBase connector reads and writes HBase tables as Apache Spark DataFrames.
Apache Hive's HBase storage handler exposes HBase tables as external Hive tables.
The Apache Flink HBase connector reads and writes HBase tables in streaming pipelines.