Apache Sqoop

Apache Sqoop is a command-line tool for transferring bulk data between Apache Hadoop and structured data stores such as relational databases. It supports parallel import and export, incremental loads, and database-specific connectors for MySQL, PostgreSQL, Oracle, SQL Server, and DB2.

Note: Apache Sqoop was retired to the Apache Attic in 2021 and is no longer maintained. Users are encouraged to migrate to Apache Spark or Apache NiFi.

Tags: Big Data, Data Transfer, ETL, Hadoop, RDBMS, Retired

APIs

Apache Sqoop CLI

Apache Sqoop provides a command-line interface for bulk data transfer between Hadoop and relational databases. Commands include sqoop-import for loading data into HDFS or Hive, ...

Features

Bulk Import

High-throughput parallel import from RDBMS to HDFS, Hive, or HBase.
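
As a rough illustration of a parallel import, the sketch below assembles a `sqoop import` argument list in Python. The helper name `build_import_command`, the JDBC URL, the table, and the paths are hypothetical; the flags (`--connect`, `--table`, `--target-dir`, `--num-mappers`, `--split-by`) are the ones documented for Sqoop 1.4.x, and credential options are left out for brevity.

```python
import shlex


def build_import_command(jdbc_url, table, target_dir, num_mappers=4, split_by=None):
    """Build the argv list for a parallel `sqoop import` into HDFS.

    Hypothetical helper: it only assembles arguments and does not run Sqoop.
    """
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Number of parallel map tasks used for the transfer.
        "--num-mappers", str(num_mappers),
    ]
    if split_by is not None:
        # Column used to partition the table across the mappers.
        argv += ["--split-by", split_by]
    return argv


# Example values are placeholders, not a real database.
cmd = build_import_command(
    "jdbc:mysql://db.example.com/sales",
    "orders",
    "/data/raw/orders",
    num_mappers=8,
    split_by="order_id",
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run(cmd, check=True)` on a host where Sqoop is installed.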

Bulk Export

Export data from HDFS back to relational database tables.
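
An export runs in the opposite direction. The following sketch builds a `sqoop export` argument list; the helper name `build_export_command` and all example values are hypothetical, and it assumes the target table already exists in the database. The flags `--table` and `--export-dir` are the ones documented for Sqoop 1.4.x.

```python
def build_export_command(jdbc_url, table, export_dir, num_mappers=4):
    """Build the argv list for `sqoop export` from an HDFS directory to a table.

    Hypothetical helper: it only assembles arguments and does not run Sqoop.
    """
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        # Destination table in the relational database.
        "--table", table,
        # HDFS directory holding the files to export.
        "--export-dir", export_dir,
        "--num-mappers", str(num_mappers),
    ]


# Example values are placeholders.
export_cmd = build_export_command(
    "jdbc:postgresql://db.example.com/reporting",
    "daily_totals",
    "/data/out/daily_totals",
)
print(" ".join(export_cmd))
```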

Incremental Loads

Incremental loading of new or changed rows using the append or lastmodified modes.
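
The two incremental modes differ mainly in which column they check. The sketch below shows one way the arguments might be assembled, assuming the Sqoop 1.4.x flags `--incremental`, `--check-column`, and `--last-value`; the helper name `build_incremental_args` and the column names are hypothetical.

```python
VALID_MODES = ("append", "lastmodified")


def build_incremental_args(mode, check_column, last_value):
    """Return the extra arguments for an incremental `sqoop import`.

    Hypothetical helper. In append mode the check column is typically a
    monotonically increasing key; in lastmodified mode it is a timestamp
    column that is updated whenever a row changes.
    """
    if mode not in VALID_MODES:
        raise ValueError(f"mode must be one of {VALID_MODES}, got {mode!r}")
    return [
        "--incremental", mode,
        "--check-column", check_column,
        # Only rows with a check-column value beyond this one are imported.
        "--last-value", str(last_value),
    ]


# Append mode: pick up rows whose id is greater than 1000.
append_args = build_incremental_args("append", "id", 1000)

# Lastmodified mode: pick up rows changed after the given timestamp.
modified_args = build_incremental_args(
    "lastmodified", "updated_at", "2020-01-01 00:00:00"
)
print(append_args)
print(modified_args)
```

These arguments would be appended to a base `sqoop import` command; the caller is responsible for recording the new last value between runs.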

Direct Import Mode

Transfers that use native database utilities (such as mysqldump for MySQL) instead of JDBC, available for MySQL and PostgreSQL.

Hive Integration

Auto-create Hive tables and load imported data directly into Hive.
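
A Hive-bound import adds a few flags to a regular import. The sketch below assumes the Sqoop 1.4.x flags `--hive-import`, `--hive-table`, and `--create-hive-table`; the helper name `build_hive_import_command` and the example values are hypothetical.

```python
def build_hive_import_command(jdbc_url, table, hive_table, create_table=True):
    """Build the argv list for a `sqoop import` that loads into Hive.

    Hypothetical helper: it only assembles arguments and does not run Sqoop.
    """
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        # Load the imported data into Hive rather than leaving it only in HDFS.
        "--hive-import",
        "--hive-table", hive_table,
    ]
    if create_table:
        # With this flag the job fails if the Hive table already exists.
        argv.append("--create-hive-table")
    return argv


# Example values are placeholders.
hive_cmd = build_hive_import_command(
    "jdbc:mysql://db.example.com/sales", "customers", "analytics.customers"
)
print(" ".join(hive_cmd))
```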

Use Cases

Data Warehouse Loading

Load relational database data into Hadoop-based data warehouses.

Database Offloading

Move historical data from RDBMS to HDFS for cost-effective storage.

Integrations

Apache Hadoop

Primary target storage for Sqoop imports via HDFS.

Apache Hive

Create and populate Hive tables from RDBMS imports.

MySQL

MySQL connectivity over JDBC, plus a direct connector based on mysqldump.

Oracle

Oracle JDBC connector for enterprise database data transfer.

Resources

GitHub Repository
Documentation
Portal
Terms of Service