Apache Sqoop
Apache Sqoop is a command-line tool for efficiently transferring bulk data between Apache Hadoop and structured data stores such as relational databases. It supports parallel import and export, incremental loads, and database-specific connectors for MySQL, PostgreSQL, Oracle, SQL Server, and DB2. Note: Apache Sqoop was retired to the Apache Attic in 2021. Users are encouraged to migrate to Apache Spark or Apache NiFi.
APIs
Apache Sqoop CLI
Apache Sqoop provides a command-line interface for bulk data transfer between Hadoop and relational databases. Commands include sqoop-import for loading data into HDFS or Hive, ...
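As a rough illustration of the import command described above, the Python sketch below assembles a `sqoop import` argument list for a parallel load into HDFS. The helper name `build_import_command`, the host, database, table, and paths are all hypothetical placeholders. The flags are standard Sqoop 1.x import options. The sketch only builds and prints the command, since running it would require a working Sqoop and Hadoop installation.

```python
import shlex


def build_import_command(jdbc_url, table, target_dir, username,
                         password_file, num_mappers=4, split_by=None):
    """Return the argv for a parallel `sqoop import` into HDFS (not executed here)."""
    argv = [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--username", username,
        "--password-file", password_file,   # keeps the password off the command line
        "--table", table,
        "--target-dir", target_dir,         # HDFS directory that receives the data
        "--num-mappers", str(num_mappers),  # number of parallel map tasks
    ]
    if split_by is not None:
        # Column used to divide the table into ranges, one per mapper
        argv += ["--split-by", split_by]
    return argv


# Hypothetical connection details, for illustration only
cmd = build_import_command(
    jdbc_url="jdbc:mysql://db.example.com:3306/sales",
    table="orders",
    target_dir="/data/raw/orders",
    username="etl_user",
    password_file="/user/etl/.db-password",
    split_by="order_id",
)
print(shlex.join(cmd))  # shlex.join requires Python 3.8+
```

On a host where Sqoop is installed, the same list could be passed to `subprocess.run(cmd, check=True)`. Building the command as a list rather than a single string avoids shell-quoting problems in connection strings.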
Features
High-throughput parallel import from RDBMS to HDFS, Hive, or HBase.
Export data from HDFS back to relational database tables.
Delta-based incremental loading using append or lastmodified strategies.
Direct-mode transfers for MySQL and PostgreSQL that use native database utilities instead of JDBC.
Auto-create Hive tables and load imported data directly into Hive.
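The features above map onto a handful of command-line flags. The following Python sketch, offered as one possible way to organize them, shows how incremental, Hive, direct-mode, and export options might be composed. The helper names (`incremental_args`, `hive_args`), the connection URL, and the table names are hypothetical. The flags themselves are standard Sqoop 1.x options. Nothing is executed.

```python
def incremental_args(mode, check_column, last_value):
    """Flags for an incremental import; mode must be 'append' or 'lastmodified'."""
    if mode not in ("append", "lastmodified"):
        raise ValueError(f"unsupported incremental mode: {mode!r}")
    return [
        "--incremental", mode,
        "--check-column", check_column,  # column examined to find new or changed rows
        "--last-value", str(last_value), # upper bound seen by the previous import
    ]


def hive_args(hive_table, create_table=False):
    """Flags that load imported data into a Hive table."""
    args = ["--hive-import", "--hive-table", hive_table]
    if create_table:
        # With this flag the job fails if the Hive table already exists
        args.append("--create-hive-table")
    return args


# Hypothetical base command shared by the import variants below
base = ["sqoop", "import",
        "--connect", "jdbc:postgresql://db.example.com/sales",
        "--table", "orders"]

append_cmd = base + incremental_args("append", "order_id", 10000)
modified_cmd = base + incremental_args("lastmodified", "updated_at",
                                       "2020-01-01 00:00:00")
hive_cmd = base + hive_args("orders", create_table=True)
direct_cmd = base + ["--direct"]  # use the native database utility path

# Export reads files from an HDFS directory and writes rows to an existing table
export_cmd = ["sqoop", "export",
              "--connect", "jdbc:postgresql://db.example.com/sales",
              "--table", "order_summary",
              "--export-dir", "/data/out/order_summary"]

for command in (append_cmd, modified_cmd, hive_cmd, direct_cmd, export_cmd):
    print(" ".join(command))
```

In `append` mode the check column is typically an auto-incrementing key, while `lastmodified` expects a timestamp column that is updated whenever a row changes.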
Use Cases
Load relational database data into Hadoop-based data warehouses.
Move historical data from RDBMS to HDFS for cost-effective storage.
Integrations
HDFS as the primary target storage for Sqoop imports.
Create and populate Hive tables from RDBMS imports.
MySQL connector over JDBC, with an optional direct mode based on mysqldump.
Oracle JDBC connector for enterprise database data transfer.
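To make the database integrations above more concrete, here is a minimal Python sketch of the JDBC connection strings that would be passed to `--connect` for MySQL and Oracle. The helper names `mysql_url` and `oracle_url`, the hosts, and the database, service, and table names are hypothetical. The URL shapes follow the usual MySQL Connector/J and Oracle thin-driver conventions, and the default ports are assumed.

```python
def mysql_url(host, database, port=3306):
    """JDBC URL in the usual MySQL Connector/J form."""
    return f"jdbc:mysql://{host}:{port}/{database}"


def oracle_url(host, service, port=1521):
    """JDBC URL for the Oracle thin driver, addressed by service name."""
    return f"jdbc:oracle:thin:@//{host}:{port}/{service}"


# Hypothetical hosts and names, for illustration only
mysql_cmd = ["sqoop", "import",
             "--connect", mysql_url("mysql.example.com", "crm"),
             "--table", "customers",
             "--direct"]  # requests the mysqldump-based path instead of plain JDBC

oracle_cmd = ["sqoop", "import",
              "--connect", oracle_url("oracle.example.com", "ORCLPDB1"),
              "--table", "EMPLOYEES"]

print(" ".join(mysql_cmd))
print(" ".join(oracle_cmd))
```

In both cases the matching JDBC driver JAR must be available to Sqoop, since vendor drivers are generally not bundled with it.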