Moving The Data From
Topic:
Moving the Data from RDBMS to Hadoop, Moving the Data from RDBMS to HBase, Moving the Data from
RDBMS to Hive
Sqoop: Sqoop is a command-line application for transferring data between an RDBMS and Hadoop. It is a JDBC-based (Java
Database Connectivity) utility for integrating with traditional databases. Sqoop import moves data either into HDFS (a delimited
format can be defined as part of the import definition) or directly into a Hive table.
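As a minimal sketch of the two import targets described above, the script below assembles one import into HDFS with a comma delimiter and one import directly into Hive. The JDBC URL, user name, table name, and target directory are hypothetical placeholders. The script only prints the commands; the commented-out `"$@"` lines would execute them on a host where Sqoop is installed.

```shell
#!/bin/sh
# Hypothetical connection details; replace them with your own.
JDBC_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"
TABLE="orders"

# 1) Import into HDFS as comma-delimited text files.
set -- sqoop import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --table "$TABLE" \
  --target-dir "/user/hadoop/$TABLE" \
  --fields-terminated-by ','
HDFS_IMPORT="$*"
echo "$HDFS_IMPORT"
# "$@"   # uncomment to run on a host where Sqoop is installed

# 2) Import the same table directly into a Hive table.
set -- sqoop import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --table "$TABLE" \
  --hive-import \
  --hive-table "$TABLE"
HIVE_IMPORT="$*"
echo "$HIVE_IMPORT"
# "$@"   # uncomment to run on a host where Sqoop is installed
```

The `-P` flag makes Sqoop prompt for the password instead of taking it on the command line.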
When we submit a Sqoop command, the main task is divided into subtasks, each handled internally by an individual map task. Each map task
imports part of the data into the Hadoop ecosystem; collectively, the map tasks import the whole data set.
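To make the idea of "each map task imports part of the data" concrete, here is a simplified illustration of how a numeric key range could be divided among map tasks. It is not Sqoop's actual implementation; the table name `orders`, the column `id`, and the minimum, maximum, and mapper count are assumed values chosen for the example.

```shell
#!/bin/sh
# Assumed values: key range of the source table and number of map tasks.
MIN_ID=0
MAX_ID=1000
NUM_MAPPERS=4

# Width of the key range handled by each map task.
STEP=$(( (MAX_ID - MIN_ID) / NUM_MAPPERS ))

i=0
while [ "$i" -lt "$NUM_MAPPERS" ]; do
  LO=$(( MIN_ID + i * STEP ))
  if [ "$i" -eq $(( NUM_MAPPERS - 1 )) ]; then
    # The last map task takes everything up to and including MAX_ID.
    COND="id >= $LO AND id <= $MAX_ID"
  else
    HI=$(( LO + STEP ))
    COND="id >= $LO AND id < $HI"
  fi
  echo "map task $i: SELECT * FROM orders WHERE $COND"
  i=$(( i + 1 ))
done
```

Each printed query covers a disjoint slice of the key range, so the slices together cover the whole table.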
The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the
target table.
When we submit the job, it is split into map tasks, each of which reads a chunk of data from HDFS. These chunks are exported to a structured data
destination, which in most cases is an RDBMS (MySQL/Oracle/SQL Server). Together, the exported chunks make up the whole data set at the
destination.
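A basic export could be sketched as follows. The JDBC URL, user name, target table `orders_copy`, and HDFS directory are hypothetical; the target table is assumed to exist already in the database. The script prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical connection details; replace them with your own.
JDBC_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"

# Export comma-delimited files from HDFS into an existing RDBMS table,
# using 4 map tasks to push chunks of the data in parallel.
set -- sqoop export \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --table orders_copy \
  --export-dir /user/hadoop/orders \
  --input-fields-terminated-by ',' \
  -m 4
EXPORT_CMD="$*"
echo "$EXPORT_CMD"
# "$@"   # uncomment to run on a host where Sqoop is installed
```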
A reduce phase is needed only for aggregations. Apache Sqoop just imports and exports data and performs no aggregations, so its jobs are
map-only. The job launches as many mappers as the user specifies. For a Sqoop import, each mapper is assigned a part of the data to be
imported. Sqoop tries to divide the input data evenly among the mappers for performance. Each mapper then opens a JDBC connection to the
database, fetches the part of the data assigned to it, and writes it into HDFS, Hive, or HBase, depending on the arguments provided on the
command line.
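The sketch below shows how the number of mappers and the HBase destination might be chosen through command-line arguments. The connection details, the split column `id`, the HBase table `orders`, and the column family `cf` are assumptions made for the example. The command is printed, not executed.

```shell
#!/bin/sh
# Hypothetical connection details; replace them with your own.
JDBC_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"

# Import into HBase with 4 mappers, splitting the work on the id column.
set -- sqoop import \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --table orders \
  --split-by id \
  --num-mappers 4 \
  --hbase-table orders \
  --column-family cf \
  --hbase-row-key id \
  --hbase-create-table
HBASE_IMPORT="$*"
echo "$HBASE_IMPORT"
# "$@"   # uncomment to run on a host where Sqoop is installed
```

Replacing the `--hbase-*` and `--column-family` arguments with `--target-dir` or `--hive-import` would send the same data to HDFS or Hive instead.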
The syntax for the Sqoop import command is:
    sqoop import (generic-args) (import-args)
We can pass import arguments in any order with respect to each other, but the Hadoop generic arguments must precede
the import arguments.
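As an illustration of this ordering rule, the following sketch places a Hadoop generic argument (`-D`) directly after the tool name and before the import arguments. The job name and connection details are hypothetical, and the command is only printed.

```shell
#!/bin/sh
# Hypothetical connection details; replace them with your own.
JDBC_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"

# The generic argument (-D property=value) comes first;
# the import arguments after it may appear in any order.
set -- sqoop import \
  -D mapreduce.job.name=orders_import \
  --table orders \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --target-dir /user/hadoop/orders
GENERIC_FIRST="$*"
echo "$GENERIC_FIRST"
# "$@"   # uncomment to run on a host where Sqoop is installed
```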
The export command works in two modes: insert mode and update mode.
1. Insert mode: This is the default mode. The records from the input files are inserted into the database table
using INSERT statements.
2. Update mode: Sqoop generates UPDATE statements that replace existing records in the
database.
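The two modes can be sketched side by side. The connection details, the table `orders_copy`, and the key column `id` are hypothetical; the difference between the commands is the `--update-key` argument, which switches the export to update mode. Both commands are printed rather than run.

```shell
#!/bin/sh
# Hypothetical connection details; replace them with your own.
JDBC_URL="jdbc:mysql://dbhost:3306/sales"
DB_USER="etl_user"

# Insert mode (default): every input record becomes an INSERT.
set -- sqoop export \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --table orders_copy \
  --export-dir /user/hadoop/orders
INSERT_EXPORT="$*"
echo "$INSERT_EXPORT"

# Update mode: rows are matched on the id column and updated.
# Adding "--update-mode allowinsert" would also insert rows that do not exist yet.
set -- sqoop export \
  --connect "$JDBC_URL" \
  --username "$DB_USER" -P \
  --table orders_copy \
  --export-dir /user/hadoop/orders \
  --update-key id
UPDATE_EXPORT="$*"
echo "$UPDATE_EXPORT"
# "$@"   # uncomment to run on a host where Sqoop is installed
```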
Syntax for Sqoop Export
The syntax for the Sqoop export command is:
    sqoop export (generic-args) (export-args)
The Hadoop generic arguments should be passed before any export arguments, and we can enter export arguments in any
order with respect to each other.