
UNIT-3

Topic:

Moving the Data from RDBMS to Hadoop, Moving the Data from RDBMS to HBase, Moving the Data from RDBMS to Hive

SQOOP – Sqoop is a command-line interface application that helps in transferring data from an RDBMS to Hadoop. It is a JDBC-based (Java Database Connectivity) utility for integrating with traditional databases. Sqoop Import allows the movement of data either into HDFS (a delimited format can be defined as part of the import definition) or directly into a Hive table.
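As an illustration, a minimal import might look like the following (the MySQL host, the database testdb, the table emp, the credentials, and the HDFS path are hypothetical placeholders, not values prescribed by Sqoop):

$ sqoop import \
    --connect jdbc:mysql://localhost/testdb \
    --username hduser --password hduser \
    --table emp \
    --target-dir /user/hduser/emp

Replacing --target-dir with --hive-import --hive-table emp would load the same table directly into Hive instead of plain HDFS files.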

Sqoop Architecture & Working


Let us understand how Apache Sqoop works.
The import tool imports individual tables from the RDBMS into HDFS. Each row of a table is treated as a record in HDFS.

When we submit a Sqoop command, the main task is divided into subtasks, each handled internally by an individual map task. A map task is the subtask that imports part of the data into the Hadoop ecosystem; collectively, the map tasks import the whole data set.

Export also works in a similar manner.

The export tool exports a set of files from HDFS back to an RDBMS. The files given as input to Sqoop contain records, which become rows in the target table.

When we submit an export job, it is mapped into map tasks that bring chunks of data from HDFS. These chunks are exported to a structured data destination. Combining all these exported chunks, we get the whole data set at the destination, which in most cases is an RDBMS (MySQL, Oracle, SQL Server).

A reduce phase is required only in the case of aggregations. Since Apache Sqoop just imports and exports data and performs no aggregation, its jobs are map-only. The map job launches multiple mappers, depending on the number defined by the user. For a Sqoop import, each mapper is assigned a part of the data to be imported, and Sqoop distributes the input data equally among the mappers for high performance. Each mapper then creates a connection to the database using JDBC, fetches the part of the data assigned to it by Sqoop, and writes it into HDFS, Hive, or HBase based on the arguments provided on the CLI.
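For instance, the degree of parallelism and the split column can be set explicitly; this is a sketch assuming the same hypothetical testdb/emp table, where id is an assumed numeric primary key:

$ sqoop import \
    --connect jdbc:mysql://localhost/testdb \
    --username hduser --password hduser \
    --table emp \
    --split-by id \
    --num-mappers 4 \
    --target-dir /user/hduser/emp

Here --num-mappers 4 launches four parallel map tasks, and --split-by id tells Sqoop which column to use when partitioning the input rows among those mappers.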
The syntax for Sqoop Import command is:

$ sqoop import (generic-args) (import-args)

$ sqoop-import (generic-args) (import-args)

We can pass import arguments in any order with respect to each other, but the Hadoop generic arguments must precede
the import arguments.
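To illustrate the ordering rule, a Hadoop generic argument such as -D must appear before any import arguments; the property value shown here is an arbitrary example:

$ sqoop import \
    -D mapreduce.job.name=emp_import \
    --connect jdbc:mysql://localhost/testdb \
    --username hduser --password hduser \
    --table emp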

The export command works in two modes: insert mode and update mode.
1. Insert mode: This is the default mode. In this mode, the records from the input files are inserted into the database table using INSERT statements.
2. Update mode: In update mode, Sqoop generates UPDATE statements that replace existing records in the database.
Syntax for Sqoop Export
The syntax for the Sqoop Export command is:

$ sqoop export (generic-args) (export-args)

$ sqoop-export (generic-args) (export-args)

The Hadoop generic arguments should be passed before any export arguments, and we can enter export arguments in any
order with respect to each other.
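For example, an insert-mode export and an update-mode export might look like the following (the table emp, its key column id, the credentials, and the HDFS path are assumed placeholders):

$ sqoop export \
    --connect jdbc:mysql://localhost/testdb \
    --username hduser --password hduser \
    --table emp \
    --export-dir /user/hduser/emp \
    --input-fields-terminated-by ','

$ sqoop export \
    --connect jdbc:mysql://localhost/testdb \
    --username hduser --password hduser \
    --table emp \
    --export-dir /user/hduser/emp \
    --update-key id

Without --update-key, Sqoop runs in the default insert mode; with it, Sqoop generates UPDATE statements keyed on the given column.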
