Notes Bug Data and of Apache

The Hadoop ecosystem consists of various facets specific to different career
specialties. One such discipline centers around Sqoop, which is a tool in the
Hadoop ecosystem used to load data from relational database management
systems (RDBMS) into Hadoop and export it back to the RDBMS. Simply put, Sqoop
helps professionals work with large amounts of data in Hadoop.

This Sqoop tutorial gives you an in-depth walkthrough for using the Sqoop tool in
Hadoop to manage Big Data. It digs into everything from the basics of Sqoop and its
architecture, to how to actually use it.

What is Sqoop and Why Use Sqoop?

Let us begin this Sqoop tutorial by understanding what Sqoop is. Sqoop is a tool
used to transfer bulk data between Hadoop and external datastores, such as
relational databases (MS SQL Server, MySQL).

To process data using Hadoop, the data first needs to be loaded into Hadoop
clusters from several sources. However, it turned out that the process of loading
data from several heterogeneous sources was extremely challenging. The problems
administrators encountered included:

1. Maintaining data consistency

2. Ensuring efficient utilization of resources

3. Loading bulk data into Hadoop was not possible

4. Loading data using scripts was slow

The solution was Sqoop. Using Sqoop in Hadoop helped to overcome the challenges
of the traditional approach, and it could load bulk data from an RDBMS into
Hadoop with ease.

Now that we understand what Sqoop is and why it is needed, the next topic in
this Sqoop tutorial is the features of Sqoop.

Sqoop Features

Sqoop has several features that make it helpful in the Big Data world:

1. Parallel Import/Export

Sqoop uses the YARN framework to import and export data. This provides
fault tolerance on top of parallelism.

2. Import Results of an SQL Query

Sqoop enables us to import the results returned from an SQL query into
HDFS.

3. Connectors For All Major RDBMS Databases

Sqoop provides connectors for multiple RDBMSs, such as MySQL and
Microsoft SQL Server.

4. Kerberos Security Integration

Sqoop supports the Kerberos computer network authentication protocol,
which enables nodes communicating over an insecure network to
authenticate users securely.

5. Provides Full and Incremental Load

Sqoop can load the entire table or parts of the table with a single
command.
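
To make the difference between a full and an incremental load concrete, here is a minimal Python simulation. It is only a sketch of the idea behind Sqoop's `--incremental append` mode with `--check-column` and `--last-value`, not Sqoop's own code; the `orders` rows and the `incremental_append` helper are hypothetical.

```python
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]

def incremental_append(source_rows: List[Row], check_column: str,
                       last_value: int) -> Tuple[List[Row], int]:
    """Return rows whose check column exceeds last_value, plus the new high-water mark."""
    new_rows = [row for row in source_rows if row[check_column] > last_value]
    new_last = max((row[check_column] for row in new_rows), default=last_value)
    return new_rows, new_last

# Hypothetical source table contents.
orders = [{"id": 1, "item": "pen"}, {"id": 2, "item": "ink"}, {"id": 3, "item": "pad"}]

# Full load: every row is newer than the starting mark of 0.
full, mark = incremental_append(orders, "id", 0)

# A new row arrives; the next run picks up only rows added since the last mark.
orders.append({"id": 4, "item": "cap"})
delta, mark = incremental_append(orders, "id", mark)

print(len(full), [row["id"] for row in delta], mark)
```

The saved mark plays the role of the last imported value: each run starts where the previous one stopped, so unchanged rows are not transferred again.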

After going through the features of Sqoop as a part of this Sqoop tutorial, let us
understand the Sqoop architecture.

Sqoop Architecture

Now, let’s dive deep into the architecture of Sqoop, step by step:

1. The client submits an import or export command.

2. Sqoop fetches data from different databases. Here, we have an enterprise data
warehouse, document-based systems, and a relational database. We have a
connector for each of these; connectors help to work with a range of accessible
databases.

3. Multiple mappers perform map tasks to load the data onto HDFS.

4. Similarly, numerous map tasks export the data from HDFS to the RDBMS using
the Sqoop export command.

This Sqoop tutorial now takes a closer look at the Sqoop import.

Sqoop Import

The diagram below represents the Sqoop import mechanism.

1. In this example, a company’s data is present in the RDBMS. When the
import command is issued, Sqoop performs an introspection of the
database to gather metadata (primary key information).

2. It then submits a map-only job. Sqoop divides the input dataset into splits
and uses individual map tasks to push the splits to HDFS.
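
The splitting step can be sketched in Python as follows. This is an illustration under stated assumptions, not Sqoop's actual splitter: it assumes a numeric primary key whose minimum and maximum are already known, and the names `compute_splits`, `split_query`, `employees`, and `id` are hypothetical. Sqoop's real boundary handling differs in its details.

```python
from typing import List, Tuple

def compute_splits(min_id: int, max_id: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [min_id, max_id] into contiguous sub-ranges."""
    if max_id < min_id:
        return []
    total = max_id - min_id + 1
    num_splits = min(num_mappers, total)   # never create empty splits
    base, extra = divmod(total, num_splits)
    splits = []
    lo = min_id
    for i in range(num_splits):
        size = base + (1 if i < extra else 0)  # spread the remainder over the first splits
        hi = lo + size - 1
        splits.append((lo, hi))
        lo = hi + 1
    return splits

def split_query(table: str, key: str, lo: int, hi: int) -> str:
    """Build the bounded query that one map task would run for its split."""
    return f"SELECT * FROM {table} WHERE {key} >= {lo} AND {key} <= {hi}"

# Four map tasks, each responsible for one slice of the key range 1..1000.
for lo, hi in compute_splits(1, 1000, 4):
    print(split_query("employees", "id", lo, hi))
```

Because each map task reads a disjoint key range, the tasks can run in parallel without coordinating with one another.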

A few of the arguments used in Sqoop import are shown below:
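
As a hedged illustration, the Python sketch below assembles a typical `sqoop import` command line without running it. The flags (`--connect`, `--username`, `--password-file`, `--table`, `--target-dir`, `--num-mappers`, `--split-by`) are standard Sqoop import arguments; the JDBC URL, user name, table, and paths are hypothetical placeholders, as is the `build_import_command` helper.

```python
import shlex
from typing import List, Optional

def build_import_command(jdbc_url: str, username: str, password_file: str,
                         table: str, target_dir: str, num_mappers: int = 4,
                         split_by: Optional[str] = None) -> List[str]:
    """Assemble the argument list for a table import (nothing is executed)."""
    cmd = [
        "sqoop", "import",
        "--connect", jdbc_url,             # JDBC connection string of the source database
        "--username", username,
        "--password-file", password_file,  # file holding the password
        "--table", table,                  # source table to import
        "--target-dir", target_dir,        # destination directory in HDFS
        "--num-mappers", str(num_mappers), # number of parallel map tasks
    ]
    if split_by:
        cmd += ["--split-by", split_by]    # column used to divide the work among mappers
    return cmd

# All values below are placeholders for illustration only.
import_cmd = build_import_command(
    jdbc_url="jdbc:mysql://db.example.com/sales",
    username="etl_user",
    password_file="/user/etl/.db_password",
    table="orders",
    target_dir="/user/hadoop/orders",
    split_by="id",
)
print(" ".join(shlex.quote(part) for part in import_cmd))
```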

In this Sqoop tutorial, you have learned about the Sqoop import. Now let's dive in
to understand the Sqoop export.

Sqoop Export

Let us understand the Sqoop export mechanism stepwise:

1. The first step is to gather the metadata through introspection.

2. Sqoop then divides the input dataset into splits and uses individual map
tasks to push the splits to the RDBMS.
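
The two export steps can be mimicked in a small, runnable Python sketch. It is a simulation under loud assumptions: an in-memory SQLite database stands in for the RDBMS, a Python list stands in for delimited files in HDFS, and the helpers `table_columns`, `make_splits`, and `export_split` are hypothetical names. As with a real Sqoop export, the target table has to exist before the data is pushed.

```python
import sqlite3
from typing import List

# Hypothetical comma-delimited records, standing in for files in an HDFS directory.
hdfs_lines = ["1,pen,2", "2,ink,5", "3,pad,1", "4,cap,9"]

def table_columns(conn: sqlite3.Connection, table: str) -> List[str]:
    """Step 1: introspect the target table to learn its column names."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

def make_splits(lines: List[str], num_tasks: int) -> List[List[str]]:
    """Step 2a: divide the input records among the tasks."""
    return [lines[i::num_tasks] for i in range(num_tasks)]

def export_split(conn: sqlite3.Connection, table: str,
                 columns: List[str], lines: List[str]) -> None:
    """Step 2b: one task parses its records and inserts them into the table."""
    column_list = ", ".join(columns)
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({column_list}) VALUES ({placeholders})"
    conn.executemany(sql, [line.split(",") for line in lines])
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, item TEXT, qty INTEGER)")

columns = table_columns(conn, "orders")
for split in make_splits(hdfs_lines, 2):  # run sequentially here; Sqoop runs map tasks in parallel
    export_split(conn, "orders", columns, split)

exported = conn.execute("SELECT COUNT(*), SUM(qty) FROM orders").fetchone()
print(columns, exported)
```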

Let’s now have a look at a few of the arguments used in Sqoop export:
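
The sketch below puts together a representative `sqoop export` command line in Python and prints it without executing anything. The flags are standard Sqoop export arguments; every value (JDBC URL, user name, table, paths) is a hypothetical placeholder.

```python
import shlex

# Standard sqoop export arguments mapped to placeholder values.
export_args = {
    "--connect": "jdbc:mysql://db.example.com/sales",  # JDBC URL of the target database
    "--username": "etl_user",
    "--password-file": "/user/etl/.db_password",       # file holding the password
    "--table": "orders",                               # existing target table in the RDBMS
    "--export-dir": "/user/hadoop/orders",             # HDFS directory holding the data
    "--input-fields-terminated-by": ",",               # field delimiter of the HDFS files
    "--num-mappers": "4",                              # number of parallel map tasks
}

parts = ["sqoop", "export"]
for flag, value in export_args.items():
    parts += [flag, value]

export_command = " ".join(shlex.quote(part) for part in parts)
print(export_command)
```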

Now that we have covered the Sqoop import and export, the next section in this
Sqoop tutorial is the processing that takes place in Sqoop.
