
On the one hand, you may have a Hadoop cluster used for processing and storing large amounts of data; on the other hand, you have an application that produces a large amount of data, or a legacy system that stores its data in a relational database. How do you connect these two? That is exactly where Flume and Sqoop come in.

Generally, the data comes from two kinds of sources: either an application that produces data on a regular basis, or a traditional relational database management system (RDBMS) such as Oracle DB or SQL Server. In both cases we have sources that contain data and a destination, which is a Hadoop ecosystem data store.

The question now is: how do we get our data from these sources into Hadoop?
After this introduction you will of course say Flume and Sqoop, but let us explain how this process is done, or rather, what the steps would be in the absence of these tools.

Normally, every Hadoop ecosystem technology exposes Java APIs (application programming interfaces), and you can use these APIs directly to write data to, for example, HDFS, HBase or Cassandra.
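
For example, a minimal sketch of writing a file straight through the HDFS Java API could look like the following (the namenode URI and the target path are assumptions for illustration, not values from this document):

    import java.net.URI;
    import java.nio.charset.StandardCharsets;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    // Minimal sketch: write a small file to HDFS through the FileSystem API.
    // The namenode URI and the target path below are placeholders.
    public class HdfsWriteExample {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
            try (FSDataOutputStream out = fs.create(new Path("/data/events/sample.txt"))) {
                out.write("one event at a time\n".getBytes(StandardCharsets.UTF_8));
            }
            fs.close();
        }
    }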

However, there are a few reasons why this can be a problem, depending on whether you are transferring data from an application or bulk transferring data from an RDBMS.

Let us start with the application: suppose that a number of events produce data for this application, and this data needs to be stored as the events occur. This is called streaming data. To handle it directly through the APIs:

HDFS files have to be large to take advantage of HDFS's distributed architecture, which means buffering the data in memory or in an intermediate file before writing it to HDFS;

and you should not lose any data even if there is a crash, so you need a guarantee that no data will be lost.

All of these difficulties and problems are taken care of by Flume.
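
To make this concrete, a minimal Flume agent configuration could look like the sketch below (the agent name a1, the netcat source on port 44444 and the HDFS path are illustrative assumptions): the channel buffers incoming events, a file channel keeps them on disk so a crash does not lose data, and the HDFS sink rolls them into large files.

    # Components of agent a1 (names and values are illustrative)
    a1.sources = r1
    a1.channels = c1
    a1.sinks = k1

    # Source: receive events over a TCP port (netcat source, for demonstration)
    a1.sources.r1.type = netcat
    a1.sources.r1.bind = localhost
    a1.sources.r1.port = 44444

    # Channel: a durable file channel buffers events so they survive a crash
    a1.channels.c1.type = file

    # Sink: write events to HDFS, rolling files by size so they stay large
    a1.sinks.k1.type = hdfs
    a1.sinks.k1.hdfs.path = /flume/events
    a1.sinks.k1.hdfs.fileType = DataStream
    a1.sinks.k1.hdfs.rollSize = 134217728
    a1.sinks.k1.hdfs.rollCount = 0
    a1.sinks.k1.hdfs.rollInterval = 0

    # Wire the source and the sink to the channel
    a1.sources.r1.channels = c1
    a1.sinks.k1.channel = c1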

There are also a few problems with using an RDBMS and integrating directly with a Java API.
Let us say that you have a legacy system that uses an RDBMS and you want to port its data to HDFS.

Fortunately, thanks to Sqoop, we do not need to think about which option to choose.
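
As an illustration, a typical Sqoop import from a relational database into HDFS could look like the command below (the JDBC URL, credentials, table name and target directory are placeholders, not values from this document):

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username dbuser \
      --password-file /user/dbuser/.password \
      --table customers \
      --target-dir /data/customers \
      --num-mappers 4

Sqoop turns an import like this into parallel map tasks that read the table over JDBC and write the rows into HDFS.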

Their roles in the Hadoop ecosystem are quite similar, but the use cases for each are slightly different.

The first difference is:


In computer science, a data buffer (or just buffer) is a region of physical memory storage used to temporarily store data while it is being moved from one place to another.
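
As a small illustration of this idea in Java (the file name and the number of events are arbitrary assumptions), a BufferedOutputStream collects many small writes in memory and flushes them to disk in larger chunks:

    import java.io.BufferedOutputStream;
    import java.io.FileOutputStream;
    import java.io.IOException;
    import java.nio.charset.StandardCharsets;

    // Minimal buffering sketch: small writes accumulate in an 8 KB in-memory
    // buffer and reach the file in larger chunks, not one disk write per event.
    public class BufferExample {
        public static void main(String[] args) throws IOException {
            try (BufferedOutputStream out =
                     new BufferedOutputStream(new FileOutputStream("events.log"), 8192)) {
                for (int i = 0; i < 10_000; i++) {
                    out.write(("event " + i + "\n").getBytes(StandardCharsets.UTF_8));
                }
            } // closing the stream flushes any remaining buffered bytes
        }
    }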
