Map Reduce With Hadoop: Presented by ANIVESHA-126, ARITRA-128, RIA-142, SHASHVAT-150, SHEKHAR-151


Introduction

 Data is generated every day, and its volume has grown from petabytes to exabytes and zettabytes.
 Advances in technology provide the processing speed needed to organize large amounts of data and store them in a structured way.
 “Big data” is a buzzword today because it promises the capacity to process data of various formats and structures without the worry of data loss.
IMPORTANCE  Big Data is data generated at high speeds in
large volumes on various technological
devices globally. The data can be structured
or unstructured.
 It refers to the data generated every second
by social media networks, sensors, mobiles.
 The Big Data is often broken into 5 V’s
which are Variety, Volume , Velocity,
Veracity, Value to make out logical sense of
the large amounts of data.
 Along with these 5 V’s there also exists
Ambiguity, Viscosity, Virality.
Big Data Pillars:
 Big Table – data held in relational tables.
 Big Text – text in the form of structured and semi-structured data, natural language, and semantic data.
 Big Metadata – data about the data stored in a big data system.
 Big Graphs – connections between objects, their semantic discovery, degrees of separation, linguistic analytics, and subject predicates.
Map-Reduce
 Map Reduce is one of the emerging programming paradigms. It is designed for processing large volumes of data in parallel by splitting a job into independent tasks.
 A Map Reduce program is a combination of a Map function and a Reduce function.
 The Map function performs filtering and sorting, for example sorting customers by first name into queues, with one queue per name. The Reduce function performs a summary or aggregate operation, such as counting the number of customers in each queue, which yields the count for each name.
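The customer example above can be sketched in plain Python. This is a minimal single-process illustration of the idea, not Hadoop's API: the function names `map_phase`, `shuffle`, and `reduce_phase` and the sample customer list are hypothetical, chosen only for this sketch.

```python
from collections import defaultdict


def map_phase(customers):
    """Map: emit one (first_name, 1) pair per customer record."""
    for full_name in customers:
        first_name = full_name.split()[0]
        yield first_name, 1


def shuffle(pairs):
    """Group the emitted values by key, ordered by key (one queue per name)."""
    queues = defaultdict(list)
    for key, value in pairs:
        queues[key].append(value)
    return dict(sorted(queues.items()))


def reduce_phase(queues):
    """Reduce: summarise each queue by counting its entries."""
    return {name: sum(values) for name, values in queues.items()}


# Hypothetical sample input, invented for illustration only.
customers = ["Bob Lee", "Alice Smith", "Alice Jones", "Carol Wu", "Bob Ray", "Alice Kim"]

name_counts = reduce_phase(shuffle(map_phase(customers)))
print(name_counts)  # {'Alice': 3, 'Bob': 2, 'Carol': 1}
```

In a real cluster the map and reduce steps would run as separate tasks on different nodes, and the framework would perform the grouping between them.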
Hadoop

 Hadoop is an open-source software framework used to develop data processing applications that are executed in a distributed computing environment.
 The computational logic of such an application is simply a compiled version of a program written in a high-level language such as Java. The program processes data stored in HDFS.
 A computer cluster consists of multiple processing units (storage disk + processor) that are connected to each other and act as a single system.
 Hadoop consists of two sub-projects:
 Hadoop MapReduce
 HDFS (Hadoop Distributed File System)
Map-Reduce Implementation

 In Map Reduce programs the user does not usually specify the number of mappers, since it depends on the size of the input file and the block size, whereas the number of reducers can be configured by the user based on the number of mappers.
 When multiple mappers are running, some of them may run very slowly. Hadoop identifies such slow-running tasks and launches a duplicate of the same task on another data node. This is called “speculative execution” in Hadoop.
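The dependence of the mapper count on file size and block size can be approximated as a ceiling division. The sketch below is a rough estimate under stated assumptions: the helper `estimate_map_tasks` is hypothetical, the 128 MB block size is assumed here as a commonly used HDFS setting, and the actual number is decided by the input format's split logic, so it may differ.

```python
def estimate_map_tasks(file_size_bytes, block_size_bytes):
    """Approximate the number of map tasks as ceil(file size / block size).

    Assumption: one input split per block, which is only an approximation
    of how the framework computes splits.
    """
    if file_size_bytes < 0 or block_size_bytes <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    # Integer ceiling division avoids floating-point rounding on large sizes.
    return (file_size_bytes + block_size_bytes - 1) // block_size_bytes


MB = 1024 * 1024
BLOCK_SIZE = 128 * MB  # assumed block size for this example

print(estimate_map_tasks(1024 * MB, BLOCK_SIZE))  # 8
print(estimate_map_tasks(1000 * MB, BLOCK_SIZE))  # 8 (the last block is partly filled)
```

Under this estimate, a larger block size means fewer mappers for the same file, which is why the user normally tunes the block or split size rather than setting the mapper count directly.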
Conclusion

 In an era of advancing technology, one needs to understand global competition and the big data analysis that supports decision making.
 Big Data is still at an early stage, so the term “Big Data” needs to be understood before implementation.
 We can conclude that Big Data will bring social change through programming languages and analytical tools (e.g. SPSS).
 Big Data analytics needs to be exploited for a sustainable and unbiased society.
