Welcome to Scribd!

Skip carousel

Unit III

Uploaded by

Vijay Sasan M 21IT

0% found this document useful (0 votes)

1 views8 pages

Original Title

unit-iii

Copyright

Available Formats

PDF, TXT or read online from Scribd

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Report this Document

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

0% found this document useful (0 votes)

1 views8 pages

Unit III

Uploaded by

Vijay Sasan M 21IT

Copyright:

Available Formats

Download as PDF, TXT or read online from Scribd

Flag for inappropriate content

Download as pdf or txt

Jump to Page

You are on page 1of 8

Search inside document

lOMoARcPSD|37278361

UNIT III

Big Data Analytics (Anna University)

Scan to open on Studocu

Studocu is not sponsored or endorsed by any college or university

Downloaded by Vijay Sasan M 21IT (vijaysasan.25it@licet.ac.in)
lOMoARcPSD|37278361

UNIT III MAP REDUCE APPLICATIONS 6

MapReduce workflows – unit tests with MRUnit – test data and local tests – anatomy of
MapReduce job run – classic Map-reduce – YARN – failures in classic Map-reduce and YARN
– job scheduling – shuffle and sort – task execution – MapReduce types – input formats – output
formats.

Why MapReduce?

Traditional model is certainly not suitable to process huge volumes of scalable

data and cannot be accommodated by standard database servers.

Map Reduce divides a task into small parts and assigns them to many
computers. Later, the results are collected at one place and integrated to form
the result dataset.
.

Fig 3.1

Fig 3.2

Downloaded by Vijay Sasan M 21IT (vijaysasan.25it@licet.ac.in)

lOMoARcPSD|37278361

How MapReduce Works?

The Map Reduce algorithm contains two important tasks, namely Map and
Reduce.

The Map task takes a set of data and converts it into another set of data, where
individual elements are broken down into tuples (key- value pairs).

The Reduce task takes the output from the Map as an input and combines
those data tuples (key-value pairs) into a smaller set of tuples.

Input Phase−Here we have a Record Reader that translates each record

in an input file.

Downloaded by Vijay Sasan M 21IT (vijaysasan.25it@licet.ac.in)

lOMoARcPSD|37278361

Map − Map is a user-defined function, which takes a series of key-

value pairs and processes.

Intermediate Keys − The key-value pairs generated by the mapper are

known as intermediate keys.

Combiner−A combiner is a type of local Reducer that groups similar

data from the map phase into identifiable sets.

Shuffle and Sort−The Reducer task starts with the Shuffle and Sort
step.

Reducer − The Reducer takes the grouped key-value paired data as

input and runs a Reducer function on each one of them.

Output Phase − In the output phase, we have an output

formatter that translates the final key-value pairs from the
Reducer function and writes them onto a file using a record
writer.

Let us try to understand the two tasks Map & Reduce with
the help of a small diagram.

Downloaded by Vijay Sasan M 21IT (vijaysasan.25it@licet.ac.in)

lOMoARcPSD|37278361

Fig : MapReduce Task

Let us take a real-world example to comprehend the power

of MapReduce.

Twitter receives around 500 million tweets per day, which

are nearly 3000 tweets per second.

The following illustration shows how Twitter manages its

tweets with the help of Map Reduce.

As shown in the illustration, the MapReduce algorithm

performs the following actions

Tokenize

Downloaded by Vijay Sasan M 21IT (vijaysasan.25it@licet.ac.in)

lOMoARcPSD|37278361

Filter

Count

Aggregate Counters

MapReduce Algortihms

The Map Reduce algorithm contains two important

tasks,namely Map and Reduce.

The map task is done by means of Mapper Class.

The reduce task is done by means of Reducer Class.

Downloaded by Vijay Sasan M 21IT (vijaysasan.25it@licet.ac.in)

lOMoARcPSD|37278361

Mapper class takes the input,tokenizes it,maps,and

sorts it.

Fig. MapReduce Algorithms

• Map Reduce implements various mathematical

algorithms to divide a task into small parts and
assign them to multiple systems.

• In technical terms, Map Reduce algorithm helps

in sending the Map & Reduce tasks to appropriate
servers in a cluster.

Downloaded by Vijay Sasan M 21IT (vijaysasan.25it@licet.ac.in)

lOMoARcPSD|37278361

These mathematical algorithms may include the

following−
 Sorting
 Searching
 Indexing
 TF-IDF

Downloaded by Vijay Sasan M 21IT (vijaysasan.25it@licet.ac.in)

Q1: GIS Definition: There Are Many Definitions Describe GIS Properties
Document11 pages
Q1: GIS Definition: There Are Many Definitions Describe GIS Properties
qe wr
No ratings yet
Python API Assignment - Overview
Document2 pages
Python API Assignment - Overview
dd
0% (1)
Medha 8059
Document4 pages
Medha 8059
jefferyleclerc
No ratings yet
UNIT 3bda
Document16 pages
UNIT 3bda
Nagur Basha
No ratings yet
Paper Summary - MapReduce - Simplified Data Processing On Large Clusters (2004) - MeloSpace
Document7 pages
Paper Summary - MapReduce - Simplified Data Processing On Large Clusters (2004) - MeloSpace
Edward
No ratings yet
BDA-MapReduce (1) 5rfgy656yhgvcft6
Document60 pages
BDA-MapReduce (1) 5rfgy656yhgvcft6
Ayush Jha
No ratings yet
Practical 1: Data Mining and Business Intelligence Practical-1
Document10 pages
Practical 1: Data Mining and Business Intelligence Practical-1
Dhrupad Patel
No ratings yet
Unit4 Fos
Document7 pages
Unit4 Fos
aswini.ran98
No ratings yet
Map Reduce Workflow Colloquim
Document30 pages
Map Reduce Workflow Colloquim
Jatin Parashar
No ratings yet
Map Reduce Tutorial-1
Document7 pages
Map Reduce Tutorial-1
jefferyleclerc
No ratings yet
Bda Expt5 - 60002190056
Document5 pages
Bda Expt5 - 60002190056
kr
No ratings yet
Unit Ii Iintroduction To Map Reduce
Document4 pages
Unit Ii Iintroduction To Map Reduce
kathirdcn
No ratings yet
Research Paper - Map Reduce - CSC3323
Document16 pages
Research Paper - Map Reduce - CSC3323
AliElabridi
No ratings yet
Adobe Scan 03 Jun 2024
Document3 pages
Adobe Scan 03 Jun 2024
219X1A3252 POTHU SAI ESWAR REDDY
No ratings yet
BigData MapReduce
Document6 pages
BigData MapReduce
arjuncchaudhary
100% (1)
Big Data Analytics (2171607) : Chapter - 1 Mapreduce
Document32 pages
Big Data Analytics (2171607) : Chapter - 1 Mapreduce
Dhana Lakshmi Boomathi
No ratings yet
Activity-1: Student Name USN
Document16 pages
Activity-1: Student Name USN
Trishala Kumari
No ratings yet
BDA Unit 3 Notes
Document11 pages
BDA Unit 3 Notes
duraisamy
No ratings yet
What Is MapReduce
Document6 pages
What Is MapReduce
Sundaram yadav
No ratings yet
Module 3 (Part-1) - Big Data
Document46 pages
Module 3 (Part-1) - Big Data
sujith
No ratings yet
Symmetry: HMCTS-OP: Hierarchical MCTS Based Online Planning in The Asymmetric Adversarial Environment
Document17 pages
Symmetry: HMCTS-OP: Hierarchical MCTS Based Online Planning in The Asymmetric Adversarial Environment
Jordi Amador
No ratings yet
Line Scan Conversion in Computer Graphics
Document48 pages
Line Scan Conversion in Computer Graphics
Abhishek Singh Chambial
No ratings yet
Precise Fingerprint Enrolment Through Projection Incorporated Subspace Based On Principal Component Analysis (PCA)
Document7 pages
Precise Fingerprint Enrolment Through Projection Incorporated Subspace Based On Principal Component Analysis (PCA)
Md. Rajibul Islam
100% (1)
Bda Experiment 5: Roll No. A-52 Name: Janmejay Patil Class: BE-A Batch: A3 Date of Experiment: Date of Submission Grade
Document5 pages
Bda Experiment 5: Roll No. A-52 Name: Janmejay Patil Class: BE-A Batch: A3 Date of Experiment: Date of Submission Grade
Alka
No ratings yet
Big Data BCA Unit4
Document9 pages
Big Data BCA Unit4
kuldeep
No ratings yet
(BIG DATA) (MapReduce - Quick Guide, Tutorialspoint - Com)
Document36 pages
(BIG DATA) (MapReduce - Quick Guide, Tutorialspoint - Com)
Mony Simonetti
No ratings yet
Image Formation For Synthetic Aperture Satellite Using Matlab
Document4 pages
Image Formation For Synthetic Aperture Satellite Using Matlab
Ashok Reddy
No ratings yet
Unit 3 - Big Data Technologies
Document42 pages
Unit 3 - Big Data Technologies
prakash N
No ratings yet
Hierarchical Mapreduce Programming Model and Scheduling Algorithms
Document6 pages
Hierarchical Mapreduce Programming Model and Scheduling Algorithms
Sudhansu Shekhar Patra
No ratings yet
Map-Reduce Pattern
Document6 pages
Map-Reduce Pattern
afeefhamza2008
No ratings yet
International Journal of Computational Engineering Research (IJCER)
Document6 pages
International Journal of Computational Engineering Research (IJCER)
International Journal of computational Engineering research (IJCER)
No ratings yet
Unit 3 MapReduce
Document15 pages
Unit 3 MapReduce
AKSHAY Kumar
No ratings yet
Cad&Cam
Document219 pages
Cad&Cam
sharmaashlesha28
No ratings yet
Usersguide 1 Intro
Document13 pages
Usersguide 1 Intro
GabrielIgnacioFuentealbaUmanzor
No ratings yet
Comparison of Rendering Processes On 3D Model
Document10 pages
Comparison of Rendering Processes On 3D Model
Anonymous Gl4IRRjzN
No ratings yet
Assignment No 1dda-Line
Document4 pages
Assignment No 1dda-Line
Shrikant Navale
No ratings yet
Hadoop (Mapreduce)
Document43 pages
Hadoop (Mapreduce)
Nisrine Mofakir
No ratings yet
BDA Module 3
Document66 pages
BDA Module 3
yashchheda2002
No ratings yet
Hadoop and Big Data Unit 31
Document9 pages
Hadoop and Big Data Unit 31
pnirmal086
No ratings yet
Krishnan DEM MR
Document8 pages
Krishnan DEM MR
xico107
No ratings yet
18mcs35e U4
Document7 pages
18mcs35e U4
jefferyleclerc
No ratings yet
CG Mod3@AzDOCUMENTS - in PDF
Document45 pages
CG Mod3@AzDOCUMENTS - in PDF
sangamesh k
No ratings yet
Programming With Mathcad Prime
Document13 pages
Programming With Mathcad Prime
Claudiu Popescu
67% (3)
Lidar Final
Document46 pages
Lidar Final
sanda ana
No ratings yet
Training Lidar Data Processing
Document31 pages
Training Lidar Data Processing
sanda ana
No ratings yet
Mla 3
Document5 pages
Mla 3
Renat Zhamilov
No ratings yet
Fundamentals of MapReduce With Example
Document2 pages
Fundamentals of MapReduce With Example
Sanidhya Singh Rajawat
No ratings yet
Unit 3 MapReduce Part 1
Document12 pages
Unit 3 MapReduce Part 1
Ruparel Education Pvt. Ltd.
No ratings yet
Map Reduce
Document45 pages
Map Reduce
mbhavya5867
No ratings yet
Data Science
Document7 pages
Data Science
aswini.ran98
No ratings yet
Cryptography Tutorial
Document57 pages
Cryptography Tutorial
Mriyank Das
No ratings yet
MapReduceModel 5.0
Document31 pages
MapReduceModel 5.0
Lovish Jaiswal
No ratings yet
Practical 2
Document8 pages
Practical 2
HARRY
No ratings yet
SE Assignment
Document18 pages
SE Assignment
Nasir Abbas
No ratings yet
MAPREDUCEFRAMEWORK
Document12 pages
MAPREDUCEFRAMEWORK
gracesachinrock
No ratings yet
SMQ3043 Assignment Article
Document15 pages
SMQ3043 Assignment Article
Noor Hidayah Mohd Kamal
No ratings yet
MapReduce Patterns, Algorithms, and Use Cases - Highly Scalable Blog
Document7 pages
MapReduce Patterns, Algorithms, and Use Cases - Highly Scalable Blog
jefferyleclerc
No ratings yet
On Traffic-Aware Partition and Aggregation in MapReduce For Big Data Applications
Document12 pages
On Traffic-Aware Partition and Aggregation in MapReduce For Big Data Applications
Surendra Naik
No ratings yet
A Survey On Cloud Computing
Document14 pages
A Survey On Cloud Computing
Dr. P. D. Gawande
No ratings yet
Rendering Computer Graphics: Exploring Visual Realism: Insights into Computer Graphics
From Everand
Rendering Computer Graphics: Exploring Visual Realism: Insights into Computer Graphics
Fouad Sabry
No ratings yet
Bundle Adjustment: Optimizing Visual Data for Precise Reconstruction
From Everand
Bundle Adjustment: Optimizing Visual Data for Precise Reconstruction
Fouad Sabry
No ratings yet
Mugged Up
Document28 pages
Mugged Up
Nev Blast
No ratings yet
Grossman 2020 Base de Datos de GVS
Document15 pages
Grossman 2020 Base de Datos de GVS
Gustavo Arellano
No ratings yet
Scheme of Study BS (IT) 2019-23
Document21 pages
Scheme of Study BS (IT) 2019-23
Rashid Mehmood
No ratings yet
SIDSAS94 64bit-2022
Document3 pages
SIDSAS94 64bit-2022
Aldo Martín Fernández
No ratings yet
Assignment Cover Sheet: Sthapa@ismt - Edu.np
Document12 pages
Assignment Cover Sheet: Sthapa@ismt - Edu.np
sa dew
No ratings yet
C Tadm 22
Document4 pages
C Tadm 22
Muhammad dhani
No ratings yet
True or False:: False False True True True False True False False False
Document4 pages
True or False:: False False True True True False True False False False
Siwar Alasmar
No ratings yet
Data Mining: Dosen: Dr. Vitri Tundjungsari
Document64 pages
Data Mining: Dosen: Dr. Vitri Tundjungsari
Arif Prayogi
No ratings yet
2016-Db-Marcin Przepiorowski-Rman - From Beginner To Advanced-Praesentation
Document65 pages
2016-Db-Marcin Przepiorowski-Rman - From Beginner To Advanced-Praesentation
Truong Vu Van
No ratings yet
(A Central University) : Department of Csit
Document38 pages
(A Central University) : Department of Csit
Ajit Tiwari
No ratings yet
Covariance Search Model For Identifying ncRNA Using Particle Swarm Optimized Agglomerative Clustering
Document9 pages
Covariance Search Model For Identifying ncRNA Using Particle Swarm Optimized Agglomerative Clustering
Satrya Fajri Pratama
No ratings yet
Reflex User Manual: European Southern Observatory
Document36 pages
Reflex User Manual: European Southern Observatory
Ciprian Balcan
No ratings yet
Data Science Practical
Document55 pages
Data Science Practical
Darshan Panchal
No ratings yet
FLEXSIM A Flexible Manufacturing System Simulator
Document18 pages
FLEXSIM A Flexible Manufacturing System Simulator
Nguyễn Ngọc Quang
No ratings yet
All API
Document18 pages
All API
ajay ware
No ratings yet
Oracle PL-SQL Course Content
Document4 pages
Oracle PL-SQL Course Content
Avinash Singh
No ratings yet
CLL F041 Ar TRM Eng
Document79 pages
CLL F041 Ar TRM Eng
arleyvc
No ratings yet
Oracle Developer 2000 Training
Document4 pages
Oracle Developer 2000 Training
Satheessh Konthala
No ratings yet
Data Mining: Concepts and Techniques: - Chapter 3
Document52 pages
Data Mining: Concepts and Techniques: - Chapter 3
Jay Shah
No ratings yet
Oracle Machine Learning Python Users Guide
Document370 pages
Oracle Machine Learning Python Users Guide
Harish Naik
No ratings yet
Querying With T-SQL - 01
Document24 pages
Querying With T-SQL - 01
kishore2285
No ratings yet
Fanuc 0 CNC Identification Sheet
Document1 page
Fanuc 0 CNC Identification Sheet
anthony soliz
No ratings yet
Requirement Gathering Template
Document4 pages
Requirement Gathering Template
shubham tiwari
No ratings yet
04 - A Framework of Cloud-Computing-Based BIM Service For Building Lifecycle
Document8 pages
04 - A Framework of Cloud-Computing-Based BIM Service For Building Lifecycle
Duvan Fonseca
No ratings yet
STM PDF
Document6 pages
STM PDF
Cristopher Rico Delgado
No ratings yet
Talend Open Studio For Data Integration
Document436 pages
Talend Open Studio For Data Integration
caner g
No ratings yet
Human Face Recognition Attendance System
Document17 pages
Human Face Recognition Attendance System
Anish Aruna
No ratings yet
DSAL List of Laboratory Experiments
Document2 pages
DSAL List of Laboratory Experiments
Aniruddh Ghewade
0% (1)
Cognos 8 Bi - Data Manager - Function and Scripting Reference Guide
Document134 pages
Cognos 8 Bi - Data Manager - Function and Scripting Reference Guide
sami6114
100% (1)