
Big Data and Analytics

Seema Acharya
Subhashini Chellappan

Big Data and Analytics by Seema Acharya and Subhashini Chellappan


Copyright 2015, WILEY INDIA PVT. LTD.
Chapter 8

Introduction to MapReduce Programming

Big Data and Analytics by Seema Acharya and Subhashini Chellappan


Copyright 2015, WILEY INDIA PVT. LTD.
Learning Objectives and Learning Outcomes

Learning Objectives

1. To study the optimization techniques of MapReduce Programming.
2. To learn the combiner and partitioner techniques.
3. To study sorting, searching and compression.

Learning Outcomes

a) To comprehend the reasons behind the popularity of MapReduce Programming.
b) To be able to write combiner and partitioner programs.
c) To comprehend searching, sorting and compression.

Session Plan

Lecture time 90 minutes

Q/A 5 minutes

Agenda
 Introduction
 Mapper
   • RecordReader
   • Map
   • Combiner
   • Partitioner
 Reducer
   • Shuffle
   • Sort
   • Reduce
   • Output Format
 Combiner
 Partitioner
 Searching
 Sorting
 Compression
Introduction

In MapReduce programming, jobs (applications) are split into a set of map tasks and reduce tasks. These tasks are then executed in a distributed fashion on the Hadoop cluster.

Each task processes the small subset of data that has been assigned to it. In this way, Hadoop distributes the load across the cluster.

A MapReduce job takes a set of files stored in HDFS (Hadoop Distributed File System) as its input.
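The job-splitting idea can be sketched without Hadoop at all. The following is a local, in-memory simulation (class and method names are illustrative, not Hadoop API): each call to mapTask() stands in for one map task processing one input split, and reduceTask() aggregates the combined intermediate output by key.

```java
import java.util.*;

// Local simulation of how a MapReduce job splits work: each "map task"
// processes one input split, and the "reduce task" aggregates by key.
public class MapReduceSketch {
    // Map phase: emit (word, 1) for every word in one input split.
    static List<Map.Entry<String, Integer>> mapTask(String split) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String word : split.split("\\s+"))
            out.add(new AbstractMap.SimpleEntry<>(word, 1));
        return out;
    }

    // Reduce phase: sum the values that share a common key.
    static Map<String, Integer> reduceTask(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs)
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        return counts;
    }

    public static void main(String[] args) {
        // Two "splits", as if HDFS had divided one input file.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        intermediate.addAll(mapTask("big data and analytics"));
        intermediate.addAll(mapTask("big data"));
        System.out.println(reduceTask(intermediate)); // {analytics=1, and=1, big=2, data=2}
    }
}
```

In a real cluster, the two mapTask() calls would run in parallel on different nodes, each reading its own HDFS block.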

Mapper

A mapper maps input key-value pairs into a set of intermediate key-value pairs. Maps are individual tasks that have the responsibility of transforming input records into intermediate key-value pairs.

The Mapper consists of the following phases:

• RecordReader

• Map

• Combiner

• Partitioner
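The first two phases can be sketched as follows (a local simulation, not Hadoop API; class names are illustrative). The RecordReader stand-in turns a byte-oriented blob into record-oriented (offset, line) pairs, mirroring what Hadoop's line-based record reading does for text input, and map() turns each record into intermediate key-value pairs.

```java
import java.util.*;

// Simulation of the first Mapper-side phases (no Hadoop dependency):
// a RecordReader turns the byte-oriented input into (offset, line)
// records, and map() turns each record into intermediate pairs.
public class MapperPhases {
    // RecordReader stand-in: (byte offset of line, line text).
    static LinkedHashMap<Long, String> recordReader(String bytes) {
        LinkedHashMap<Long, String> records = new LinkedHashMap<>();
        long offset = 0;
        for (String line : bytes.split("\n", -1)) {
            records.put(offset, line);
            offset += line.length() + 1; // +1 for the newline byte
        }
        return records;
    }

    // Map: transform one input record into intermediate (word, 1) pairs.
    static List<String[]> map(long offset, String line) {
        List<String[]> out = new ArrayList<>();
        for (String w : line.split("\\s+"))
            if (!w.isEmpty()) out.add(new String[]{w, "1"});
        return out;
    }

    public static void main(String[] args) {
        for (Map.Entry<Long, String> rec : recordReader("hello map\nreduce").entrySet())
            System.out.println(rec.getKey() + " -> records for: " + rec.getValue());
    }
}
```

Note how the record key here is the byte offset of the line, which is exactly the "byte-oriented view to record-oriented view" conversion the RecordReader is responsible for.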

Reducer

The primary task of the Reducer is to reduce a set of intermediate values (those that share a common key) to a smaller set of values.

The Reducer has three primary phases: Shuffle and Sort, Reduce, and Output Format.
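These phases can be sketched locally (illustrative names, not Hadoop API): shuffleAndSort() groups the intermediate pairs by key in sorted key order, reduce() is then applied once per group, and the final key/value line written out mimics the default text output format.

```java
import java.util.*;

// Local sketch of the Reducer's phases: shuffle groups intermediate
// pairs by key, sort orders the keys, and reduce() runs once per group.
public class ReducerPhases {
    // Shuffle and Sort: group values by key, keys in sorted order.
    static TreeMap<String, List<Integer>> shuffleAndSort(List<String[]> pairs) {
        TreeMap<String, List<Integer>> groups = new TreeMap<>();
        for (String[] p : pairs)
            groups.computeIfAbsent(p[0], k -> new ArrayList<>())
                  .add(Integer.parseInt(p[1]));
        return groups;
    }

    // Reduce: collapse one key's list of values into a single value.
    static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int v : values) sum += v;
        return sum;
    }

    public static void main(String[] args) {
        List<String[]> intermediate = Arrays.asList(
            new String[]{"data", "1"}, new String[]{"big", "1"},
            new String[]{"data", "1"});
        // Output Format stand-in: one "key TAB value" line per group.
        shuffleAndSort(intermediate).forEach((k, vs) ->
            System.out.println(k + "\t" + reduce(k, vs)));
    }
}
```

Because the groups come out of a sorted map, reduce() sees keys in sorted order, which is why MapReduce output files are sorted by key within each reducer.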

The chores of Mapper, Combiner, Partitioner, and Reducer

Combiner

The combiner is an optimization technique for a MapReduce job. Generally, the reducer class is set as the combiner class. The difference between the combiner class and the reducer class is as follows:

• Output generated by the combiner is intermediate data, and it is passed to the reducer.

• Output of the reducer is written to the output file on disk.
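As a local sketch (no Hadoop dependency; the class name is illustrative), the combiner applies the same summing logic as the reducer, but to a single map task's output, so far fewer intermediate pairs travel across the network during the shuffle:

```java
import java.util.*;

// Sketch of the combiner optimization: reducer-style aggregation run
// locally on one map task's output, shrinking the intermediate data
// before it is shuffled to the reducers.
public class CombinerDemo {
    // Combiner/reducer logic: sum counts per key.
    static Map<String, Integer> combine(List<String[]> mapOutput) {
        Map<String, Integer> local = new TreeMap<>();
        for (String[] p : mapOutput)
            local.merge(p[0], Integer.parseInt(p[1]), Integer::sum);
        return local;
    }

    public static void main(String[] args) {
        // One map task emitted 5 pairs...
        List<String[]> mapOutput = Arrays.asList(
            new String[]{"big", "1"}, new String[]{"data", "1"},
            new String[]{"big", "1"}, new String[]{"big", "1"},
            new String[]{"data", "1"});
        // ...the combiner shrinks them to 2 before the shuffle.
        Map<String, Integer> combined = combine(mapOutput);
        System.out.println(mapOutput.size() + " pairs -> " + combined.size()
                + " pairs: " + combined); // 5 pairs -> 2 pairs: {big=3, data=2}
    }
}
```

In the Hadoop API this is typically wired up in the driver with job.setCombinerClass(...), often reusing the reducer class when the reduce function is commutative and associative (as summing is).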

Partitioner

The partitioning phase happens after the map phase and before the reduce phase. Usually, the number of partitions is equal to the number of reducers. The default partitioner is the hash partitioner.

Searching and Sorting Demo
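As a flavor of the searching demo, a MapReduce "search" works like a distributed grep: each map task emits only the records that contain the search term, and because the framework already sorts intermediate keys, the matches arrive in sorted order with no extra work. A local sketch (illustrative names, not Hadoop API):

```java
import java.util.*;

// Map-side "searching" sketch, in the style of a distributed grep:
// each map task filters its records and emits only the matches.
public class SearchDemo {
    static List<String> searchMap(List<String> records, String term) {
        List<String> matches = new ArrayList<>();
        for (String r : records)
            if (r.contains(term)) matches.add(r);
        return matches;
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList(
            "hadoop stores data in hdfs",
            "mapreduce processes data",
            "pig is a dataflow language");
        System.out.println(searchMap(records, "hdfs")); // [hadoop stores data in hdfs]
    }
}
```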

Compression

In MapReduce programming, you can compress the MapReduce output file. Compression provides two benefits:

1. It reduces the space needed to store files.

2. It speeds up data transfer across the network.

You can specify the compression format in the driver program as shown below:

conf.setBoolean("mapred.output.compress", true);
conf.setClass("mapred.output.compression.codec",
              GzipCodec.class, CompressionCodec.class);

Here, a codec is the implementation of a compression/decompression algorithm. GzipCodec is the codec for the gzip algorithm; the settings above compress the job's output file.
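The effect of the gzip algorithm itself can be seen without a Hadoop cluster, using the JDK's own GZIPOutputStream (Hadoop's GzipCodec wraps the same algorithm; the class name below is illustrative). Repetitive data, which MapReduce intermediate and output files often are, compresses dramatically:

```java
import java.io.*;
import java.util.zip.*;

// Standalone illustration of gzip compression using the JDK:
// repetitive data shrinks to a small fraction of its original size.
public class GzipDemo {
    static byte[] gzip(byte[] data) {
        try {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            try (GZIPOutputStream gz = new GZIPOutputStream(buf)) {
                gz.write(data); // compress the whole payload
            }
            return buf.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e); // cannot happen for in-memory streams
        }
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000; i++) sb.append("big data ");
        byte[] raw = sb.toString().getBytes();
        byte[] packed = gzip(raw);
        System.out.println(raw.length + " bytes -> " + packed.length + " bytes");
    }
}
```

Both of the stated benefits follow directly: fewer bytes on disk, and fewer bytes to ship from mappers to reducers or out of the cluster.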

Answer a few questions…

Fill in the blanks

1. Partitioner phase belongs to the ------------------------ task.

2. Combiner is also known as ---------------------------.

3. RecordReader converts the byte-oriented view into a --------------------------- view.

4. MapReduce sorts the intermediate values based on -------------------------- .

5. In MapReduce Programming, the reduce function is applied to ---------------- group at a time.

Thank You
