25 - Map Reduce Recap


ALGORITHMS FOR INFORMATION RETRIEVAL
Index Construction

Bhaskarjyoti Das
Department of Computer Science Engineering

Distributed Indexing – MAP REDUCE and HDFS Recap

Working with Large Data

Google example (hypothetical)


• 10 billion web pages
• Average size of a web page = 20 KB
• Total: 10 B × 20 KB = 200 TB
• Disk I/O channel bandwidth = 50 MB/sec
• Total time to read 200 TB = 4 million sec ≈ 46 days !!
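The arithmetic above can be checked directly (all figures are the slide's hypothetical numbers, not measured values):

```python
# Back-of-the-envelope check of the slide's read-time estimate.
pages = 10 * 10**9                 # 10 billion web pages
page_bytes = 20 * 10**3            # 20 KB per page
total_bytes = pages * page_bytes   # 200 TB
bandwidth = 50 * 10**6             # 50 MB/sec disk I/O channel
seconds = total_bytes / bandwidth  # 4 million seconds
days = seconds / 86400             # ~46 days of continuous reading
print(f"{total_bytes / 10**12:.0f} TB, {seconds:.0f} s, {days:.1f} days")
```

Reading the whole crawl through a single disk channel is hopeless, which motivates distributing both the data and the work.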
Computation Failure

▪ As of a 2011 news report, Google used around 1 million servers for search indexing
▪ A single server stays up for around 1,000 days on average
▪ If a cluster has 1,000 servers, expect 1 failure/day
▪ For 1 million servers, 1,000 failures/day
▪ What is the fault-tolerance strategy?
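The failure arithmetic above is simple linear scaling, sketched here under the slide's assumptions (independent failures, ~1,000-day mean uptime per server):

```python
# Expected failures per day grows linearly with cluster size,
# assuming each server fails independently about once per 1000 days.
def expected_failures_per_day(servers, mtbf_days=1000):
    # With a mean time between failures of mtbf_days per server,
    # a cluster of `servers` machines sees servers/mtbf_days failures daily
    return servers / mtbf_days

for n in (1_000, 1_000_000):
    print(n, "servers ->", expected_failures_per_day(n), "expected failures/day")
```

At 1,000 failures/day, failure is the normal case, not the exception — so the storage and execution layers must tolerate it automatically.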


Network Bandwidth is a Bottleneck

• Typically a switch provides 1 Gbps per link

• Moving 10 TB over such a link takes around a day!
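A quick check of this figure (assuming the full 1 Gbps link is available for the transfer):

```python
# Time to move 10 TB across a 1 Gbps switch link.
data_bytes = 10 * 10**12              # 10 TB
link_bps = 1 * 10**9                  # 1 Gbps = 10^9 bits/sec
seconds = data_bytes * 8 / link_bps   # bytes -> bits, then divide by link rate
days = seconds / 86400                # ~0.93 days
print(f"{seconds:.0f} s ≈ {days:.2f} days")
```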


MAP REDUCE and HDFS

• Store data persistently on multiple nodes to address PERSISTENCE and AVAILABILITY

• Move computation to the data (traditionally, data moves to the computation) to minimize data movement and address the data-locality issue

• A simple programming model hides all this complexity; it is designed to minimize data movement across the switches.
Redundant Storage Infrastructure

• Google GFS and Hadoop HDFS address availability and persistence
• Data is mostly read and appended, rarely updated
• Chunk servers ensure availability
  • A file is split into contiguous chunks (16-64 MB)
  • Each chunk is replicated (2x or 3x)
  • Replicas are kept in different racks where possible
• Master node (Name node in HDFS) conducts the orchestra
  • Stores metadata about where files are stored
  • May be replicated
• Client library to access data
  • Talks to the master to get metadata (where the chunks are)
  • Then talks directly to a chunk server to get the data
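The read path above can be sketched with a toy in-memory master and chunkservers (all names, chunk IDs, and data here are illustrative, not real GFS/HDFS APIs):

```python
# Toy model of the read path: metadata from the master,
# data fetched directly from a chunk server.

# Master metadata: file name -> ordered list of (chunk_id, replica locations)
master_metadata = {
    "crawl.dat": [("C0", ["cs1", "cs2"]), ("C1", ["cs2", "cs3"])],
}

# Chunk servers hold the actual chunk bytes (each chunk replicated 2x here)
chunkservers = {
    "cs1": {"C0": b"first-chunk "},
    "cs2": {"C0": b"first-chunk ", "C1": b"second-chunk"},
    "cs3": {"C1": b"second-chunk"},
}

def read_file(name):
    data = b""
    # Step 1: ask the master only for metadata (which chunks, where)
    for chunk_id, replicas in master_metadata[name]:
        # Step 2: fetch the bytes directly from any replica of the chunk;
        # the file's contents never flow through the master
        data += chunkservers[replicas[0]][chunk_id]
    return data

print(read_file("crawl.dat"))
```

Keeping the master on the metadata path only is what lets one node coordinate a large cluster without becoming a data-transfer bottleneck.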
Redundant Storage Infrastructure

[Diagram: a Client contacts the HDFS Master for metadata, then reads chunks (C0, C1, C2, C3, C5, …) that are replicated across Chunkserver 1 … Chunkserver N]

• Provides a platform over which other systems like MapReduce operate
MAP REDUCE EXECUTION OVERVIEW
Partitioner and Combiner in MAP REDUCE

• The partitioning phase takes place after the map phase and
before the reduce phase.
• The number of partitions equals the number of reducers.
• Data is partitioned across the reducers according to
the partitioning function.
• Often a map task will produce many pairs of the form
(k, v1), (k, v2), … for the same key k
• Network time can be saved by pre-aggregating these values in
the mapper; the combiner is usually the same as the reducer function
• Benefit: much less data needs to be copied and shuffled
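A minimal sketch of these two ideas for a word-count job, assuming a hash-based default partitioner and a combiner that reuses the reducer's summation (function names are illustrative):

```python
from collections import defaultdict

NUM_REDUCERS = 4  # illustrative reducer count

def partition(key, num_reducers=NUM_REDUCERS):
    # Default-style partitioning function: hash of the key modulo the
    # reducer count, so every value for one key lands on the same reducer
    return hash(key) % num_reducers

def combine(pairs):
    # Combiner: pre-aggregate one mapper's (word, count) pairs before the
    # shuffle; here it is the same summation the reducer would perform
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

mapper_output = [("the", 1), ("cat", 1), ("the", 1), ("the", 1)]
print(combine(mapper_output))   # 4 pairs shrink to 2 before the shuffle
```

A combiner is only valid when the reduce operation is associative and commutative (as summation is); otherwise pre-aggregation would change the result.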
MAP REDUCE Conclusion

• Provides a general-purpose model to simplify large-scale computation
• Allows users to focus on the problem without worrying about the nuts and bolts
▪ MapReduce breaks a large computing problem into smaller parts by recasting it in terms of manipulation of key-value pairs
▪ The master machine assigns each task to an idle machine from a pool.
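The key-value recasting can be illustrated with the canonical word-count example, written here as a single-process sketch of the model (not the distributed runtime):

```python
# Word count recast as map/reduce over key-value pairs.
from itertools import groupby
from operator import itemgetter

def map_fn(doc):
    # Map: emit (word, 1) for every word in a document
    for word in doc.split():
        yield (word, 1)

def reduce_fn(word, counts):
    # Reduce: sum all counts emitted for one word
    return (word, sum(counts))

docs = ["to be or not to be"]
# Shuffle/sort: bring all pairs with the same key together
pairs = sorted(p for doc in docs for p in map_fn(doc))
result = dict(reduce_fn(k, [v for _, v in grp])
              for k, grp in groupby(pairs, key=itemgetter(0)))
print(result)
```

The runtime handles everything between `map_fn` and `reduce_fn` — partitioning, shuffling, scheduling, and retrying failed tasks — which is exactly the complexity the model hides from the user.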
THANK YOU

Bhaskarjyoti Das
Department of Computer Science Engineering
