
MapReduce Introduction and Programming

Chen He
Holland Computing Center
University of Nebraska-Lincoln
Outline
o Origin
o Mechanism
o Applications
o Example-1
o Exercise-1
o Programming Model
  Java
  Streaming
o How to Debug?
  Runtime monitoring
  Log-based debugging
o Advanced Topics
  In heterogeneous environments
  Google's plans for the future
Origin
• Functional Programming - LISP
o Map function:
 > (mapcar #'(lambda (x) (oddp (car x))) '((1 2) (4 5) (7 9)))
 (T NIL T)
o Reduce function:
 > (reduce #'min '(1 2 3 4 5))
 ; evaluates as (min (min (min (min 1 2) 3) 4) 5)
 1
Mechanism
Divide and Conquer
It is not easy to parallelize what you want…

Fundamental issues
Scheduling, data distribution, synchronization, inter-process
communication, robustness, fault tolerance, …

Different programming models
Message Passing, Shared Memory

Architectural issues
Flynn's taxonomy (SIMD, MIMD, etc.), network topology, bisection
bandwidth, cache coherence, …

Different programming constructs
Mutexes, conditional variables, barriers, …
Masters/slaves, producers/consumers, work queues, …

Common problems
Livelock, deadlock, data starvation, priority inversion, …
Dining philosophers, sleeping barbers, cigarette smokers, …

Actually, the programmer shoulders the burden of managing concurrency…

Adapted from https://wiki.umiacs.umd.edu/ccc/images/e/eb/NAACLHLT2009-MapReduce-slides.pdf
Mechanism
• MapReduce (M-R)
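Conceptually, the programmer supplies only two functions and the framework handles partitioning, shuffling, and grouping. A sketch of the signatures, following the Google MapReduce paper (k1/v1 and k2/v2 are just placeholder type names):

  map:    (k1, v1)        -> list of (k2, v2)
  reduce: (k2, list(v2))  -> list of values

All intermediate pairs that share a key k2 are grouped by the framework and handed to a single reduce call.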
Applications
• Classified by problem properties
o Computation-intensive
 High-energy physics analysis (Indiana University), MD
simulation (P&G, SUNY, UNL)
o Data-intensive
 Google Maps, log analysis (scientific logs: weather,
earthquake, etc.), document processing, etc.
o Both computation- and data-intensive
 MR-BLAST (bioinformatics), TeraSort
Example-1: Word Count
• Input
Halloween was confusing. All my life my parents said,
"Never take candy from strangers." And then they
dressed me up and said, "Go beg for it." I didn’t know
what to do! I’d knock on people’s doors and go, "Trick
or treat." "No thank you.”
----Rita Rudner
• Questions:
1) Dependency: data or function?
2) Divide and conquer: how do you split the input?
3) What are your Map() and Reduce()? (a reference sketch follows)
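A minimal word count in Hadoop's old Java API (org.apache.hadoop.mapred, matching the r0.17-era docs cited later in this deck): Map() tokenizes each line and emits (word, 1); Reduce() sums the ones for each word.

  import java.io.IOException;
  import java.util.Iterator;
  import java.util.StringTokenizer;
  import org.apache.hadoop.io.IntWritable;
  import org.apache.hadoop.io.LongWritable;
  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.*;

  public class WordCount {
    public static class Map extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
      private static final IntWritable ONE = new IntWritable(1);
      private final Text word = new Text();

      public void map(LongWritable key, Text value,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        StringTokenizer itr = new StringTokenizer(value.toString());
        while (itr.hasMoreTokens()) {
          word.set(itr.nextToken());
          output.collect(word, ONE);               // emit (word, 1)
        }
      }
    }

    public static class Reduce extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
      public void reduce(Text key, Iterator<IntWritable> values,
          OutputCollector<Text, IntWritable> output, Reporter reporter)
          throws IOException {
        int sum = 0;
        while (values.hasNext()) sum += values.next().get();
        output.collect(key, new IntWritable(sum)); // emit (word, total)
      }
    }
  }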


Exercise-1: Matrix Reverse
• Original:
  a11 a12 a13 a14 ... a1n
  a21 a22 a23 a24 ... a2n
  ...
  an1 an2 an3 an4 ... ann
• Reversed:
  ann an(n-1) ... an1
  a(n-1)n a(n-1)(n-1) ... a(n-1)1
  ...
  a1n a1(n-1) ... a11
• Mapper input - Key: line number; Value: line content
• Mapper output - Key: line number; Value: reversed line content
• Reducer output - Key: line numbers sorted in reverse order;
  Value: reversed line content
• Dependency?
• Divide and conquer?
• What are your Map() and Reduce()? (one possible sketch below)
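One possible answer sketch, not the only one. It assumes the input format hands the mapper the row number as a LongWritable key (the stock TextInputFormat gives byte offsets instead, which preserve the same ordering). The mapper reverses each row and negates the key, so the framework's ascending sort by key returns the rows in reverse order; an identity reducer writes them out.

  public void map(LongWritable lineNo, Text row,
      OutputCollector<LongWritable, Text> out, Reporter reporter)
      throws IOException {
    String[] tokens = row.toString().split("\\s+");
    StringBuilder reversed = new StringBuilder();
    for (int i = tokens.length - 1; i >= 0; i--)   // reverse within the row
      reversed.append(tokens[i]).append(' ');
    // Negating the key makes the shuffle's ascending key sort
    // deliver the last row first, i.e. the rows in reverse order.
    out.collect(new LongWritable(-lineNo.get()),
                new Text(reversed.toString().trim()));
  }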
MapReduce Programming Model
• Java
o Hadoop MapReduce
o Construct your pseudo-distributed Hadoop cluster
o Analyze your problem and format your data flow in
<key, value> style
o Override the Mapper and Reducer classes with
your own classes
o Important parameters:
 mapred.map.tasks
 mapred.reduce.tasks
 mapred.tasktracker.map.tasks.maximum
 mapred.tasktracker.reduce.tasks.maximum
 mapred.child.java.opts
 mapred.task.timeout
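A sketch of setting the per-job parameters from Java. The two tasktracker maximums are per-node daemon settings that belong in the cluster's hadoop-site.xml rather than in job code:

  JobConf conf = new JobConf(WordCount.class);
  conf.setNumMapTasks(16);                         // mapred.map.tasks (a hint only)
  conf.setNumReduceTasks(4);                       // mapred.reduce.tasks
  conf.set("mapred.child.java.opts", "-Xmx512m");  // heap for each task JVM
  conf.setLong("mapred.task.timeout", 600000L);    // fail tasks silent for 10 min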
MapReduce Programming Model
• Streaming and Pipes
o Not familiar with Java?
o Python or C++ also works
o Details:
http://hadoop.apache.org/common/docs/r0.17.2/streaming.html
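A typical streaming invocation looks like the sketch below; the jar path varies by release, and mapper.py / reducer.py stand for your own scripts, which read lines from stdin and write tab-separated key/value lines to stdout:

  hadoop jar $HADOOP_HOME/contrib/streaming/hadoop-*-streaming.jar \
      -input  myInput \
      -output myOutput \
      -mapper  mapper.py \
      -reducer reducer.py \
      -file mapper.py -file reducer.py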
MapReduce Debug Methods
1. Web interface
2. Runtime monitoring
   1. AOP for Java - AspectJ
3. Log-based debugging
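Hadoop's built-in counters are another lightweight runtime-monitoring hook: values bumped inside a task show up live in the JobTracker web interface. A sketch inside a mapper (the group and counter names here are arbitrary):

  public void map(LongWritable key, Text value,
      OutputCollector<Text, IntWritable> output, Reporter reporter)
      throws IOException {
    if (value.toString().trim().length() == 0) {
      // Visible under the job's counters in the web UI.
      reporter.incrCounter("Debug", "EMPTY_LINES", 1);
      return;
    }
    // ... normal map logic here ...
  }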
Exercise-2: Matrix Multiplication

      A11 A12 A13        B11 B12 B13
  A = A21 A22 A23    B = B21 B22 B23
      A31 A32 A33        B31 B32 B33

  C = A * B
◦ Mapper input
  ▪ Key: line number; Value: A's line content
◦ Mapper output
  ▪ Key: line number; Value: C's line content
◦ Reducer output
  ▪ The C matrix
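One possible answer sketch. It assumes B is small enough to ship to every mapper (e.g. via the DistributedCache) and has been parsed into an in-memory double[][] B in configure(); parseRow is a hypothetical helper that splits a text line into doubles. Each mapper turns one row of A into the corresponding row of C, and an identity reducer writes the rows out in line-number order:

  public void map(LongWritable lineNo, Text aRow,
      OutputCollector<LongWritable, Text> out, Reporter reporter)
      throws IOException {
    double[] a = parseRow(aRow.toString());   // one row of A
    StringBuilder cRow = new StringBuilder();
    for (int j = 0; j < B[0].length; j++) {   // C[i][j] = sum over k of A[i][k] * B[k][j]
      double c = 0.0;
      for (int k = 0; k < a.length; k++) c += a[k] * B[k][j];
      cRow.append(c).append(' ');
    }
    out.collect(lineNo, new Text(cRow.toString().trim()));  // one row of C
  }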
MR Algorithm Design
• Managing dependencies
o Mappers run in isolation
 You have no idea in what order the mappers run
 You have no idea on what node each mapper runs
 You have no idea when each mapper finishes
o Tools for synchronization
 Ability to hold state in the reducer across multiple
key-value pairs
 Sorting function for keys
 Partitioner (see the sketch below)
 Cleverly constructed data structures
• Write your own Map and Reduce functions
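For reference, a custom partitioner in the old API is a small class. This sketch routes every key that shares the same first tab-separated field to the same reducer, one way to guarantee that related key-value pairs meet in a single reduce task:

  import org.apache.hadoop.io.Text;
  import org.apache.hadoop.mapred.JobConf;
  import org.apache.hadoop.mapred.Partitioner;

  public class FirstFieldPartitioner implements Partitioner<Text, Text> {
    public void configure(JobConf job) {}     // no setup needed

    public int getPartition(Text key, Text value, int numPartitions) {
      String first = key.toString().split("\t", 2)[0];
      // Mask the sign bit so the partition index is non-negative.
      return (first.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }
  }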
Advanced Topics
Variants of MapReduce (in heterogeneous environments)
1) GPGPU MapReduce
   CUDA + C
   Hadoop + JCUDA
2) MapReduce on multi-core
3) MapReduce in open environments
4) Pydoop: a Python API for Hadoop

What will Google do in the future?
MapReduce targets large-scale data processing. Why? For
small processing tasks it introduces very heavy overhead. Google's
next-generation solution is called Percolator; we will see the details
at the USENIX Symposium next month.
Advanced Topics
MR-extension: Scheduling
• HDFS Balancer
o Threshold for datanodes
 Focuses on over-full datanodes and newly added datanodes
 Cannot be very small - why?
o Utilization balancing
 Moves blocks from highly utilized to poorly utilized datanodes
 Moves no more than 10 GB, or the threshold fraction of the
sender's and receiver's capacity, per iteration
o Exit conditions
 The cluster is balanced
 No block can be moved
 No block has been moved for 5 consecutive iterations
 An IOException occurs while communicating with the namenode
 Another balancer is already running
o Side effect
 Bandwidth occupation
 Solution: limit the balancer's use of bandwidth via
dfs.balance.bandwidthPerSec (see the sketch below)
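The property goes into the cluster configuration (hadoop-site.xml in releases of this era), with the value in bytes per second per datanode. A sketch capping balancing traffic at 1 MB/s:

  <property>
    <name>dfs.balance.bandwidthPerSec</name>
    <value>1048576</value>  <!-- 1 MB/s of balancer traffic per datanode -->
  </property>

The balancer itself is started with bin/start-balancer.sh, optionally passing -threshold to set the allowed utilization deviation.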
MR-extension: Scheduling
• Job Scheduling
o Capacity Scheduler (Yahoo!)
o Fair Scheduler (Facebook)
o Delay scheduling and copy-compute splitting
MR-extension: Scheduling
• Capacity Scheduler
o 1. A number of queues, each with a configurable amount of
resources; resources can be shared with other queues
o 2. Within a queue, scheduling is FCFS
o 3. Configurable waiting time before preemption
o 4. Supports priorities and memory-intensive jobs (which must be
declared)
• Fair Scheduler
o 1. Fair sharing
o 2. Pools with guaranteed capacity; resources can be shared
among pools (declared in an allocations file; see the sketch below)
o 3. Preemptive; easy for administrators
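For the Fair Scheduler, pools and their guarantees live in an allocations file. A sketch with one guaranteed-capacity pool (the pool name and numbers are illustrative):

  <?xml version="1.0"?>
  <allocations>
    <pool name="production">
      <minMaps>10</minMaps>       <!-- guaranteed map slots -->
      <minReduces>5</minReduces>  <!-- guaranteed reduce slots -->
      <weight>2.0</weight>        <!-- double share of excess capacity -->
    </pool>
  </allocations>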
Delay Scheduling
• Motivation
o Data locality
 Node locality, rack locality
 One month of observing jobs of different sizes running
in production at Facebook:
 Jobs with 1 to 25 maps achieved 5% node locality and 59%
rack locality
 Small jobs are common, driven by ad-hoc queries
and hourly reports in a data warehouse
 Small jobs arrive and tend to be scheduled
next (under fair sharing, or other schedulers)
 The probability of node locality and rack locality for
small jobs is low
Delay Scheduling
• Pseudocode
When a node requests a task and the job at the head of the queue
has no local data there, the scheduler skips that job and looks at
subsequent jobs. To avoid starvation, two wait times, T1 and T2,
are added to the scheduling method:
T1: how long a job waits before being allowed to launch
non-local tasks on the same rack
T2: after waiting T1, how much extra time before being allowed to
launch off-rack tasks
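A sketch of that loop in Java-like pseudocode (simplified; the method and field names are illustrative, not Hadoop API):

  // T1, T2: configured locality wait times
  Task assignTask(Node node) {
    for (Job job : jobsInFairShareOrder()) {
      if (job.hasNodeLocalTask(node)) {
        job.resetWait();                          // got locality; reset the clock
        return job.nodeLocalTask(node);
      }
      if (job.timeWaited() < T1) continue;        // keep waiting for node locality
      if (job.hasRackLocalTask(node))
        return job.rackLocalTask(node);           // settle for the same rack
      if (job.timeWaited() < T1 + T2) continue;   // keep waiting for rack locality
      return job.anyTask();                       // give up and go off-rack
    }
    return null;                                  // nothing runnable right now
  }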
Copy-Compute Splitting
• Motivation
o The reduce-slot hoarding problem
 Hadoop launches reduces after a few mappers finish, so that map and
reduce execution can interleave
 For a large job with tens of thousands of map tasks, the map stage takes
a very long time, and the job holds nearly all the reduce slots it receives
during this period until its maps finish
 This causes starvation of jobs submitted later
Copy-Compute Splitting
• Solution
o Combiner
 Reduce map output on the map side, in memory (one-liner below)
o Copy-compute splitting
 Split reduce tasks into copy tasks and compute tasks
 Compute-phase admission control: the tasktracker limits the
number of reducers computing (maxComputing) and the number of
copy-phase reducers per job (maxReducer), then lets other jobs
use the idle slots
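Enabling a combiner in the old Java API is a single line; for aggregations that are associative and commutative, such as word count's sum, the reducer class itself can serve as the combiner:

  conf.setCombinerClass(Reduce.class);  // pre-aggregate map output on the map side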
Side effects
Starvation is still possible if all maxReducer slots are occupied by
long jobs; a preemption timeout addresses this.
The memory used for merging map output in each reducer needs to be
smaller, so that the copy phases do not interfere with the
compute phases.
MR-extension: Energy Efficiency
• Materials and Methods
o Six months of Hadoop MR log data
o Production trace analysis
 Inter-job arrival time, data size, data ratios, per-job input sizes
o Statistical workload replay
 Re-generate the workload with MR programs according to the logs
 MR jobs preserve the data ratios
 Computation is ignored (WHY!!!???)
 WordCount and Pi
 Workload generator inputs include the jobs' computation semantics


MR-extension: Energy Efficiency
• Cluster setup (figure omitted)
• Results (figure omitted)

MR-extension: Energy Efficiency
• Results (figure omitted)
• Conclusions
o Queuing jobs at launch time is better than launching N jobs at the same time
o For batched execution, a staggered fashion is better than putting every job in the queue at once
o Block sizes larger than 64 MB give faster finishing times if the cluster is highly utilized
o More task trackers per node lead to faster finishing times and lower energy consumption
o Larger clusters tend to have better energy efficiency (measured with a Brand
Electronics Model 21-1850/CI power meter)
MR-extension: Secure MR
• Closed environments vs. open environments
o OSG accepts HDFS
o Hadoop has a patch for Condor (HADOOP-428)
MR-extension: Cascading and ChainMR
• Break the single-map, single-reduce barrier
• Cascading and ChainMapper/ChainReducer
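A sketch of chaining with the old API's ChainMapper/ChainReducer, following the [MAP+ / REDUCE MAP*] pattern from the Hadoop javadoc; AMap, BMap, and XReduce are hypothetical user classes:

  JobConf job = new JobConf(MyJob.class);

  ChainMapper.addMapper(job, AMap.class,
      LongWritable.class, Text.class,   // AMap's input key/value types
      Text.class, Text.class,           // AMap's output key/value types
      true, new JobConf(false));
  ChainMapper.addMapper(job, BMap.class,
      Text.class, Text.class,
      LongWritable.class, Text.class,
      false, new JobConf(false));

  ChainReducer.setReducer(job, XReduce.class,
      LongWritable.class, Text.class,
      Text.class, Text.class,
      true, new JobConf(false));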
MR-extension: HOD (Hadoop on Demand)
• Basic HOD
o Torque/PBS
o HDFS, the TaskTrackers, and the JobTracker can all be user-private
• HOD on Prairiefire
o SGE
o HDFS is global; the JobTracker and TaskTrackers are user-private
• In the future
MR-extension: Secure MR
• How do we guard against malicious mappers?

Adapted from NCSU, "SecureMR: A Service Integrity Assurance Framework for MapReduce"
MR-extensions: Multi-Core Clusters
Can you analyze this system's properties and decide which
MR mechanisms should be used? (system figure omitted)
Optional Homework
• Single-Source Shortest Path
o Serial: Dijkstra's algorithm
o MR: ???
 Simplify the question: equal edge weights
 Use adjacency lists
Exercise-3
• BFS
o DistanceTo(startNode) = 0
o For all nodes n directly reachable from startNode, DistanceTo(n) = 1
o For all nodes n reachable from some set of nodes S,
DistanceTo(n) = 1 + min(DistanceTo(m)) over m in S
• A map task receives
o Key: node n
o Value: D (distance from start), points-to (list of nodes reachable from n)
• A map task outputs
o For each p in points-to: emit (p, D + 1)
• The reducer gathers the possible distances to a given p and selects the
minimum one
• Multiple iterations are needed
o Each MR pass advances the "known frontier" by one hop
 Subsequent iterations include more reachable nodes as the frontier
advances
 Multiple iterations are needed to explore the entire graph
 Feed the output back into the same MapReduce task
• Preserving graph structure
o The mapper emits (n, points-to) as well (see the sketch below)
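A sketch of one BFS iteration with equal edge weights. Node.parse and serialize are hypothetical helpers for the "distance + adjacency list" record format, the "DIST" tag marks distance-only messages, and unreached nodes carry distance Integer.MAX_VALUE:

  public void map(Text nodeId, Text record,
      OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    Node n = Node.parse(record.toString());
    out.collect(nodeId, record);                   // pass the graph structure on
    if (n.distance < Integer.MAX_VALUE)            // only propagate from reached nodes
      for (String p : n.pointsTo)
        out.collect(new Text(p), new Text("DIST\t" + (n.distance + 1)));
  }

  public void reduce(Text nodeId, Iterator<Text> values,
      OutputCollector<Text, Text> out, Reporter reporter) throws IOException {
    Node n = null;                                 // assumes every node has a record
    int best = Integer.MAX_VALUE;
    while (values.hasNext()) {
      String v = values.next().toString();
      if (v.startsWith("DIST\t"))
        best = Math.min(best, Integer.parseInt(v.substring(5)));
      else
        n = Node.parse(v);                         // the node's own record
    }
    if (best < n.distance) n.distance = best;      // keep the minimum distance
    out.collect(nodeId, new Text(n.serialize()));  // structure survives the pass
  }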
Graph Algorithms in MR
• General approach:
o Store graphs as adjacency lists
o Each map task receives a node and its outlinks
(adjacency list)
o The map task computes some function of the link
structure and emits the value with the target as the key
o The reduce task collects keys (target nodes) and
aggregates
• Iterate over multiple MR cycles until some termination
condition is met
o Remember to "pass" the graph structure from one
iteration to the next
