MapReduce Its Applications For Course
Architectural issues
Flynn’s taxonomy (SIMD, MIMD, etc.), network topology, bisection bandwidth, cache coherence, …
Divide and conquer (D&C)?
• Dependencies
• How to divide and conquer your problem
• What are your Map() and Reduce()?
MapReduce Programming Model
• Java
• Hadoop MapReduce
• Set up your pseudo-distributed Hadoop cluster
• Analyze your problem and format your data flow in
<key,value> style
• Override Mapper.class and Reducer.class with your
own classes
• Important Parameters:
• mapred.map.tasks
• mapred.reduce.tasks
• mapred.tasktracker.map.tasks.maximum
• mapred.tasktracker.reduce.tasks.maximum
• mapred.child.java.opts
• mapred.task.timeout
MapReduce Programming Model
• Streaming and Pipes
• Not familiar with Java?
• Python or C++ is also OK
• Details:
http://hadoop.apache.org/common/docs/r0.17.2/streaming.html
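Streaming lets any executable that reads stdin and writes stdout act as a mapper or reducer. Below is a minimal word-count pair in Python as a sketch: the `map`/`reduce` command-line switch and function names are our own convention, not part of the streaming API; only the tab-separated <key,value> lines on stdin/stdout follow the streaming contract.

```python
import sys
from itertools import groupby

def mapper(lines):
    # Emit "<word>\t1" for every word read from the input lines.
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(lines):
    # Streaming delivers mapper output sorted by key, so consecutive
    # lines with the same word can be summed with groupby.
    pairs = (line.rstrip("\n").split("\t") for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

if __name__ == "__main__" and len(sys.argv) > 1:
    stage = mapper if sys.argv[1] == "map" else reducer
    for out_line in stage(sys.stdin):
        print(out_line)
```

Such a script would be wired in with the streaming jar's `-mapper` and `-reducer` options; see the linked documentation for the exact invocation.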
MapReduce Debug Methods
1. Web interface
2. Runtime monitoring
1. AOP for Java (AspectJ)
3. Log-based debugging
Exercise 2
Matrix Multiplication
◦ A = | A11 A12 A13 |   B = | B11 B12 B13 |
◦     | A21 A22 A23 |       | B21 B22 B23 |
◦     | A31 A32 A33 |       | B31 B32 B33 |
◦
◦ C = A * B
◦ Mapper Input
▪ Key: line number   Value: one row of A
◦ Mapper Output
▪ Key: line number   Value: the corresponding row of C
◦ Reducer Output
▪ The C matrix
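A single-process sketch of the row-wise scheme above, assuming each mapper can see all of B (for instance via the distributed cache; that detail is not specified in the exercise, and the function names are ours):

```python
def map_row(line_no, a_row, B):
    # Mapper body: given row i of A and the full matrix B, compute
    # row i of C, where C[i][j] = sum_k A[i][k] * B[k][j].
    cols = len(B[0])
    c_row = [sum(a_row[k] * B[k][j] for k in range(len(a_row)))
             for j in range(cols)]
    return line_no, c_row

def reduce_rows(pairs):
    # Reducer body: assemble C from (row index, row of C) pairs.
    return [row for _, row in sorted(pairs)]
```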
MR algorithm design
• Managing dependencies
o Mappers run in isolation
You have no idea in what order the mappers run
You have no idea on which node each mapper runs
You have no idea when each mapper finishes
o Tools for synchronization
Ability to hold state in the reducer across multiple key-value pairs
Sorting function for keys
Partitioner
Cleverly-constructed data structures
• Write your own Map and Reduce functions
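As one illustration of these synchronization tools, combining a custom partitioner with key sorting lets a reducer hold state across consecutive key-value pairs (a common secondary-sort pattern). The composite (natural key, timestamp) layout and every name below are our illustration, not the Hadoop API:

```python
from itertools import groupby

def partition(key, num_reducers):
    # Partitioner: hash only the natural key, so all values for one
    # natural key reach the same reducer despite the composite key.
    natural, _ = key
    return hash(natural) % num_reducers

def reduce_with_state(sorted_pairs):
    # With keys sorted as (natural_key, timestamp), values arrive
    # time-ordered, so the reducer can keep one running value per key
    # and emit deltas without buffering everything.
    for natural, group in groupby(sorted_pairs, key=lambda kv: kv[0][0]):
        prev = None
        for (_, ts), value in group:
            delta = None if prev is None else value - prev
            yield natural, ts, delta
            prev = value
```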
Advanced Topics
Variants of MapReduce (in heterogeneous environments)
1) GPGPU MapReduce
CUDA + C
Hadoop + JCUDA
2) MapReduce on MultiCore
3) MapReduce In Open Environment
4) Pydoop: a Python API for Hadoop
What will Google do in the future?
MapReduce targets large-scale data processing. Why? For small jobs, it introduces very heavy overhead. Google's next-generation solution will be called Percolator. We will see the details at the USENIX Symposium next month.
Advanced Topics
MR-extension: Scheduling
• HDFS Balancer
o Threshold for datanode utilization
Focuses on over-full datanodes and newly added datanodes
Cannot be too small (why?)
o Utilization balancing
Moves blocks from highly utilized nodes to poorly utilized ones
Moves no more than 10 GB, or the threshold fraction of the sender's and receiver's capacity, per iteration
o Exit conditions
The cluster is balanced
No block can be moved
No block has been moved for 5 consecutive iterations
An IOException occurs while communicating with the namenode
Another balancer instance is already running
o Side effect
Bandwidth occupation
Solution: limit the balancer's use of bandwidth via dfs.balance.bandwidthPerSec
MR-extension: Scheduling
• Job Scheduling
o Capacity Scheduler (Yahoo!)
o Fair Sharing Scheduler (Facebook)
o Delay Scheduling and copy-compute splitting
MR-extension: Scheduling
• Capacity Scheduler
o 1. A number of queues, each with a configurable amount of resources; resources can be shared with other queues;
o 2. Within a queue, scheduling is FCFS;
o 3. Configurable waiting time before preemption;
o 4. Supports priorities and memory-intensive jobs (need announcement)
• Fair Scheduler
o 1. Fair sharing;
o 2. Pools with guaranteed capacity; resources can be shared among pools;
o 3. Preemptive; easy for administrators.
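A toy model of the first two Capacity Scheduler points (fixed per-queue capacities, FCFS within a queue); sharing between queues, preemption, and priorities are omitted, and every name is ours, not the scheduler's actual code:

```python
from collections import deque

class CapacityScheduler:
    # Named queues with fixed slot capacities; jobs within a queue
    # are launched strictly first-come, first-served.
    def __init__(self, capacities):
        self.capacities = dict(capacities)          # queue -> slot count
        self.queues = {q: deque() for q in capacities}
        self.running = {q: 0 for q in capacities}

    def submit(self, queue, job):
        self.queues[queue].append(job)

    def assign(self, queue):
        # Launch the oldest waiting job in `queue` if a slot is free.
        if self.queues[queue] and self.running[queue] < self.capacities[queue]:
            self.running[queue] += 1
            return self.queues[queue].popleft()
        return None
```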
Delay Scheduling
• Motivation
o Data locality
Node locality, rack locality
A one-month observation of jobs of different sizes running in production at Facebook:
Jobs with 1 to 25 maps achieve 5% node locality and 59% rack locality
This is because small jobs are common for ad-hoc queries and hourly reports in a data warehouse
Small jobs arrive and tend to be scheduled next (according to fair sharing or other schedulers)
The probability of node or rack locality for small jobs is low
Delay Scheduling
• Pseudocode
When a node requests a task, the scheduler skips the small job and looks at subsequent jobs. To avoid starvation, two wait times, T1 and T2, are added to the scheduling method:
T1: how long a job waits before being allowed to launch non-local tasks on the same rack
T2: after waiting for T1, the extra time before being allowed to launch off-rack tasks
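That pseudocode can be sketched as follows; the job dictionaries, the way waiting time is tracked, and the T1/T2 checks are our reading of the scheme, not the actual Hadoop fair-scheduler code:

```python
def assign_task(node, jobs, now, T1, T2):
    # Walk jobs in fair-share order; skip a job that cannot run
    # locally on `node` until it has waited long enough (T1 before
    # rack-local launches are allowed, T1 + T2 before off-rack ones).
    for job in jobs:
        waited = now - job["skipped_since"]
        if node in job["local_nodes"]:
            return job["name"], "node-local"
        if node in job["rack_nodes"] and waited >= T1:
            return job["name"], "rack-local"
        if waited >= T1 + T2:
            return job["name"], "off-rack"
        # Otherwise skip this job; a later job may use the slot.
    return None
```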
Copy-compute splitting
• Motivation
o Reduce-slot hoarding problem
Hadoop launches reduce tasks after a few mappers finish, so that the map and reduce phases can interleave
For a large job with tens of thousands of map tasks, the map stage takes a very long time, and the job holds nearly all the reduce slots it receives during this period until its maps finish
This causes starvation of jobs submitted later
Copy-compute splitting
• Solution
o Combiner
Reduce the map output in memory on the map side
o Copy-compute splitting
Split reduce tasks into copy tasks and compute tasks
Compute-phase admission control: the tasktracker limits the number of reducers computing on these resources (maxComputing) and limits the number of copy-phase reducers per job (maxReducer), then lets other jobs use the idle slots
Side effects
Still starvation if the maxReducer slots are occupied by long jobs: add a preemption timeout
The memory for merging map output in each reducer needs to be smaller, so that copy phases do not interfere with compute phases
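The admission-control idea can be sketched as a per-tasktracker counter. Only the maxComputing limit from the slide is modeled here; the per-job maxReducer limit is omitted, and the class shape and names are our illustration:

```python
class TaskTracker:
    # Toy admission control for copy-compute splitting: only
    # `max_computing` reducers may occupy the compute phase at once;
    # the rest remain in the copy phase without blocking other jobs.
    def __init__(self, max_computing):
        self.max_computing = max_computing
        self.computing = set()

    def request_compute(self, reducer_id):
        # Admit a reducer into the compute phase if a slot is free.
        if len(self.computing) < self.max_computing:
            self.computing.add(reducer_id)
            return True
        return False  # stay in copy phase for now

    def finish(self, reducer_id):
        # Free the compute slot when the reducer completes.
        self.computing.discard(reducer_id)
```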
MR-extension: Energy Efficient
• Materials and Methods
o Six months of Hadoop MR log data
o Production trace analysis
Inter-job arrival time, data size, data ratios, per-job input sizes
o Statistical workload replay
Regenerate the workload with MR programs according to the previous logs
The MR jobs preserve the data ratios
Ignore compute (why?)
WordCount and Pi
• Results (figures not reproduced here)
• Conclusions
o Queuing jobs at launch time is better than launching N jobs at the same time;
o For batched execution, a staggered fashion is better than putting all jobs in the queue at once;
o A block size larger than 64 MB results in faster finishing times if the cluster is highly utilized;
o More task trackers per node lead to faster finishing times and lower energy consumption;
o Larger clusters tend to have better energy efficiency, as measured with a Brand Electronics Model 21-1850/CI power meter
MR-extension: Secure MR
• Closed environment vs. open environment
o OSG accepts HDFS
o Hadoop has a patch for Condor (HADOOP-428)
MR-extension: Cascading and
ChainMR
• Break the barrier
• Cascading and ChainMR
MR-extension: HOD
• Basic HOD
o Torque/PBS
o HDFS, the TaskTrackers (TTs), and the JobTracker (JT) can be user-private
• HOD on Prairiefire
o SGE
o HDFS is global; the JT and TTs are user-private
• In the future
MR-extension: Secure MR
• How to avoid malicious mappers